How does Connected Components Labeling with Decision Trees perform on GPUs?
Abstract: In this paper the problem of Connected Components Labeling (CCL) in binary images using Graphics Processing Units (GPUs) is tackled from a different perspective. In the last decade, many novel algorithms have been released, specifically designed for GPUs. Because CCL literature concerning sequential algorithms is very rich, and includes many efficient solutions, designers of parallel algorithms were often inspired by techniques that had already proved successful in a sequential environment, such as the Union-Find paradigm for solving equivalences between provisional labels. However, the use of decision trees to minimize memory accesses, which is one of the main features of the best-performing sequential algorithms, was never taken into account when designing parallel CCL solutions. In fact, branches in the code tend to cause thread divergence, which usually leads to inefficiency. Still, this consideration does not necessarily apply to every possible scenario. Are we sure that the advantages of decision trees do not compensate for the cost of thread divergence? In order to answer this question, we chose three well-known sequential CCL algorithms, which employ decision trees as the cornerstone of their strategy, and we built a data-parallel version of each of them. Experimental tests on real-case datasets show that, in most cases, these solutions outperform state-of-the-art algorithms, thus demonstrating the effectiveness of decision trees also in a parallel environment.
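As a reading aid, the Union-Find paradigm mentioned in the abstract can be sketched as follows. This is a minimal sequential two-pass labeling in Python, assuming 4-connectivity for brevity; it is not one of the paper's algorithms, uses no decision tree, and is not data-parallel. The names `find`, `union`, and `label_components` are hypothetical, chosen only for this illustration.

```python
def find(parent, x):
    """Return the root of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Record that provisional labels a and b belong to the same component."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        # Keep the smaller label as the representative (an arbitrary choice).
        if ra < rb:
            parent[rb] = ra
        else:
            parent[ra] = rb


def label_components(image):
    """Two-pass labeling of a binary image given as a list of rows.

    Assumption: 4-connectivity (only the pixels above and to the left
    are inspected during the first pass).
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # index 0 is reserved for the background

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = min(up, left)
                union(parent, up, left)
            elif up or left:
                labels[r][c] = up or left
            else:
                parent.append(len(parent))  # new provisional label
                labels[r][c] = len(parent) - 1

    # Second pass: replace each provisional label with a final, consecutive one.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(parent, labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels
```

On a U-shaped input, the two arms receive different provisional labels in the first pass and are merged when the scan reaches the bottom row, which is the equivalence-solving step that Union-Find handles.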
Citation: Allegretti, Stefano; Bolelli, Federico; Cancilla, Michele; Pollastri, Federico; Canalini, Laura; Grana, Costantino "How does Connected Components Labeling with Decision Trees perform on GPUs?" Computer Analysis of Images and Patterns, vol. 11678, Salerno, Italy, pp. 39-51, Sep 3-5, 2019 DOI: 10.1007/978-3-030-29888-3_4
- Author version:
- DOI: 10.1007/978-3-030-29888-3_4