forked from taskflow/taskflow
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathcudaFlowSort.html
More file actions
157 lines (147 loc) · 18.8 KB
/
cudaFlowSort.html
File metadata and controls
157 lines (147 loc) · 18.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>cudaFlow Algorithms » Parallel Sort | Taskflow QuickStart</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400i,600,600i%7CSource+Code+Pro:400,400i,600" />
<link rel="stylesheet" href="m-dark+documentation.compiled.css" />
<link rel="icon" href="favicon.ico" type="image/vnd.microsoft.icon" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="theme-color" content="#22272e" />
</head>
<body>
<header><nav id="navigation">
<div class="m-container">
<div class="m-row">
<span id="m-navbar-brand" class="m-col-t-8 m-col-m-none m-left-m">
<a href="https://taskflow.github.io"><img src="taskflow_logo.png" alt="" />Taskflow</a> <span class="m-breadcrumb">|</span> <a href="index.html" class="m-thin">QuickStart</a>
</span>
<div class="m-col-t-4 m-hide-m m-text-right m-nopadr">
<a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<path id="m-doc-search-icon-path" d="m6 0c-3.31 0-6 2.69-6 6 0 3.31 2.69 6 6 6 1.49 0 2.85-0.541 3.89-1.44-0.0164 0.338 0.147 0.759 0.5 1.15l3.22 3.79c0.552 0.614 1.45 0.665 2 0.115 0.55-0.55 0.499-1.45-0.115-2l-3.79-3.22c-0.392-0.353-0.812-0.515-1.15-0.5 0.895-1.05 1.44-2.41 1.44-3.89 0-3.31-2.69-6-6-6zm0 1.56a4.44 4.44 0 0 1 4.44 4.44 4.44 4.44 0 0 1-4.44 4.44 4.44 4.44 0 0 1-4.44-4.44 4.44 4.44 0 0 1 4.44-4.44z"/>
</svg></a>
<a id="m-navbar-show" href="#navigation" title="Show navigation"></a>
<a id="m-navbar-hide" href="#" title="Hide navigation"></a>
</div>
<div id="m-navbar-collapse" class="m-col-t-12 m-show-m m-col-m-none m-right-m">
<div class="m-row">
<ol class="m-col-t-6 m-col-m-none">
<li><a href="pages.html">Handbook</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
</ol>
<ol class="m-col-t-6 m-col-m-none" start="3">
<li><a href="annotated.html">Classes</a></li>
<li><a href="files.html">Files</a></li>
<li class="m-show-m"><a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<use href="#m-doc-search-icon-path" />
</svg></a></li>
</ol>
</div>
</div>
</div>
</div>
</nav></header>
<main><article>
<div class="m-container m-container-inflatable">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<h1>
<span class="m-breadcrumb"><a href="cudaFlowAlgorithms.html">cudaFlow Algorithms</a> »</span>
Parallel Sort
</h1>
<div class="m-block m-default">
<h3>Contents</h3>
<ul>
<li><a href="#CUDAParallelSortIncludeTheHeader">Include the Header</a></li>
<li><a href="#cudaFlowSortARangeofItems">Sort a Range of Items</a></li>
<li><a href="#cudaFlowSortKeyValueItems">Sort a Range of Key-Value Items</a></li>
<li><a href="#cudaFlowSortMiscellaneousItems">Miscellaneous Items</a></li>
</ul>
</div>
<p>cudaFlow provides template methods to create parallel sort tasks on a CUDA GPU.</p><section id="CUDAParallelSortIncludeTheHeader"><h2><a href="#CUDAParallelSortIncludeTheHeader">Include the Header</a></h2><p>You need to include the header file, <code>taskflow/cuda/algorithm/sort.hpp</code>, for creating a parallel-sort task.</p></section><section id="cudaFlowSortARangeofItems"><h2><a href="#cudaFlowSortARangeofItems">Sort a Range of Items</a></h2><p><a href="classtf_1_1cudaFlow.html#ae462d455fed06dfcdbd1e25a2c9c5da6" class="m-doc">tf::<wbr />cudaFlow::<wbr />sort</a> performs an in-place parallel sort over a range of elements specified by <code>[first, last)</code> using the given comparator. The following code sorts one million random integers in an increasing order on a GPU.</p><pre class="m-code"><span class="k">const</span> <span class="kt">size_t</span> <span class="n">N</span> <span class="o">=</span> <span class="mi">1000000</span><span class="p">;</span>
<span class="kt">int</span><span class="o">*</span> <span class="n">vec</span> <span class="o">=</span> <span class="n">tf</span><span class="o">::</span><span class="n">cuda_malloc_shared</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="p">(</span><span class="n">N</span><span class="p">);</span> <span class="c1">// vector</span>
<span class="c1">// initializes the data</span>
<span class="k">for</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o"><</span><span class="n">N</span><span class="p">;</span> <span class="n">vec</span><span class="p">[</span><span class="n">i</span><span class="o">++</span><span class="p">]</span><span class="o">=</span><span class="n">rand</span><span class="p">());</span>
<span class="c1">// create a cudaFlow of one task to perform parallel sort</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cudaFlow</span> <span class="n">cf</span><span class="p">;</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span> <span class="n">task</span> <span class="o">=</span> <span class="n">cf</span><span class="p">.</span><span class="n">sort</span><span class="p">(</span>
<span class="n">vec</span><span class="p">,</span> <span class="n">vec</span> <span class="o">+</span> <span class="n">N</span><span class="p">,</span> <span class="p">[]</span><span class="n">__device__</span><span class="p">(</span><span class="kt">int</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">a</span> <span class="o"><</span> <span class="n">b</span><span class="p">;</span> <span class="p">}</span>
<span class="p">);</span>
<span class="n">cf</span><span class="p">.</span><span class="n">offload</span><span class="p">();</span>
<span class="n">assert</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">is_sorted</span><span class="p">(</span><span class="n">vec</span><span class="p">,</span> <span class="n">vec</span><span class="o">+</span><span class="n">N</span><span class="p">));</span></pre><p>You can specify a different comparator to <a href="classtf_1_1cudaFlow.html#ae462d455fed06dfcdbd1e25a2c9c5da6" class="m-doc">tf::<wbr />cudaFlow::<wbr />sort</a> to alter the sorting order. For example, the following code sorts one million random integers in an decreasing order on a GPU.</p><pre class="m-code"><span class="k">const</span> <span class="kt">size_t</span> <span class="n">N</span> <span class="o">=</span> <span class="mi">1000000</span><span class="p">;</span>
<span class="kt">int</span><span class="o">*</span> <span class="n">vec</span> <span class="o">=</span> <span class="n">tf</span><span class="o">::</span><span class="n">cuda_malloc_shared</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="p">(</span><span class="n">N</span><span class="p">);</span> <span class="c1">// vector</span>
<span class="c1">// initializes the data</span>
<span class="k">for</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o"><</span><span class="n">N</span><span class="p">;</span> <span class="n">vec</span><span class="p">[</span><span class="n">i</span><span class="o">++</span><span class="p">]</span><span class="o">=</span><span class="n">rand</span><span class="p">());</span>
<span class="c1">// create a cudaFlow of one task to perform parallel sort</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cudaFlow</span> <span class="n">cf</span><span class="p">;</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span> <span class="n">task</span> <span class="o">=</span> <span class="n">cf</span><span class="p">.</span><span class="n">sort</span><span class="p">(</span>
<span class="n">vec</span><span class="p">,</span> <span class="n">vec</span> <span class="o">+</span> <span class="n">N</span><span class="p">,</span> <span class="p">[]</span> <span class="n">__device__</span> <span class="p">(</span><span class="kt">int</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">a</span> <span class="o">></span> <span class="n">b</span><span class="p">;</span> <span class="p">}</span>
<span class="p">);</span>
<span class="n">cf</span><span class="p">.</span><span class="n">offload</span><span class="p">();</span>
<span class="n">assert</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">is_sorted</span><span class="p">(</span><span class="n">vec</span><span class="p">,</span> <span class="n">vec</span><span class="o">+</span><span class="n">N</span><span class="p">,</span> <span class="p">[](</span><span class="kt">int</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b</span><span class="p">){</span> <span class="k">return</span> <span class="n">a</span> <span class="o">></span> <span class="n">b</span><span class="p">;</span> <span class="p">}));</span></pre></section><section id="cudaFlowSortKeyValueItems"><h2><a href="#cudaFlowSortKeyValueItems">Sort a Range of Key-Value Items</a></h2><p><a href="classtf_1_1cudaFlow.html#a979739fcf70fbd760ad1a7682a8dfea8" class="m-doc">tf::<wbr />cudaFlow::<wbr />sort_by_key</a> sorts a range of key-value items into ascending key order. If <code>i</code> and <code>j</code> are any two valid iterators in <code>[k_first, k_last)</code> such that <code>i</code> precedes <code>j</code>, and <code>p</code> and <code>q</code> are iterators in <code>[v_first, v_first + (k_last - k_first))</code> corresponding to <code>i</code> and <code>j</code> respectively, then <code>comp(*j, *i)</code> evaluates to <code>false</code>. The following example sorts a range of items into ascending key order and swaps their corresponding values:</p><pre class="m-code"><span class="k">const</span> <span class="kt">size_t</span> <span class="n">N</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">vec</span> <span class="o">=</span> <span class="n">tf</span><span class="o">::</span><span class="n">cuda_malloc_shared</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="p">(</span><span class="n">N</span><span class="p">);</span> <span class="c1">// keys</span>
<span class="k">auto</span> <span class="n">idx</span> <span class="o">=</span> <span class="n">tf</span><span class="o">::</span><span class="n">cuda_malloc_shared</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="p">(</span><span class="n">N</span><span class="p">);</span> <span class="c1">// values</span>
<span class="c1">// initializes the data</span>
<span class="n">vec</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">vec</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="n">vec</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="mi">-5</span><span class="p">,</span> <span class="n">vec</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">idx</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">idx</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">idx</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="n">idx</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="c1">// sort keys (vec) and swap their corresponding values (idx)</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cudaFlow</span> <span class="n">cf</span><span class="p">;</span>
<span class="n">cf</span><span class="p">.</span><span class="n">sort_by_key</span><span class="p">(</span><span class="n">vec</span><span class="p">,</span> <span class="n">vec</span><span class="o">+</span><span class="n">N</span><span class="p">,</span> <span class="n">idx</span><span class="p">,</span> <span class="p">[]</span> <span class="n">__device__</span> <span class="p">(</span><span class="kt">int</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">a</span> <span class="o"><</span> <span class="n">b</span><span class="p">;</span> <span class="p">});</span>
<span class="n">cf</span><span class="p">.</span><span class="n">offload</span><span class="p">();</span>
<span class="c1">// now vec = {-5, 1, 2, 4}</span>
<span class="c1">// now idx = { 2, 0, 3, 1}</span>
<span class="c1">// deletes the memory</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cuda_free</span><span class="p">(</span><span class="n">buffer</span><span class="p">);</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cuda_free</span><span class="p">(</span><span class="n">vec</span><span class="p">);</span>
<span class="n">tf</span><span class="o">::</span><span class="n">cuda_free</span><span class="p">(</span><span class="n">idx</span><span class="p">);</span></pre><p>While you can capture the values into the lambda and sort them indirectly using plain <a href="classtf_1_1cudaFlow.html#ae462d455fed06dfcdbd1e25a2c9c5da6" class="m-doc">tf::<wbr />cudaFlow::<wbr />sort</a>, this organization will result in frequent and costly access to the global memory. For example, we can sort <code>idx</code> indirectly using the captured keys in <code>vec:</code></p><pre class="m-code"><span class="n">cf</span><span class="p">.</span><span class="n">sort</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">idx</span><span class="o">+</span><span class="n">N</span><span class="p">,</span> <span class="p">[</span><span class="n">vec</span><span class="p">]</span> <span class="n">__device__</span> <span class="p">(</span><span class="kt">int</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">vec</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o"><</span> <span class="n">vec</span><span class="p">[</span><span class="n">b</span><span class="p">];</span> <span class="p">});</span></pre><p>The comparator here will frequently access the global memory of <code>vec</code>, resulting in high memory latency. Instead, you should use <a href="classtf_1_1cudaFlow.html#a979739fcf70fbd760ad1a7682a8dfea8" class="m-doc">tf::<wbr />cudaFlow::<wbr />sort_by_key</a> that has been optimized for this purpose.</p></section><section id="cudaFlowSortMiscellaneousItems"><h2><a href="#cudaFlowSortMiscellaneousItems">Miscellaneous Items</a></h2><p>Parallel sort algorithms are also available in <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">tf::<wbr />cudaFlowCapturer</a> with the same API.</p></section>
</div>
</div>
</div>
</article></main>
<div class="m-doc-search" id="search">
<a href="#!" onclick="return hideSearch()"></a>
<div class="m-container">
<div class="m-row">
<div class="m-col-m-8 m-push-m-2">
<div class="m-doc-search-header m-text m-small">
<div><span class="m-label m-default">Tab</span> / <span class="m-label m-default">T</span> to search, <span class="m-label m-default">Esc</span> to close</div>
<div id="search-symbolcount">…</div>
</div>
<div class="m-doc-search-content">
<form>
<input type="search" name="q" id="search-input" placeholder="Loading …" disabled="disabled" autofocus="autofocus" autocomplete="off" spellcheck="false" />
</form>
<noscript class="m-text m-danger m-text-center">Unlike everything else in the docs, the search functionality <em>requires</em> JavaScript.</noscript>
<div id="search-help" class="m-text m-dim m-text-center">
<p class="m-noindent">Search for symbols, directories, files, pages or
modules. You can omit any prefix from the symbol or file path; adding a
<code>:</code> or <code>/</code> suffix lists all members of given symbol or
directory.</p>
<p class="m-noindent">Use <span class="m-label m-dim">↓</span>
/ <span class="m-label m-dim">↑</span> to navigate through the list,
<span class="m-label m-dim">Enter</span> to go.
<span class="m-label m-dim">Tab</span> autocompletes common prefix, you can
copy a link to the result using <span class="m-label m-dim">⌘</span>
<span class="m-label m-dim">L</span> while <span class="m-label m-dim">⌘</span>
<span class="m-label m-dim">M</span> produces a Markdown link.</p>
</div>
<div id="search-notfound" class="m-text m-warning m-text-center">Sorry, nothing was found.</div>
<ul id="search-results"></ul>
</div>
</div>
</div>
</div>
</div>
<script src="search-v1.js"></script>
<script src="searchdata-v1.js" async="async"></script>
<footer><nav>
<div class="m-container">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<p>Taskflow handbook is part of the <a href="https://taskflow.github.io">Taskflow project</a>, copyright © <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>, 2018–2022.<br />Generated by <a href="https://doxygen.org/">Doxygen</a> 1.8.14 and <a href="https://mcss.mosra.cz/">m.css</a>.</p>
</div>
</div>
</div>
</nav></footer>
</body>
</html>