taskflow/docs/cudaFlowSort.html at master · simoncpp/taskflow

<!DOCTYPE html>

<head>

<title>cudaFlow Algorithms » Parallel Sort | Taskflow QuickStart</title>

</head>

<body>

<a href="https://taskflow.github.io"><img src="taskflow_logo.png" alt="" />Taskflow</a> | <a href="index.html" class="m-thin">QuickStart</a>

</svg></a>

</div>

<li><a href="pages.html">Handbook</a></li>

<li><a href="namespaces.html">Namespaces</a></li>

</ol>

<li><a href="annotated.html">Classes</a></li>

<li><a href="files.html">Files</a></li>

</svg></a></li>

</ol>

</div>

</nav></header>

<h1>

<a href="cudaFlowAlgorithms.html">cudaFlow Algorithms</a> »

Parallel Sort

</h1>

<h3>Contents</h3>

<ul>

<li><a href="#CUDAParallelSortIncludeTheHeader">Include the Header</a></li>

<li><a href="#cudaFlowSortARangeofItems">Sort a Range of Items</a></li>

<li><a href="#cudaFlowSortKeyValueItems">Sort a Range of Key-Value Items</a></li>

<li><a href="#cudaFlowSortMiscellaneousItems">Miscellaneous Items</a></li>

</ul>

</div>

cudaFlow provides template methods to create parallel sort tasks on a CUDA GPU.<section id="CUDAParallelSortIncludeTheHeader"><h2><a href="#CUDAParallelSortIncludeTheHeader">Include the Header</a></h2>To create a parallel-sort task, include the header file <code>taskflow/cuda/algorithm/sort.hpp</code>.</section><section id="cudaFlowSortARangeofItems"><h2><a href="#cudaFlowSortARangeofItems">Sort a Range of Items</a></h2><a href="classtf_1_1cudaFlow.html#ae462d455fed06dfcdbd1e25a2c9c5da6" class="m-doc">tf::cudaFlow::sort</a> performs an in-place parallel sort over a range of elements specified by <code>[first, last)</code> using the given comparator. The following code sorts one million random integers in increasing order on a GPU:<pre class="m-code">const size_t N = 1000000;

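int* vec = tf::cuda_malloc_shared<int>(N);  // vector

// initializes the data
for(size_t i=0; i<N; vec[i++]=rand());

// create a cudaFlow of one task to perform parallel sort
tf::cudaFlow cf;
tf::cudaTask task = cf.sort(
  vec, vec + N, [] __device__ (int a, int b) { return a < b; }
);
cf.offload();

assert(std::is_sorted(vec, vec+N));</pre>To sort in decreasing order instead, pass a comparator that returns <code>a > b</code>:<pre class="m-code">const size_t N = 1000000;
int* vec = tf::cuda_malloc_shared<int>(N);  // vector

// initializes the data
for(size_t i=0; i<N; vec[i++]=rand());

// create a cudaFlow of one task to perform parallel sort
tf::cudaFlow cf;
tf::cudaTask task = cf.sort(
  vec, vec + N, [] __device__ (int a, int b) { return a > b; }
);
cf.offload();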

assert(std::is_sorted(vec, vec+N, [](int a, int b){ return a > b; }));</pre></section><section id="cudaFlowSortKeyValueItems"><h2><a href="#cudaFlowSortKeyValueItems">Sort a Range of Key-Value Items</a></h2><a href="classtf_1_1cudaFlow.html#a979739fcf70fbd760ad1a7682a8dfea8" class="m-doc">tf::cudaFlow::sort_by_key</a> sorts a range of key-value items into ascending key order. If <code>i</code> and <code>j</code> are any two valid iterators in <code>[k_first, k_last)</code> such that <code>i</code> precedes <code>j</code>, and <code>p</code> and <code>q</code> are iterators in <code>[v_first, v_first + (k_last - k_first))</code> corresponding to <code>i</code> and <code>j</code> respectively, then <code>comp(*j, *i)</code> evaluates to <code>false</code>. The following example sorts a range of items into ascending key order and swaps their corresponding values:<pre class="m-code">const size_t N = 4;

auto vec = tf::cuda_malloc_shared<int>(N);  // keys
auto idx = tf::cuda_malloc_shared<int>(N);  // values

// initializes the data
vec[0] = 1, vec[1] = 4, vec[2] = -5, vec[3] = 2;
idx[0] = 0, idx[1] = 1, idx[2] = 2, idx[3] = 3;

// sort keys (vec) and swap their corresponding values (idx)
tf::cudaFlow cf;
cf.sort_by_key(vec, vec+N, idx, [] __device__ (int a, int b) { return a < b; });
cf.offload();

// now vec = {-5, 1, 2, 4}
// now idx = { 2, 0, 3, 1}

// deletes the memory
tf::cuda_free(vec);

tf::cuda_free(idx);</pre>You can also capture the keys in the lambda and sort the values indirectly using plain <a href="classtf_1_1cudaFlow.html#ae462d455fed06dfcdbd1e25a2c9c5da6" class="m-doc">tf::cudaFlow::sort</a>, but this organization results in frequent and costly accesses to global memory. For example, we can sort <code>idx</code> indirectly using the captured keys in <code>vec</code>:<pre class="m-code">cf.sort(idx, idx+N, [vec] __device__ (int a, int b) { return vec[a] < vec[b]; });</pre>The comparator here reads <code>vec</code> from global memory on every comparison, which incurs high memory latency. Instead, use <a href="classtf_1_1cudaFlow.html#a979739fcf70fbd760ad1a7682a8dfea8" class="m-doc">tf::cudaFlow::sort_by_key</a>, which is optimized for this purpose.</section><section id="cudaFlowSortMiscellaneousItems"><h2><a href="#cudaFlowSortMiscellaneousItems">Miscellaneous Items</a></h2>Parallel sort algorithms are also available in <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">tf::cudaFlowCapturer</a> with the same API.</section>
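To check the key-value semantics described above without a GPU, the following is a minimal host-side sketch in plain C++. The helper name <code>reference_sort_by_key</code> is hypothetical and is not part of Taskflow; it only illustrates the expected result (keys in comparator order, values permuted alongside them), not how the GPU algorithm is implemented. <code>std::stable_sort</code> is used here only to keep the reference deterministic; this page makes no statement about the stability of the GPU sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical host-side reference (NOT a Taskflow function):
// reorders keys according to comp and applies the same permutation to values.
template <typename K, typename V, typename C>
void reference_sort_by_key(std::vector<K>& keys, std::vector<V>& values, C comp) {
  assert(keys.size() == values.size());

  // perm[i] is the original position of the element that ends up at position i
  std::vector<std::size_t> perm(keys.size());
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  std::stable_sort(perm.begin(), perm.end(),
    [&](std::size_t a, std::size_t b) { return comp(keys[a], keys[b]); });

  // gather keys and values through the same permutation
  std::vector<K> sorted_keys(keys.size());
  std::vector<V> sorted_values(values.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    sorted_keys[i]   = keys[perm[i]];
    sorted_values[i] = values[perm[i]];
  }
  keys   = std::move(sorted_keys);
  values = std::move(sorted_values);
}
```

Called with keys <code>{1, 4, -5, 2}</code>, values <code>{0, 1, 2, 3}</code>, and a less-than comparator, this reference yields keys <code>{-5, 1, 2, 4}</code> and values <code>{2, 0, 3, 1}</code>, the same result noted in the comments of the <code>sort_by_key</code> example.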

</div>

</article></main>


</div>


</div>


</div>

Taskflow handbook is part of the <a href="https://taskflow.github.io">Taskflow project</a>, copyright © <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>, 2018–2022. Generated by <a href="https://doxygen.org/">Doxygen</a> 1.8.14 and <a href="https://mcss.mosra.cz/">m.css</a>.

</div>

</nav></footer>

</body>

</html>
