author | David Gross <dgross@google.com> | 2016-07-21 11:43:15 -0700 |
---|---|---|
committer | David Gross <dgross@google.com> | 2016-07-21 11:43:15 -0700 |
commit | 95e42d97f4b5150db2dcf1c26b3b50e6e133d1e4 (patch) | |
tree | 01784b7fc552ea36444ccdd9a2a4c27bd8cd8c80 /docs/html/guide/topics/renderscript/compute.jd | |
parent | 8f7223e420785962487995baad89f654fa1cc8ce (diff) | |
parent | ed6625d9074a1de1515c97faf201a91fbb3abb27 (diff) |
resolve merge conflicts of ed6625d to stage-aosp-master
Change-Id: Icae5872d5e220ac18a35e338f10b194c286855a8
Diffstat (limited to 'docs/html/guide/topics/renderscript/compute.jd')
-rwxr-xr-x | docs/html/guide/topics/renderscript/compute.jd | 1161 |
1 files changed, 1075 insertions, 86 deletions
diff --git a/docs/html/guide/topics/renderscript/compute.jd b/docs/html/guide/topics/renderscript/compute.jd index fc795ff6a63f..c5b49d70435f 100755 --- a/docs/html/guide/topics/renderscript/compute.jd +++ b/docs/html/guide/topics/renderscript/compute.jd @@ -16,6 +16,13 @@ parent.link=index.html </ol> </li> <li><a href="#using-rs-from-java">Using RenderScript from Java Code</a></li> + <li><a href="#reduction-in-depth">Reduction Kernels in Depth</a> + <ol> + <li><a href="#writing-reduction-kernel">Writing a reduction kernel</a></li> + <li><a href="#calling-reduction-kernel">Calling a reduction kernel from Java code</a></li> + <li><a href="#more-example">More example reduction kernels</a></li> + </ol> + </li> </ol> <h2>Related Samples</h2> @@ -29,16 +36,18 @@ parent.link=index.html <p>RenderScript is a framework for running computationally intensive tasks at high performance on Android. RenderScript is primarily oriented for use with data-parallel computation, although serial -computationally intensive workloads can benefit as well. The RenderScript runtime will parallelize -work across all processors available on a device, such as multi-core CPUs, GPUs, or DSPs, allowing -you to focus on expressing algorithms rather than scheduling work or load balancing. RenderScript is +workloads can benefit as well. The RenderScript runtime parallelizes +work across processors available on a device, such as multi-core CPUs and GPUs. This allows +you to focus on expressing algorithms rather than scheduling work. RenderScript is especially useful for applications performing image processing, computational photography, or computer vision.</p> <p>To begin with RenderScript, there are two main concepts you should understand:</p> <ul> -<li>High-performance compute kernels are written in a C99-derived language.</li> +<li>High-performance compute kernels are written in a C99-derived language. 
A <i>compute + kernel</i> is a function or collection of functions that you can direct the RenderScript runtime + to execute in parallel across a collection of data.</li> <li>A Java API is used for managing the lifetime of RenderScript resources and controlling kernel execution.</li> @@ -48,7 +57,7 @@ execution.</li> <p>A RenderScript kernel typically resides in a <code>.rs</code> file in the <code><project_root>/src/</code> directory; each <code>.rs</code> file is called a -script. Every script contains its own set of kernels, functions, and variables. A script can +<i>script</i>. Every script contains its own set of kernels, functions, and variables. A script can contain:</p> <ul> @@ -57,23 +66,32 @@ RenderScript kernel language used in this script. Currently, 1 is the only valid <li>A pragma declaration (<code>#pragma rs java_package_name(com.example.app)</code>) that declares the package name of the Java classes reflected from this script. -Note that your .rs file must be part of your application package, and not in a +Note that your <code>.rs</code> file must be part of your application package, and not in a library project.</li> -<li>Some number of invokable functions. An invokable function is a single-threaded RenderScript +<li>Zero or more <strong><i>invokable functions</i></strong>. An invokable function is a single-threaded RenderScript function that you can call from your Java code with arbitrary arguments. These are often useful for initial setup or serial computations within a larger processing pipeline.</li> -<li>Some number of script globals. A script global is equivalent to a global variable in C. You can +<li><p>Zero or more <strong><i>script globals</i></strong>. A script global is equivalent to a global variable in C. You can access script globals from Java code, and these are often used for parameter passing to RenderScript -kernels.</li> +kernels.</p></li> -<li>Some number of compute kernels. 
A kernel is a parallel function that executes across every -{@link android.renderscript.Element} within an {@link android.renderscript.Allocation}. +<li><p>Zero or more <strong><i>compute kernels</i></strong>. There are two kinds of compute +kernels: <i>mapping</i> kernels (also called <i>foreach</i> kernels) +and <i>reduction</i> kernels.</p> -<p>A simple kernel may look like the following:</p> +<p>A <em>mapping kernel</em> is a parallel function that operates on a collection of {@link + android.renderscript.Allocation Allocations} of the same dimensions. By default, it executes + once for every coordinate in those dimensions. It is typically (but not exclusively) used to + transform a collection of input {@link android.renderscript.Allocation Allocations} to an + output {@link android.renderscript.Allocation} one {@link android.renderscript.Element} at a + time.</p> -<pre>uchar4 __attribute__((kernel)) invert(uchar4 in, uint32_t x, uint32_t y) { +<ul> +<li><p>Here is an example of a simple <strong>mapping kernel</strong>:</p> + +<pre>uchar4 RS_KERNEL invert(uchar4 in, uint32_t x, uint32_t y) { uchar4 out = in; out.r = 255 - in.r; out.g = 255 - in.g; @@ -81,40 +99,113 @@ kernels.</li> return out; }</pre> -<p>In most respects, this is identical to a standard C function. The first notable feature is the -<code>__attribute__((kernel))</code> applied to the function prototype. This denotes that the -function is a RenderScript kernel instead of an invokable function. The next feature is the -<code>in</code> argument and its type. In a RenderScript kernel, this is a special argument that is -automatically filled in based on the input {@link android.renderscript.Allocation} passed to the -kernel launch. By default, the kernel is run across an entire {@link -android.renderscript.Allocation}, with one execution of the kernel body per {@link -android.renderscript.Element} in the {@link android.renderscript.Allocation}. 
The third notable -feature is the return type of the kernel. The value returned from the kernel is automatically -written to the appropriate location in the output {@link android.renderscript.Allocation}. The -RenderScript runtime checks to ensure that the {@link android.renderscript.Element} types of the -input and output Allocations match the kernel's prototype; if they do not match, an exception is -thrown.</p> - -<p>A kernel may have an input {@link android.renderscript.Allocation}, an output {@link -android.renderscript.Allocation}, or both. A kernel may not have more than one input or one output -{@link android.renderscript.Allocation}. If more than one input or output is required, those objects -should be bound to <code>rs_allocation</code> script globals and accessed from a kernel or invokable -function via <code>rsGetElementAt_<em>type</em>()</code> or -<code>rsSetElementAt_<em>type</em>()</code>.</p> - -<p>A kernel may access the coordinates of the current execution using the <code>x</code>, -<code>y</code>, and <code>z</code> arguments. These arguments are optional, but the type of the -coordinate arguments must be <code>uint32_t</code>.</p></li> +<p>In most respects, this is identical to a standard C + function. The <a href="#RS_KERNEL"><code>RS_KERNEL</code></a> property applied to the + function prototype specifies that the function is a RenderScript mapping kernel instead of an + invokable function. The <code>in</code> argument is automatically filled in based on the + input {@link android.renderscript.Allocation} passed to the kernel launch. The + arguments <code>x</code> and <code>y</code> are + discussed <a href="#special-arguments">below</a>. The value returned from the kernel is + automatically written to the appropriate location in the output {@link + android.renderscript.Allocation}. 
By default, this kernel is run across its entire input + {@link android.renderscript.Allocation}, with one execution of the kernel function per {@link + android.renderscript.Element} in the {@link android.renderscript.Allocation}.</p> + +<p>A mapping kernel may have one or more input {@link android.renderscript.Allocation + Allocations}, a single output {@link android.renderscript.Allocation}, or both. The + RenderScript runtime checks to ensure that all input and output Allocations have the same + dimensions, and that the {@link android.renderscript.Element} types of the input and output + Allocations match the kernel's prototype; if either of these checks fails, RenderScript + throws an exception.</p> + +<p class="note"><strong>NOTE:</strong> Before Android 6.0 (API level 23), a mapping kernel may + not have more than one input {@link android.renderscript.Allocation}.</p> + +<p>If you need more input or output {@link android.renderscript.Allocation Allocations} than + the kernel has, those objects should be bound to <code>rs_allocation</code> script globals + and accessed from a kernel or invokable function + via <code>rsGetElementAt_<i>type</i>()</code> or <code>rsSetElementAt_<i>type</i>()</code>.</p> + +<p><strong>NOTE:</strong> <a id="RS_KERNEL"><code>RS_KERNEL</code></a> is a macro + defined automatically by RenderScript for your convenience:</p> +<pre> +#define RS_KERNEL __attribute__((kernel)) +</pre> +</li> +</ul> + +<p>A <em>reduction kernel</em> is a family of functions that operates on a collection of input + {@link android.renderscript.Allocation Allocations} of the same dimensions. By default, + its <a href="#accumulator-function">accumulator function</a> executes once for every + coordinate in those dimensions. 
It is typically (but not exclusively) used to "reduce" a + collection of input {@link android.renderscript.Allocation Allocations} to a single + value.</p> + +<ul> +<li><p>Here is an <a id="example-addint">example</a> of a simple <strong>reduction +kernel</strong> that adds up the {@link android.renderscript.Element Elements} of its +input:</p> + +<pre>#pragma rs reduce(addint) accumulator(addintAccum) + +static void addintAccum(int *accum, int val) { + *accum += val; +}</pre> + +<p>A reduction kernel consists of one or more user-written functions. +<code>#pragma rs reduce</code> is used to define the kernel by specifying its name +(<code>addint</code>, in this example) and the names and roles of the functions that make +up the kernel (an <code>accumulator</code> function <code>addintAccum</code>, in this +example). All such functions must be <code>static</code>. A reduction kernel always +requires an <code>accumulator</code> function; it may also have other functions, depending +on what you want the kernel to do.</p> + +<p>A reduction kernel accumulator function must return <code>void</code> and must have at least +two arguments. The first argument (<code>accum</code>, in this example) is a pointer to +an <i>accumulator data item</i> and the second (<code>val</code>, in this example) is +automatically filled in based on the input {@link android.renderscript.Allocation} passed to +the kernel launch. The accumulator data item is created by the RenderScript runtime; by +default, it is initialized to zero. By default, this kernel is run across its entire input +{@link android.renderscript.Allocation}, with one execution of the accumulator function per +{@link android.renderscript.Element} in the {@link android.renderscript.Allocation}. By +default, the final value of the accumulator data item is treated as the result of the +reduction, and is returned to Java. 
The RenderScript runtime checks to ensure that the {@link +android.renderscript.Element} type of the input Allocation matches the accumulator function's +prototype; if it does not match, RenderScript throws an exception.</p> + +<p>A reduction kernel has one or more input {@link android.renderscript.Allocation +Allocations} but no output {@link android.renderscript.Allocation Allocations}.</p></li> + +<p>Reduction kernels are explained in more detail <a href="#reduction-in-depth">here</a>.</p> + +<p>Reduction kernels are supported in Android Nougat (API level 24) and later.</p> +</li> +</ul> + +<p>A mapping kernel function or a reduction kernel accumulator function may access the coordinates +of the current execution using the <a id="special-arguments">special arguments</a> <code>x</code>, +<code>y</code>, and <code>z</code>, which must be of type <code>int</code> or <code>uint32_t</code>. +These arguments are optional.</p> + +<p>A mapping kernel function or a reduction kernel accumulator +function may also take the optional special argument +<code>context</code> of type <a +href='reference/rs_for_each.html#android_rs:rs_kernel_context'>rs_kernel_context</a>. +It is needed by a family of runtime APIs that are used to query +certain properties of the current execution -- for example, <a +href='reference/rs_for_each.html#android_rs:rsGetDimX'>rsGetDimX</a>. +(The <code>context</code> argument is available in Android 6.0 (API level 23) and later.)</p> +</li> <li>An optional <code>init()</code> function. An <code>init()</code> function is a special type of -invokable function that is run when the script is first instantiated. This allows for some +invokable function that RenderScript runs when the script is first instantiated. This allows for some computation to occur automatically at script creation.</li> -<li>Some number of static script globals and functions. A static script global is equivalent to a -script global except that it cannot be set from Java code. 
A static function is a standard C +<li>Zero or more <strong><i>static script globals and functions</i></strong>. A static script global is equivalent to a +script global except that it cannot be accessed from Java code. A static function is a standard C function that can be called from any kernel or invokable function in the script but is not exposed to the Java API. If a script global or function does not need to be called from Java code, it is -highly recommended that those be declared <code>static</code>.</li> </ul> +highly recommended that it be declared <code>static</code>.</li> </ul> <h4>Setting floating point precision</h4> @@ -129,13 +220,13 @@ different level of floating point precision:</p> </li> - <li><code>#pragma rs_fp_relaxed</code> - For apps that don’t require strict IEEE 754-2008 + <li><code>#pragma rs_fp_relaxed</code>: For apps that don’t require strict IEEE 754-2008 compliance and can tolerate less precision. This mode enables flush-to-zero for denorms and round-towards-zero. </li> - <li><code>#pragma rs_fp_imprecise</code> - For apps that don’t have stringent precision + <li><code>#pragma rs_fp_imprecise</code>: For apps that don’t have stringent precision requirements. This mode enables everything in <code>rs_fp_relaxed</code> along with the following: @@ -162,14 +253,21 @@ precision (such as SIMD CPU instructions).</p> available on devices running Android 3.0 (API level 11) and higher. </li> <li><strong>{@link android.support.v8.renderscript}</strong> - The APIs in this package are available through a <a href="{@docRoot}tools/support-library/features.html#v8">Support - Library</a>, which allows you to use them on devices running Android 2.2 (API level 8) and + Library</a>, which allows you to use them on devices running Android 2.3 (API level 9) and higher.</li> </ul> -<p>We strongly recommend using the Support Library APIs for accessing RenderScript because they - provide a wider range of device compatibility. 
Developers targeting specific versions of - Android can use {@link android.renderscript} if necessary.</p> +<p>Here are the tradeoffs:</p> +<ul> +<li>If you use the Support Library APIs, the RenderScript portion of your application will be + compatible with devices running Android 2.3 (API level 9) and higher, regardless of which RenderScript + features you use. This allows your application to work on more devices than if you use the + native (<strong>{@link android.renderscript}</strong>) APIs.</li> +<li>Certain RenderScript features are not available through the Support Library APIs.</li> +<li>If you use the Support Library APIs, you will get (possibly significantly) larger APKs than +if you use the native (<strong>{@link android.renderscript}</strong>) APIs.</li> +</ul> <h3 id="ide-setup">Using the RenderScript Support Library APIs</h3> @@ -202,7 +300,7 @@ android { buildToolsVersion "23.0.3" defaultConfig { - minSdkVersion 8 + minSdkVersion 9 targetSdkVersion 19 <strong> renderscriptTargetApi 18 @@ -250,7 +348,7 @@ import android.support.v8.renderscript.*; <p>Using RenderScript from Java code relies on the API classes located in the {@link android.renderscript} or the {@link android.support.v8.renderscript} package. Most -applications follow the same basic usage patterns:</p> +applications follow the same basic usage pattern:</p> <ol> @@ -266,12 +364,12 @@ possible. Typically, an application will have only a single RenderScript context script.</strong> An {@link android.renderscript.Allocation} is a RenderScript object that provides storage for a fixed amount of data. Kernels in scripts take {@link android.renderscript.Allocation} objects as their input and output, and {@link android.renderscript.Allocation} objects can be -accessed in kernels using <code>rsGetElementAt_<em>type</em>()</code> and -<code>rsSetElementAt_<em>type</em>()</code> when bound as script globals. 
{@link +accessed in kernels using <code>rsGetElementAt_<i>type</i>()</code> and +<code>rsSetElementAt_<i>type</i>()</code> when bound as script globals. {@link android.renderscript.Allocation} objects allow arrays to be passed from Java code to RenderScript code and vice-versa. {@link android.renderscript.Allocation} objects are typically created using -{@link android.renderscript.Allocation#createTyped} or {@link -android.renderscript.Allocation#createFromBitmap}.</li> +{@link android.renderscript.Allocation#createTyped createTyped()} or {@link +android.renderscript.Allocation#createFromBitmap createFromBitmap()}.</li> <li><strong>Create whatever scripts are necessary.</strong> There are two types of scripts available to you when using RenderScript: @@ -281,9 +379,9 @@ to you when using RenderScript: <li><strong>ScriptC</strong>: These are the user-defined scripts as described in <a href="#writing-an-rs-kernel">Writing a RenderScript Kernel</a> above. Every script has a Java class reflected by the RenderScript compiler in order to make it easy to access the script from Java code; -this class will have the name <code>ScriptC_<em>filename</em></code>. For example, if the kernel -above was located in <code>invert.rs</code> and a RenderScript context was already located in -<code>mRS</code>, the Java code to instantiate the script would be: +this class has the name <code>ScriptC_<i>filename</i></code>. For example, if the mapping kernel +above were located in <code>invert.rs</code> and a RenderScript context were already located in +<code>mRenderScript</code>, the Java code to instantiate the script would be: <pre>ScriptC_invert invert = new ScriptC_invert(mRenderScript);</pre></li> @@ -294,35 +392,926 @@ such as Gaussian blur, convolution, and image blending. 
For more information, se </ul></li> <li><strong>Populate Allocations with data.</strong> Except for Allocations created with {@link -android.renderscript#createFromBitmap}, an Allocation will be populated with empty data when it is -first created. To populate an Allocation, use one of the <code>copy</code> methods in {@link -android.renderscript.Allocation}.</li> - -<li><strong>Set any necessary script globals.</strong> Globals may be set using methods in the same -<code>ScriptC_<em>filename</em></code> class with methods named -<code>set_<em>globalname</em></code>. For example, in order to set an <code>int</code> named -<code>elements</code>, use the Java method <code>set_elements(int)</code>. RenderScript objects can -also be set in kernels; for example, the <code>rs_allocation</code> variable named -<code>lookup</code> can be set with the method <code>set_lookup(Allocation)</code>.</li> - -<li><strong>Launch the appropriate kernels.</strong> Methods to launch a given kernel will be -reflected in the same <code>ScriptC_<em>filename</em></code> class with methods named -<code>forEach_<em>kernelname</em>()</code>. These launches are asynchronous, and launches will be -serialized in the order in which they are launched. Depending on the arguments to the kernel, the -method will take either one or two Allocations. By default, a kernel will execute over the entire -input or output Allocation; to execute over a subset of that Allocation, pass an appropriate {@link -android.renderscript.Script.LaunchOptions} as the last argument to the <code>forEach</code> method. 
- -<p>Invoked functions can be launched using the <code>invoke_<em>functionname</em></code> methods -reflected in the same <code>ScriptC_<em>filename</em></code> class.</p></li> - -<li><strong>Copy data out of {@link android.renderscript.Allocation} objects.</strong> In order to -access data from an {@link android.renderscript.Allocation} from Java code, that data must be copied -back to Java buffers using one of the <code>copy</code> methods in {@link -android.renderscript.Allocation}. These functions will synchronize with asynchronous kernel and -function launches as necessary.</li> - -<li><strong>Tear down the RenderScript context.</strong> The RenderScript context can be destroyed +android.renderscript.Allocation#createFromBitmap createFromBitmap()}, an Allocation is populated with empty data when it is +first created. To populate an Allocation, use one of the "copy" methods in {@link +android.renderscript.Allocation}. The "copy" methods are <a href="#asynchronous-model">synchronous</a>.</li> + +<li><strong>Set any necessary script globals.</strong> You may set globals using methods in the + same <code>ScriptC_<i>filename</i></code> class named <code>set_<i>globalname</i></code>. For + example, in order to set an <code>int</code> variable named <code>threshold</code>, use the + Java method <code>set_threshold(int)</code>; and in order to set + an <code>rs_allocation</code> variable named <code>lookup</code>, use the Java + method <code>set_lookup(Allocation)</code>. The <code>set</code> methods + are <a href="#asynchronous-model">asynchronous</a>.</li> + +<li><strong>Launch the appropriate kernels and invokable functions.</strong> +<p>Methods to launch a given kernel are +reflected in the same <code>ScriptC_<i>filename</i></code> class with methods named +<code>forEach_<i>mappingKernelName</i>()</code> +or <code>reduce_<i>reductionKernelName</i>()</code>. +These launches are <a href="#asynchronous-model">asynchronous</a>. 
+Depending on the arguments to the kernel, the +method takes one or more Allocations, all of which must have the same dimensions. By default, a +kernel executes over every coordinate in those dimensions; to execute a kernel over a subset of those coordinates, +pass an appropriate {@link +android.renderscript.Script.LaunchOptions} as the last argument to the <code>forEach</code> or <code>reduce</code> method.</p> + +<p>Launch invokable functions using the <code>invoke_<i>functionName</i></code> methods +reflected in the same <code>ScriptC_<i>filename</i></code> class. +These launches are <a href="#asynchronous-model">asynchronous</a>.</p></li> + +<li><strong>Retrieve data from {@link android.renderscript.Allocation} objects +and <i><a href="#javaFutureType">javaFutureType</a></i> objects.</strong> +In order to +access data from an {@link android.renderscript.Allocation} from Java code, you must copy that data +back to Java using one of the "copy" methods in {@link +android.renderscript.Allocation}. +In order to obtain the result of a reduction kernel, you must use the <code><i>javaFutureType</i>.get()</code> method. +The "copy" and <code>get()</code> methods are <a href="#asynchronous-model">synchronous</a>.</li> + +<li><strong>Tear down the RenderScript context.</strong> You can destroy the RenderScript context with {@link android.renderscript.RenderScript#destroy} or by allowing the RenderScript context -object to be garbage collected. This will cause any further use of any object belonging to that +object to be garbage collected. This causes any further use of any object belonging to that context to throw an exception.</li> </ol> + +<h3 id="asynchronous-model">Asynchronous execution model</h3> + +<p>The reflected <code>forEach</code>, <code>invoke</code>, <code>reduce</code>, + and <code>set</code> methods are asynchronous -- each may return to Java before completing the + requested action. 
However, the individual actions are serialized in the order in which they are launched.</p> + +<p>The {@link android.renderscript.Allocation} class provides "copy" methods to copy data to + and from Allocations. A "copy" method is synchronous, and is serialized with respect to any + of the asynchronous actions above that touch the same Allocation.</p> + +<p>The reflected <i><a href="#javaFutureType">javaFutureType</a></i> classes provide + a <code>get()</code> method to obtain the result of a reduction. <code>get()</code> is + synchronous, and is serialized with respect to the reduction (which is asynchronous).</p> + +<h2 id="reduction-in-depth">Reduction Kernels in Depth</h2> + +<p><i>Reduction</i> is the process of combining a collection of data into a single +value. This is a useful primitive in parallel programming, with applications such as the +following:</p> +<ul> + <li>computing the sum or product over all the data</li> + <li>computing logical operations (<code>and</code>, <code>or</code>, <code>xor</code>) + over all the data</li> + <li>finding the minimum or maximum value within the data</li> + <li>searching for a specific value or for the coordinate of a specific value within the data</li> +</ul> + +<p>In Android Nougat (API level 24) and later, RenderScript supports <i>reduction kernels</i> to allow +efficient user-written reduction algorithms. You may launch reduction kernels on inputs with +1, 2, or 3 dimensions.</p> + +<p>An example above shows a simple <a href="#example-addint">addint</a> reduction kernel. 
+Here is a more complicated <a id="example-findMinAndMax">findMinAndMax</a> reduction kernel +that finds the locations of the minimum and maximum <code>long</code> values in a +1-dimensional {@link android.renderscript.Allocation}:</p> + +<pre> +#define LONG_MAX (long)((1UL << 63) - 1) +#define LONG_MIN (long)(1UL << 63) + +#pragma rs reduce(findMinAndMax) \ + initializer(fMMInit) accumulator(fMMAccumulator) \ + combiner(fMMCombiner) outconverter(fMMOutConverter) + +// Either a value and the location where it was found, or <a href="#INITVAL">INITVAL</a>. +typedef struct { + long val; + int idx; // -1 indicates <a href="#INITVAL">INITVAL</a> +} IndexedVal; + +typedef struct { + IndexedVal min, max; +} MinAndMax; + +// In discussion below, this initial value { { LONG_MAX, -1 }, { LONG_MIN, -1 } } +// is called <a id="INITVAL">INITVAL</a>. +static void fMMInit(MinAndMax *accum) { + accum->min.val = LONG_MAX; + accum->min.idx = -1; + accum->max.val = LONG_MIN; + accum->max.idx = -1; +} + +//---------------------------------------------------------------------- +// In describing the behavior of the accumulator and combiner functions, +// it is helpful to describe hypothetical functions +// IndexedVal min(IndexedVal a, IndexedVal b) +// IndexedVal max(IndexedVal a, IndexedVal b) +// MinAndMax minmax(MinAndMax a, MinAndMax b) +// MinAndMax minmax(MinAndMax accum, IndexedVal val) +// +// The effect of +// IndexedVal min(IndexedVal a, IndexedVal b) +// is to return the IndexedVal from among the two arguments +// whose val is lesser, except that when an IndexedVal +// has a negative index, that IndexedVal is never less than +// any other IndexedVal; therefore, if exactly one of the +// two arguments has a negative index, the min is the other +// argument. 
Like ordinary arithmetic min and max, this function +// is commutative and associative; that is, +// +// min(A, B) == min(B, A) // commutative +// min(A, min(B, C)) == min(min(A, B), C) // associative +// +// The effect of +// IndexedVal max(IndexedVal a, IndexedVal b) +// is analogous (greater . . . never greater than). +// +// Then there is +// +// MinAndMax minmax(MinAndMax a, MinAndMax b) { +// return MinAndMax(min(a.min, b.min), max(a.max, b.max)); +// } +// +// Like ordinary arithmetic min and max, the above function +// is commutative and associative; that is: +// +// minmax(A, B) == minmax(B, A) // commutative +// minmax(A, minmax(B, C)) == minmax(minmax(A, B), C) // associative +// +// Finally define +// +// MinAndMax minmax(MinAndMax accum, IndexedVal val) { +// return minmax(accum, MinAndMax(val, val)); +// } +//---------------------------------------------------------------------- + +// This function can be explained as doing: +// *accum = minmax(*accum, IndexedVal(in, x)) +// +// This function simply computes minimum and maximum values as if +// INITVAL.min were greater than any other minimum value and +// INITVAL.max were less than any other maximum value. 
Note that if +// *accum is INITVAL, then this function sets +// *accum = IndexedVal(in, x) +// +// After this function is called, both accum->min.idx and accum->max.idx +// will have nonnegative values: +// - x is always nonnegative, so if this function ever sets one of the +// idx fields, it will set it to a nonnegative value +// - if one of the idx fields is negative, then the corresponding +// val field must be LONG_MAX or LONG_MIN, so the function will always +// set both the val and idx fields +static void fMMAccumulator(MinAndMax *accum, long in, int x) { + IndexedVal me; + me.val = in; + me.idx = x; + + if (me.val <= accum->min.val) + accum->min = me; + if (me.val >= accum->max.val) + accum->max = me; +} + +// This function can be explained as doing: +// *accum = minmax(*accum, *val) +// +// This function simply computes minimum and maximum values as if +// INITVAL.min were greater than any other minimum value and +// INITVAL.max were less than any other maximum value. Note that if +// one of the two accumulator data items is INITVAL, then this +// function sets *accum to the other one. +static void fMMCombiner(MinAndMax *accum, + const MinAndMax *val) { + if ((accum->min.idx < 0) || (val->min.val < accum->min.val)) + accum->min = val->min; + if ((accum->max.idx < 0) || (val->max.val > accum->max.val)) + accum->max = val->max; +} + +static void fMMOutConverter(int2 *result, + const MinAndMax *val) { + result->x = val->min.idx; + result->y = val->max.idx; +} +</pre> + +<p class="note"><strong>NOTE:</strong> There are more example reduction + kernels <a href="#more-example">here</a>.</p> + +<p>In order to run a reduction kernel, the RenderScript runtime creates <em>one or more</em> +variables called <a id="accumulator-data-items"><strong><i>accumulator data +items</i></strong></a> to hold the state of the reduction process. The RenderScript runtime +picks the number of accumulator data items in such a way as to maximize performance. 
The type +of the accumulator data items (<i>accumType</i>) is determined by the kernel's <i>accumulator +function</i> -- the first argument to that function is a pointer to an accumulator data +item. By default, every accumulator data item is initialized to zero (as if +by <code>memset</code>); however, you may write an <i>initializer function</i> to do something +different.</p> + +<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a> +kernel, the accumulator data items (of type <code>int</code>) are used to add up input +values. There is no initializer function, so each accumulator data item is initialized to +zero.</p> + +<p class="note"><strong>Example:</strong> In +the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the accumulator data items +(of type <code>MinAndMax</code>) are used to keep track of the minimum and maximum values +found so far. There is an initializer function to set these to <code>LONG_MAX</code> and +<code>LONG_MIN</code>, respectively; and to set the locations of these values to -1, indicating that +the values are not actually present in the (empty) portion of the input that has been +processed.</p> + +<p>RenderScript calls your accumulator function once for every coordinate in the +input(s). 
Typically, your function should update the accumulator data item in some way +according to the input.</p> + +<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a> +kernel, the accumulator function adds the value of an input Element to the accumulator +data item.</p> + +<p class="note"><strong>Example:</strong> In +the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the accumulator function +checks to see whether the value of an input Element is less than or equal to the minimum +value recorded in the accumulator data item and/or greater than or equal to the maximum +value recorded in the accumulator data item, and updates the accumulator data item +accordingly.</p> + +<p>After the accumulator function has been called once for every coordinate in the input(s), +RenderScript must <strong>combine</strong> the <a href="#accumulator-data-items">accumulator +data items</a> together into a single accumulator data item. You may write a <i>combiner +function</i> to do this. If the accumulator function has a single input and +no <a href="#special-arguments">special arguments</a>, then you do not need to write a combiner +function; RenderScript will use the accumulator function to combine the accumulator data +items. (You may still write a combiner function if this default behavior is not what you +want.)</p> + +<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a> +kernel, there is no combiner function, so the accumulator function will be used. 
This is +the correct behavior, because if we split a collection of values into two pieces, and we +add up the values in those two pieces separately, adding up those two sums is the same as +adding up the entire collection.</p> + +<p class="note"><strong>Example:</strong> In +the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the combiner function +checks to see whether the minimum value recorded in the "source" accumulator data +item <code>*val</code> is less than the minimum value recorded in the "destination" +accumulator data item <code>*accum</code>, and updates <code>*accum</code> +accordingly. It does similar work for the maximum value. This updates <code>*accum</code> +to the state it would have had if all of the input values had been accumulated into +<code>*accum</code> rather than some into <code>*accum</code> and some into +<code>*val</code>.</p> + +<p>After all of the accumulator data items have been combined, RenderScript determines +the result of the reduction to return to Java. You may write an <i>outconverter +function</i> to do this. You do not need to write an outconverter function if you want +the final value of the combined accumulator data items to be the result of the reduction.</p> + +<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel, +there is no outconverter function.
The final value of the combined data items is the sum of +all Elements of the input, which is the value we want to return.</p> + +<p class="note"><strong>Example:</strong> In +the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the outconverter function +initializes an <code>int2</code> result value to hold the locations of the minimum and +maximum values resulting from the combination of all of the accumulator data items.</p> + +<h3 id="writing-reduction-kernel">Writing a reduction kernel</h3> + +<p><code>#pragma rs reduce</code> defines a reduction kernel by +specifying its name and the names and roles of the functions that make +up the kernel. All such functions must be +<code>static</code>. A reduction kernel always requires an <code>accumulator</code> +function; you can omit some or all of the other functions, depending on what you want the +kernel to do.</p> + +<pre>#pragma rs reduce(<i>kernelName</i>) \ + initializer(<i>initializerName</i>) \ + accumulator(<i>accumulatorName</i>) \ + combiner(<i>combinerName</i>) \ + outconverter(<i>outconverterName</i>) +</pre> + +<p>The meaning of the items in the <code>#pragma</code> is as follows:</p> +<ul> + +<li><code>reduce(<i>kernelName</i>)</code> (mandatory): Specifies that a reduction kernel is +being defined. A reflected Java method <code>reduce_<i>kernelName</i></code> will launch the +kernel.</li> + +<li><p><code>initializer(<i>initializerName</i>)</code> (optional): Specifies the name of the +initializer function for this reduction kernel. When you launch the kernel, RenderScript calls +this function once for each <a href="#accumulator-data-items">accumulator data item</a>. 
The +function must be defined like this:</p> + +<pre>static void <i>initializerName</i>(<i>accumType</i> *accum) { … }</pre> + +<p><code>accum</code> is a pointer to an accumulator data item for this function to +initialize.</p> + +<p>If you do not provide an initializer function, RenderScript initializes every accumulator +data item to zero (as if by <code>memset</code>), behaving as if there were an initializer +function that looks like this:</p> +<pre>static void <i>initializerName</i>(<i>accumType</i> *accum) { + memset(accum, 0, sizeof(*accum)); +}</pre> +</li> + +<li><p><code><a id="accumulator-function">accumulator(<i>accumulatorName</i>)</a></code> +(mandatory): Specifies the name of the accumulator function for this +reduction kernel. When you launch the kernel, RenderScript calls +this function once for every coordinate in the input(s), to update an +accumulator data item in some way according to the input(s). The function +must be defined like this:</p> + +<pre> +static void <i>accumulatorName</i>(<i>accumType</i> *accum, + <i>in1Type</i> in1, <i>…,</i> <i>inNType</i> in<i>N</i> + <i>[, specialArguments]</i>) { … } +</pre> + +<p><code>accum</code> is a pointer to an accumulator data item for this function to +modify. <code>in1</code> through <code>in<i>N</i></code> are one <em>or more</em> arguments that +are automatically filled in based on the inputs passed to the kernel launch, one argument +per input. The accumulator function may optionally take any of the <a +href="#special-arguments">special arguments</a>.</p> + +<p>An example kernel with multiple inputs is <a href="#dot-product"><code>dotProduct</code></a>.</p> +</li> + +<li><p><code><a id="combiner-function">combiner(<i>combinerName</i>)</a></code> +(optional): Specifies the name of the combiner function for this +reduction kernel.
After RenderScript calls the accumulator function +once for every coordinate in the input(s), it calls this function as many +times as necessary to combine all accumulator data items into a single +accumulator data item. The function must be defined like this:</p> + +<pre>static void <i>combinerName</i>(<i>accumType</i> *accum, const <i>accumType</i> *other) { … }</pre> + +<p><code>accum</code> is a pointer to a "destination" accumulator data item for this +function to modify. <code>other</code> is a pointer to a "source" accumulator data item +for this function to "combine" into <code>*accum</code>.</p> + +<p class="note"><strong>NOTE:</strong> It is possible + that <code>*accum</code>, <code>*other</code>, or both have been initialized but have never + been passed to the accumulator function; that is, one or both have never been updated + according to any input data. For example, in + the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the combiner + function <code>fMMCombiner</code> explicitly checks for <code>idx < 0</code> because that + indicates such an accumulator data item, whose value is <a href="#INITVAL">INITVAL</a>.</p> + +<p>If you do not provide a combiner function, RenderScript uses the accumulator function in its +place, behaving as if there were a combiner function that looks like this:</p> + +<pre>static void <i>combinerName</i>(<i>accumType</i> *accum, const <i>accumType</i> *other) { + <i>accumulatorName</i>(accum, *other); +}</pre> + +<p>A combiner function is mandatory if the kernel has more than one input, if the input data + type is not the same as the accumulator data type, or if the accumulator function takes one + or more <a href="#special-arguments">special arguments</a>.</p> +</li> + +<li><p><code><a id="outconverter-function">outconverter(<i>outconverterName</i>)</a></code> +(optional): Specifies the name of the outconverter function for this +reduction kernel. 
After RenderScript combines all of the accumulator +data items, it calls this function to determine the result of the +reduction to return to Java. The function must be defined like +this:</p> + +<pre>static void <i>outconverterName</i>(<i>resultType</i> *result, const <i>accumType</i> *accum) { … }</pre> + +<p><code>result</code> is a pointer to a result data item (allocated but not initialized +by the RenderScript runtime) for this function to initialize with the result of the +reduction. <i>resultType</i> is the type of that data item, which need not be the same +as <i>accumType</i>. <code>accum</code> is a pointer to the final accumulator data item +computed by the <a href="#combiner-function">combiner function</a>.</p> + +<p>If you do not provide an outconverter function, RenderScript copies the final accumulator +data item to the result data item, behaving as if there were an outconverter function that +looks like this:</p> + +<pre>static void <i>outconverterName</i>(<i>accumType</i> *result, const <i>accumType</i> *accum) { + *result = *accum; +}</pre> + +<p>If you want a different result type than the accumulator data type, then the outconverter function is mandatory.</p> +</li> + +</ul> + +<p>Note that a kernel has input types, an accumulator data item type, and a result type, + none of which need to be the same. For example, in + the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, the input + type <code>long</code>, accumulator data item type <code>MinAndMax</code>, and result + type <code>int2</code> are all different.</p> + +<h4 id="assume">What can't you assume?</h4> + +<p>You must not rely on the number of accumulator data items created by RenderScript for a + given kernel launch. 
There is no guarantee that two launches of the same kernel with the + same input(s) will create the same number of accumulator data items.</p> + +<p>You must not rely on the order in which RenderScript calls the initializer, accumulator, and + combiner functions; it may even call some of them in parallel. There is no guarantee that + two launches of the same kernel with the same input will follow the same order. The only + guarantee is that only the initializer function will ever see an uninitialized accumulator + data item. For example:</p> +<ul> +<li>There is no guarantee that all accumulator data items will be initialized before the + accumulator function is called, although it will only be called on an initialized accumulator + data item.</li> +<li>There is no guarantee on the order in which input Elements are passed to the accumulator + function.</li> +<li>There is no guarantee that the accumulator function has been called for all input Elements + before the combiner function is called.</li> +</ul> + +<p>One consequence of this is that the <a href="#example-findMinAndMax">findMinAndMax</a> + kernel is not deterministic: If the input contains more than one occurrence of the same + minimum or maximum value, you have no way of knowing which occurrence the kernel will + find.</p> + +<h4 id="guarantee">What must you guarantee?</h4> + +<p>Because the RenderScript system can choose to execute a kernel <a href="#assume">in many + different ways</a>, you must follow certain rules to ensure that your kernel behaves the + way you want. If you do not follow these rules, you may get incorrect results, + nondeterministic behavior, or runtime errors.</p> + +<p>The rules below often say that two accumulator data items must have "<a id="the-same">the + same value"</a>. What does this mean? That depends on what you want the kernel to do. 
For + a mathematical reduction such as <a href="#example-addint">addint</a>, it usually makes sense + for "the same" to mean mathematical equality. For a "pick any" search such + as <a href="#example-findMinAndMax">findMinAndMax</a> ("find the location of minimum and + maximum input values") where there might be more than one occurrence of identical input + values, all locations of a given input value must be considered "the same". You could write + a similar kernel to "find the location of <em>leftmost</em> minimum and maximum input values" + where (say) a minimum value at location 100 is preferred over an identical minimum value at location + 200; for this kernel, "the same" would mean identical <em>location</em>, not merely + identical <em>value</em>, and the accumulator and combiner functions would have to be + different than those for <a href="#example-findMinAndMax">findMinAndMax</a>.</p> + +<strong>The initializer function must create an <i>identity value</i>.</strong> That is, + if <code><i>I</i></code> and <code><i>A</i></code> are accumulator data items initialized + by the initializer function, and <code><i>I</i></code> has never been passed to the + accumulator function (but <code><i>A</i></code> may have been), then +<ul> +<li><code><i>combinerName</i>(&<i>A</i>, &<i>I</i>)</code> must + leave <code><i>A</i></code> <a href="#the-same">the same</a></li> +<li><code><i>combinerName</i>(&<i>I</i>, &<i>A</i>)</code> must + leave <code><i>I</i></code> <a href="#the-same">the same</a> as <code><i>A</i></code></li> +</ul> +<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a> + kernel, an accumulator data item is initialized to zero. 
The combiner function for this + kernel performs addition; zero is the identity value for addition.</p> +<div class="note"> +<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> + kernel, an accumulator data item is initialized + to <a href="#INITVAL"><code>INITVAL</code></a>. +<ul> +<li><code>fMMCombiner(&<i>A</i>, &<i>I</i>)</code> leaves <code><i>A</i></code> the same, + because <code><i>I</i></code> is <code>INITVAL</code>.</li> +<li><code>fMMCombiner(&<i>I</i>, &<i>A</i>)</code> sets <code><i>I</i></code> + to <code><i>A</i></code>, because <code><i>I</i></code> is <code>INITVAL</code>.</li> +</ul> +Therefore, <code>INITVAL</code> is indeed an identity value. +</p></div> + +<p><strong>The combiner function must be <i>commutative</i>.</strong> That is, + if <code><i>A</i></code> and <code><i>B</i></code> are accumulator data items initialized + by the initializer function, and that may have been passed to the accumulator function zero + or more times, then <code><i>combinerName</i>(&<i>A</i>, &<i>B</i>)</code> must + set <code><i>A</i></code> to <a href="#the-same">the same value</a> + that <code><i>combinerName</i>(&<i>B</i>, &<i>A</i>)</code> + sets <code><i>B</i></code>.</p> +<p class="note"><strong>Example:</strong> In the <a href="#example-addint">addint</a> + kernel, the combiner function adds the two accumulator data item values; addition is + commutative.</p> +<div class="note"> +<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, +<pre> +fMMCombiner(&<i>A</i>, &<i>B</i>) +</pre> +is the same as +<pre> +<i>A</i> = minmax(<i>A</i>, <i>B</i>) +</pre> +and <code>minmax</code> is commutative, so <code>fMMCombiner</code> is also. 
+</p> +</div> + +<p><strong>The combiner function must be <i>associative</i>.</strong> That is, + if <code><i>A</i></code>, <code><i>B</i></code>, and <code><i>C</i></code> are + accumulator data items initialized by the initializer function, and that may have been passed + to the accumulator function zero or more times, then the following two code sequences must + set <code><i>A</i></code> to <a href="#the-same">the same value</a>:</p> +<ul> +<li><pre> +<i>combinerName</i>(&<i>A</i>, &<i>B</i>); +<i>combinerName</i>(&<i>A</i>, &<i>C</i>); +</pre></li> +<li><pre> +<i>combinerName</i>(&<i>B</i>, &<i>C</i>); +<i>combinerName</i>(&<i>A</i>, &<i>B</i>); +</pre></li> +</ul> +<div class="note"> +<p><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel, the + combiner function adds the two accumulator data item values: +<ul> +<li><pre> +<i>A</i> = <i>A</i> + <i>B</i> +<i>A</i> = <i>A</i> + <i>C</i> +// Same as +// <i>A</i> = (<i>A</i> + <i>B</i>) + <i>C</i> +</pre></li> +<li><pre> +<i>B</i> = <i>B</i> + <i>C</i> +<i>A</i> = <i>A</i> + <i>B</i> +// Same as +// <i>A</i> = <i>A</i> + (<i>B</i> + <i>C</i>) +// <i>B</i> = <i>B</i> + <i>C</i> +</pre></li> +</ul> +Addition is associative, and so the combiner function is also. +</p> +</div> +<div class="note"> +<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, +<pre> +fMMCombiner(&<i>A</i>, &<i>B</i>) +</pre> +is the same as +<pre> +<i>A</i> = minmax(<i>A</i>, <i>B</i>) +</pre> +So the two sequences are +<ul> +<li><pre> +<i>A</i> = minmax(<i>A</i>, <i>B</i>) +<i>A</i> = minmax(<i>A</i>, <i>C</i>) +// Same as +// <i>A</i> = minmax(minmax(<i>A</i>, <i>B</i>), <i>C</i>) +</pre></li> +<li><pre> +<i>B</i> = minmax(<i>B</i>, <i>C</i>) +<i>A</i> = minmax(<i>A</i>, <i>B</i>) +// Same as +// <i>A</i> = minmax(<i>A</i>, minmax(<i>B</i>, <i>C</i>)) +// <i>B</i> = minmax(<i>B</i>, <i>C</i>) +</pre></li> +</ul> +<code>minmax</code> is associative, and so <code>fMMCombiner</code> is also.
+</p> +</div> + +<p><strong>The accumulator function and combiner function together must obey the <i>basic + folding rule</i>.</strong> That is, if <code><i>A</i></code> + and <code><i>B</i></code> are accumulator data items, <code><i>A</i></code> has been + initialized by the initializer function and may have been passed to the accumulator function + zero or more times, <code><i>B</i></code> has not been initialized, and <i>args</i> is + the list of input arguments and special arguments for a particular call to the accumulator + function, then the following two code sequences must set <code><i>A</i></code> + to <a href="#the-same">the same value</a>:</p> +<ul> +<li><pre> +<i>accumulatorName</i>(&<i>A</i>, <i>args</i>); // statement 1 +</pre></li> +<li><pre> +<i>initializerName</i>(&<i>B</i>); // statement 2 +<i>accumulatorName</i>(&<i>B</i>, <i>args</i>); // statement 3 +<i>combinerName</i>(&<i>A</i>, &<i>B</i>); // statement 4 +</pre></li> +</ul> +<div class="note"> +<p><strong>Example:</strong> In the <a href="#example-addint">addint</a> kernel, for an input value <i>V</i>: +<ul> +<li>Statement 1 is the same as <code>A += <i>V</i></code></li> +<li>Statement 2 is the same as <code>B = 0</code></li> +<li>Statement 3 is the same as <code>B += <i>V</i></code>, which is the same as <code>B = <i>V</i></code></li> +<li>Statement 4 is the same as <code>A += B</code>, which is the same as <code>A += <i>V</i></code></li> +</ul> +Statements 1 and 4 set <code><i>A</i></code> to the same value, and so this kernel obeys the +basic folding rule. 
+</p> +</div> +<div class="note"> +<p><strong>Example:</strong> In the <a href="#example-findMinAndMax">findMinAndMax</a> kernel, for an input + value <i>V</i> at coordinate <i>X</i>: +<ul> +<li>Statement 1 is the same as <code>A = minmax(A, IndexedVal(<i>V</i>, <i>X</i>))</code></li> +<li>Statement 2 is the same as <code>B = <a href="#INITVAL">INITVAL</a></code></li> +<li>Statement 3 is the same as +<pre> +B = minmax(B, IndexedVal(<i>V</i>, <i>X</i>)) +</pre> +which, because <i>B</i> is the initial value, is the same as +<pre> +B = IndexedVal(<i>V</i>, <i>X</i>) +</pre> +</li> +<li>Statement 4 is the same as +<pre> +A = minmax(A, B) +</pre> +which is the same as +<pre> +A = minmax(A, IndexedVal(<i>V</i>, <i>X</i>)) +</pre> +</ul> +Statements 1 and 4 set <code><i>A</i></code> to the same value, and so this kernel obeys the +basic folding rule. +</p> +</div> + +<h3 id="calling-reduction-kernel">Calling a reduction kernel from Java code</h3> + +<p>For a reduction kernel named <i>kernelName</i> defined in the +file <code><i>filename</i>.rs</code>, there are three methods reflected in the +class <code>ScriptC_<i>filename</i></code>:</p> + +<pre> +// Method 1 +public <i>javaFutureType</i> reduce_<i>kernelName</i>(Allocation ain1, <i>…,</i> + Allocation ain<i>N</i>); + +// Method 2 +public <i>javaFutureType</i> reduce_<i>kernelName</i>(Allocation ain1, <i>…,</i> + Allocation ain<i>N</i>, + Script.LaunchOptions sc); + +// Method 3 +public <i>javaFutureType</i> reduce_<i>kernelName</i>(<i><a href="#devec">devecSiIn1Type</a></i>[] in1, …, + <i><a href="#devec">devecSiInNType</a></i>[] in<i>N</i>); +</pre> + +<p>Here are some examples of calling the <a href="#example-addint">addint</a> kernel:</p> +<pre> +ScriptC_example script = new ScriptC_example(mRenderScript); + +// 1D array +// and obtain answer immediately +int input1[] = <i>…</i>; +int sum1 = script.reduce_addint(input1).get(); // Method 3 + +// 2D allocation +// and do some additional work before obtaining answer 
+Type.Builder typeBuilder = + new Type.Builder(mRenderScript, Element.I32(mRenderScript)); +typeBuilder.setX(<i>…</i>); +typeBuilder.setY(<i>…</i>); +Allocation input2 = Allocation.createTyped(mRenderScript, typeBuilder.create()); +<i>populateSomehow</i>(input2); // fill in input Allocation with data +ScriptC_example.result_int result2 = script.reduce_addint(input2); // Method 1 +<i>doSomeAdditionalWork</i>(); // might run at same time as reduction +int sum2 = result2.get(); +</pre> + +<p><strong>Method 1</strong> has one input {@link android.renderscript.Allocation} argument for + every input argument in the kernel's <a href="#accumulator-function">accumulator + function</a>. The RenderScript runtime checks to ensure that all of the input Allocations + have the same dimensions and that the {@link android.renderscript.Element} type of each of + the input Allocations matches that of the corresponding input argument of the accumulator + function's prototype. If any of these checks fails, RenderScript throws an exception. The + kernel executes over every coordinate in those dimensions.</p> + +<p><strong>Method 2</strong> is the same as Method 1 except that Method 2 takes an additional + argument <code>sc</code> that can be used to limit the kernel execution to a subset of the + coordinates.</p> + +<p><strong><a id="reduce-method-3">Method 3</a></strong> is the same as Method 1 except that + instead of taking Allocation inputs it takes Java array inputs. This is a convenience that + saves you from having to write code to explicitly create an Allocation and copy data to it + from a Java array. <em>However, using Method 3 instead of Method 1 does not increase the + performance of the code</em>.
For each input array, Method 3 creates a temporary + 1-dimensional Allocation with the appropriate {@link android.renderscript.Element} type and + {@link android.renderscript.Allocation#setAutoPadding} enabled, and copies the array to the + Allocation as if by the appropriate <code>copyFrom()</code> method of {@link + android.renderscript.Allocation}. It then calls Method 1, passing those temporary + Allocations.</p> +<p class="note"><strong>NOTE:</strong> If your application will make multiple kernel calls with + the same array, or with different arrays of the same dimensions and Element type, you may improve + performance by explicitly creating, populating, and reusing Allocations yourself, instead of + by using Method 3.</p> +<p><strong><i><a id="javaFutureType">javaFutureType</a></i></strong>, + the return type of the reflected reduction methods, is a reflected + static nested class within the <code>ScriptC_<i>filename</i></code> + class. It represents the future result of a reduction + kernel run. To obtain the actual result of the run, call + the <code>get()</code> method of that class, which returns a value + of type <i>javaResultType</i>. <code>get()</code> is <a href="#asynchronous-model">synchronous</a>.</p> + +<pre> +public class ScriptC_<i>filename</i> extends ScriptC { + public static class <i>javaFutureType</i> { + public <i>javaResultType</i> get() { … } + } +} +</pre> + +<p><strong><i>javaResultType</i></strong> is determined from the <i>resultType</i> of the + <a href="#outconverter-function">outconverter function</a>. Unless <i>resultType</i> is an + unsigned type (scalar, vector, or array), <i>javaResultType</i> is the directly corresponding + Java type. If <i>resultType</i> is an unsigned type and there is a larger Java signed type, + then <i>javaResultType</i> is that larger Java signed type; otherwise, it is the directly + corresponding Java type. 
For example:</p> +<ul> +<li>If <i>resultType</i> is <code>int</code>, <code>int2</code>, or <code>int[15]</code>, + then <i>javaResultType</i> is <code>int</code>, <code>Int2</code>, + or <code>int[]</code>. All values of <i>resultType</i> can be represented + by <i>javaResultType</i>.</li> +<li>If <i>resultType</i> is <code>uint</code>, <code>uint2</code>, or <code>uint[15]</code>, + then <i>javaResultType</i> is <code>long</code>, <code>Long2</code>, + or <code>long[]</code>. All values of <i>resultType</i> can be represented + by <i>javaResultType</i>.</li> +<li>If <i>resultType</i> is <code>ulong</code>, <code>ulong2</code>, + or <code>ulong[15]</code>, then <i>javaResultType</i> + is <code>long</code>, <code>Long2</code>, or <code>long[]</code>. There are certain values + of <i>resultType</i> that cannot be represented by <i>javaResultType</i>.</li> +</ul> + +<p><strong><i>javaFutureType</i></strong> is the future result type corresponding + to the <i>resultType</i> of the <a href="#outconverter-function">outconverter + function</a>.</p> +<ul> +<li>If <i>resultType</i> is not an array type, then <i>javaFutureType</i> + is <code>result_<i>resultType</i></code>.</li> +<li>If <i>resultType</i> is an array of length <i>Count</i> with members of type <i>memberType</i>, + then <i>javaFutureType</i> is <code>resultArray<i>Count</i>_<i>memberType</i></code>.</li> +</ul> + +<p>For example:</p> + +<pre> +public class ScriptC_<i>filename</i> extends ScriptC { + // for kernels with int result + public static class result_int { + public int get() { … } + } + + // for kernels with int[10] result + public static class resultArray10_int { + public int[] get() { … } + } + + // for kernels with int2 result + // note that the Java type name "Int2" is not the same as the script type name "int2" + public static class result_int2 { + public Int2 get() { … } + } + + // for kernels with int2[10] result + // note that the Java type name "Int2" is not the same as the script type name 
"int2" + public static class resultArray10_int2 { + public Int2[] get() { … } + } + + // for kernels with uint result + // note that the Java type "long" is a wider signed type than the unsigned script type "uint" + public static class result_uint { + public long get() { … } + } + + // for kernels with uint[10] result + // note that the Java type "long" is a wider signed type than the unsigned script type "uint" + public static class resultArray10_uint { + public long[] get() { … } + } + + // for kernels with uint2 result + // note that the Java type "Long2" is a wider signed type than the unsigned script type "uint2" + public static class result_uint2 { + public Long2 get() { … } + } + + // for kernels with uint2[10] result + // note that the Java type "Long2" is a wider signed type than the unsigned script type "uint2" + public static class resultArray10_uint2 { + public Long2[] get() { … } + } +} +</pre> + +<p>If <i>javaResultType</i> is an object type (including an array type), each call + to <code><i>javaFutureType</i>.get()</code> on the same instance will return the same + object.</p> + +<p>If <i>javaResultType</i> cannot represent all values of type <i>resultType</i>, and a + reduction kernel produces an unrepresentible value, + then <code><i>javaFutureType</i>.get()</code> throws an exception.</p> + +<h4 id="devec">Method 3 and <i>devecSiInXType</i></h4> + +<p><strong><i>devecSiInXType</i></strong> is the Java type corresponding to + the <i>inXType</i> of the corresponding argument of + the <a href="#accumulator-function">accumulator function</a>. Unless <i>inXType</i> is an + unsigned type or a vector type, <i>devecSiInXType</i> is the directly corresponding Java + type. If <i>inXType</i> is an unsigned scalar type, then <i>devecSiInXType</i> is the + Java type directly corresponding to the signed scalar type of the same + size. 
If <i>inXType</i> is a signed vector type, then <i>devecSiInXType</i> is the Java + type directly corresponding to the vector component type. If <i>inXType</i> is an unsigned + vector type, then <i>devecSiInXType</i> is the Java type directly corresponding to the + signed scalar type of the same size as the vector component type. For example:</p> +<ul> +<li>If <i>inXType</i> is <code>int</code>, then <i>devecSiInXType</i> + is <code>int</code>.</li> +<li>If <i>inXType</i> is <code>int2</code>, then <i>devecSiInXType</i> + is <code>int</code>. The array is a <em>flattened</em> representation: It has twice as + many <em>scalar</em> Elements as the Allocation has 2-component <em>vector</em> + Elements. This is the same way that the <code>copyFrom()</code> methods of {@link + android.renderscript.Allocation} work.</li> +<li>If <i>inXType</i> is <code>uint</code>, then <i>devecSiInXType</i> + is <code>int</code>. A signed value in the Java array is interpreted as an unsigned value of + the same bit pattern in the Allocation. This is the same way that the <code>copyFrom()</code> + methods of {@link android.renderscript.Allocation} work.</li> +<li>If <i>inXType</i> is <code>uint2</code>, then <i>devecSiInXType</i> + is <code>int</code>.
This is a combination of the way <code>int2</code> and <code>uint</code> + are handled: The array is a flattened representation, and Java array signed values are + interpreted as RenderScript unsigned Element values.</li> +</ul> + +<p>Note that for <a href="#reduce-method-3">Method 3</a>, input types are handled differently +than result types:</p> + +<ul> +<li>A script's vector input is flattened on the Java side, whereas a script's vector result is not.</li> +<li>A script's unsigned input is represented as a signed input of the same size on the Java + side, whereas a script's unsigned result is represented as a widened signed type on the Java + side (except in the case of <code>ulong</code>).</li> +</ul> + +<h3 id="more-example">More example reduction kernels</h3> + +<pre id="dot-product"> +#pragma rs reduce(dotProduct) \ + accumulator(dotProductAccum) combiner(dotProductSum) + +// Note: No initializer function -- therefore, +// each accumulator data item is implicitly initialized to 0.0f. 
+ +static void dotProductAccum(float *accum, float in1, float in2) { + *accum += in1*in2; +} + +// combiner function +static void dotProductSum(float *accum, const float *val) { + *accum += *val; +} +</pre> + +<pre> +// Find a zero Element in a 2D allocation; return (-1, -1) if none +#pragma rs reduce(fz2) \ + initializer(fz2Init) \ + accumulator(fz2Accum) combiner(fz2Combine) + +static void fz2Init(int2 *accum) { accum->x = accum->y = -1; } + +static void fz2Accum(int2 *accum, + int inVal, + int x /* special arg */, + int y /* special arg */) { + if (inVal==0) { + accum->x = x; + accum->y = y; + } +} + +static void fz2Combine(int2 *accum, const int2 *accum2) { + if (accum2->x >= 0) *accum = *accum2; +} +</pre> + +<pre> +// Note that this kernel returns an array to Java +#pragma rs reduce(histogram) \ + accumulator(hsgAccum) combiner(hsgCombine) + +#define BUCKETS 256 +typedef uint32_t Histogram[BUCKETS]; + +// Note: No initializer function -- +// therefore, each bucket is implicitly initialized to 0. + +static void hsgAccum(Histogram *h, uchar in) { ++(*h)[in]; } + +static void hsgCombine(Histogram *accum, + const Histogram *addend) { + for (int i = 0; i < BUCKETS; ++i) + (*accum)[i] += (*addend)[i]; +} + +// Determines the mode (most frequently occurring value), and returns +// the value and the frequency. +// +// If multiple values have the same highest frequency, returns the lowest +// of those values. +// +// Shares functions with the histogram reduction kernel. +#pragma rs reduce(mode) \ + accumulator(hsgAccum) combiner(hsgCombine) \ + outconverter(modeOutConvert) + +static void modeOutConvert(int2 *result, const Histogram *h) { + uint32_t mode = 0; + for (int i = 1; i < BUCKETS; ++i) + if ((*h)[i] > (*h)[mode]) mode = i; + result->x = mode; + result->y = (*h)[mode]; +} +</pre> |
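The rules under "What must you guarantee?" can be exercised off-device before a kernel is shipped. The following is a minimal host-side C sketch, not RenderScript runtime code. It reuses the accumulator and combiner logic of the findMinAndMax kernel and adds three things that are assumptions made for illustration: the field layouts of `IndexedVal` and `MinAndMax` (inferred from how the kernel uses them), an initializer named `fMMInit` that sets the accumulator to the values the text describes (`LONG_MAX`, `LONG_MIN`, and index -1), and a hypothetical driver `reduceInChunks` that mimics the runtime creating several accumulator data items and combining them. The real runtime chooses the number of accumulator data items and the call order itself.

```c
#include <limits.h>

/* Assumed layouts, inferred from how the kernel reads and writes these fields. */
typedef struct { long val; int idx; } IndexedVal;
typedef struct { IndexedVal min, max; } MinAndMax;

/* Assumed initializer name; sets the accumulator to the INITVAL described in the text. */
static void fMMInit(MinAndMax *accum) {
  accum->min.val = LONG_MAX;
  accum->min.idx = -1;
  accum->max.val = LONG_MIN;
  accum->max.idx = -1;
}

/* Same logic as the kernel's accumulator function. */
static void fMMAccumulator(MinAndMax *accum, long in, int x) {
  IndexedVal me;
  me.val = in;
  me.idx = x;
  if (me.val <= accum->min.val) accum->min = me;
  if (me.val >= accum->max.val) accum->max = me;
}

/* Same logic as the kernel's combiner function. */
static void fMMCombiner(MinAndMax *accum, const MinAndMax *val) {
  if ((accum->min.idx < 0) || (val->min.val < accum->min.val))
    accum->min = val->min;
  if ((accum->max.idx < 0) || (val->max.val > accum->max.val))
    accum->max = val->max;
}

/* Hypothetical driver: one accumulator data item per chunk, combined left
 * to right. Chunks may be empty when chunks > n, which exercises the
 * "initialized but never accumulated" case that the combiner must handle. */
MinAndMax reduceInChunks(const long *in, int n, int chunks) {
  MinAndMax total;
  fMMInit(&total);
  for (int c = 0; c < chunks; ++c) {
    MinAndMax part;
    fMMInit(&part);
    int begin = (int)((long long)n * c / chunks);
    int end = (int)((long long)n * (c + 1) / chunks);
    for (int x = begin; x < end; ++x)
      fMMAccumulator(&part, in[x], x);
    fMMCombiner(&total, &part);
  }
  return total;
}
```

For an input with distinct values, such as `{5, -3, 9, 0, 7, -8, 2}`, every chunk count should report the minimum at index 5 and the maximum at index 2. If the input contains duplicate minimum or maximum values, the reported index can legitimately differ between chunk counts, which is the nondeterminism described under "What can't you assume?".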