1 files changed, 358 insertions, 0 deletions
diff --git a/share/doc/gcc/Vector-Extensions.html b/share/doc/gcc/Vector-Extensions.html
new file mode 100644
index 0000000..5cfb03e
--- /dev/null
+++ b/share/doc/gcc/Vector-Extensions.html
@@ -0,0 +1,358 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- This file documents the use of the GNU compilers.
+
+Copyright (C) 1988-2023 Free Software Foundation, Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with the
+Invariant Sections being "Funding Free Software", the Front-Cover
+Texts being (a) (see below), and with the Back-Cover Texts being (b)
+(see below).  A copy of the license is included in the section entitled
+"GNU Free Documentation License".
+
+(a) The FSF's Front-Cover Text is:
+
+A GNU Manual
+
+(b) The FSF's Back-Cover Text is:
+
+You have freedom to copy and modify this GNU Manual, like GNU
+     software.  Copies published by the Free Software Foundation raise
+     funds for GNU development. -->
+<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
+<head>
+<title>Using the GNU Compiler Collection (GCC): Vector Extensions</title>
+
+<meta name="description" content="Using the GNU Compiler Collection (GCC): Vector Extensions">
+<meta name="keywords" content="Using the GNU Compiler Collection (GCC): Vector Extensions">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Indices.html#Indices" rel="index" title="Indices">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="C-Extensions.html#C-Extensions" rel="up" title="C Extensions">
+<link href="Offsetof.html#Offsetof" rel="next" title="Offsetof">
+<link href="Return-Address.html#Return-Address" rel="previous" title="Return Address">
+<style type="text/css">
+<!--
+a.summary-letter {text-decoration: none}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.indentedblock {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style:oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nocodebreak {white-space:nowrap}
+span.nolinebreak {white-space:nowrap}
+span.roman {font-family:serif; font-weight:normal}
+span.sansserif {font-family:sans-serif; font-weight:normal}
+ul.no-bullet {list-style: none}
+-->
+</style>
+
+
+</head>
+
+<body lang="en_US" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
+<a name="Vector-Extensions"></a>
+<div class="header">
+<p>
+Next: <a href="Offsetof.html#Offsetof" accesskey="n" rel="next">Offsetof</a>, Previous: <a href="Return-Address.html#Return-Address" accesskey="p" rel="previous">Return Address</a>, Up: <a href="C-Extensions.html#C-Extensions" accesskey="u" rel="up">C Extensions</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Indices.html#Indices" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<a name="Using-Vector-Instructions-through-Built_002din-Functions"></a>
+<h3 class="section">6.52 Using Vector Instructions through Built-in Functions</h3>
+
+<p>On some targets, the instruction set contains SIMD vector instructions which
+operate on multiple values contained in one large register at the same time.
+For example, on the x86 the MMX, 3DNow! and SSE extensions can be used
+this way.
+</p>
+<p>The first step in using these extensions is to provide the necessary data
+types.  This should be done using an appropriate <code>typedef</code>:
+</p>
+<div class="smallexample">
+<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
+</pre></div>
+
+<p>The <code>int</code> type specifies the <em>base type</em>, while the attribute specifies
+the vector size for the variable, measured in bytes.  For example, the
+declaration above causes the compiler to set the mode for the <code>v4si</code>
+type to be 16 bytes wide and divided into <code>int</code> sized units.  For
+a 32-bit <code>int</code> this means a vector of 4 units of 4 bytes, and the
+corresponding mode of <code>foo</code> is <acronym>V4SI</acronym>.
+</p>
+<p>The <code>vector_size</code> attribute is only applicable to integral and
+floating scalars, although arrays, pointers, and function return values
+are allowed in conjunction with this construct. Only sizes that are
+positive power-of-two multiples of the base type size are currently allowed.
+</p>
+<p>All the basic integer types can be used as base types, both as signed
+and as unsigned: <code>char</code>, <code>short</code>, <code>int</code>, <code>long</code>,
+<code>long long</code>.  In addition, <code>float</code> and <code>double</code> can be
+used to build floating-point vector types.
+</p>
+<p>Specifying a combination that is not valid for the current architecture
+causes GCC to synthesize the instructions using a narrower mode.
+For example, if you specify a variable of type <code>V4SI</code> and your
+architecture does not allow for this specific SIMD type, GCC
+produces code that uses 4 <code>SIs</code>.
+</p>
+<p>The types defined in this manner can be used with a subset of normal C
+operations.  Currently, GCC allows using the following operators
+on these types: <code>+, -, *, /, unary minus, ^, |, &amp;, ~, %</code>.
+</p>
+<p>The operations behave like C++ <code>valarrays</code>.  Addition is defined as
+the addition of the corresponding elements of the operands.  For
+example, in the code below, each of the 4 elements in <var>a</var> is
+added to the corresponding 4 elements in <var>b</var> and the resulting
+vector is stored in <var>c</var>.
+</p>
+<div class="smallexample">
+<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a, b, c;
+
+c = a + b;
+</pre></div>
+
+<p>Subtraction, multiplication, division, and the logical operations
+operate in a similar manner.  Likewise, the result of using the unary
+minus or complement operators on a vector type is a vector whose
+elements are the negative or complemented values of the corresponding
+elements in the operand.
+</p>
+<p>It is possible to use shifting operators <code>&lt;&lt;</code>, <code>&gt;&gt;</code> on
+integer-type vectors. The operation is defined as following: <code>{a0,
+a1, &hellip;, an} &gt;&gt; {b0, b1, &hellip;, bn} == {a0 &gt;&gt; b0, a1 &gt;&gt; b1,
+&hellip;, an &gt;&gt; bn}</code>. Vector operands must have the same number of
+elements. 
+</p>
+<p>For convenience, it is allowed to use a binary vector operation
+where one operand is a scalar. In that case the compiler transforms
+the scalar operand into a vector where each element is the scalar from
+the operation. The transformation happens only if the scalar could be
+safely converted to the vector-element type.
+Consider the following code.
+</p>
+<div class="smallexample">
+<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a, b, c;
+long l;
+
+a = b + 1;    /* a = b + {1,1,1,1}; */
+a = 2 * b;    /* a = {2,2,2,2} * b; */
+
+a = l + a;    /* Error, cannot convert long to int. */
+</pre></div>
+
+<p>Vectors can be subscripted as if the vector were an array with
+the same number of elements and base type.  Out of bound accesses
+invoke undefined behavior at run time.  Warnings for out of bound
+accesses for vector subscription can be enabled with
+<samp>-Warray-bounds</samp>.
+</p>
+<p>Vector comparison is supported with standard comparison
+operators: <code>==, !=, &lt;, &lt;=, &gt;, &gt;=</code>. Comparison operands can be
+vector expressions of integer-type or real-type. Comparison between
+integer-type vectors and real-type vectors are not supported.  The
+result of the comparison is a vector of the same width and number of
+elements as the comparison operands with a signed integral element
+type.
+</p>
+<p>Vectors are compared element-wise producing 0 when comparison is false
+and -1 (constant of the appropriate type where all bits are set)
+otherwise. Consider the following example.
+</p>
+<div class="smallexample">
+<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = {1,2,3,4};
+v4si b = {3,2,1,4};
+v4si c;
+
+c = a &gt;  b;     /* The result would be {0, 0,-1, 0}  */
+c = a == b;     /* The result would be {0,-1, 0,-1}  */
+</pre></div>
+
+<p>In C++, the ternary operator <code>?:</code> is available. <code>a?b:c</code>, where
+<code>b</code> and <code>c</code> are vectors of the same type and <code>a</code> is an
+integer vector with the same number of elements of the same size as <code>b</code>
+and <code>c</code>, computes all three arguments and creates a vector
+<code>{a[0]?b[0]:c[0], a[1]?b[1]:c[1], &hellip;}</code>.  Note that unlike in
+OpenCL, <code>a</code> is thus interpreted as <code>a != 0</code> and not <code>a &lt; 0</code>.
+As in the case of binary operations, this syntax is also accepted when
+one of <code>b</code> or <code>c</code> is a scalar that is then transformed into a
+vector. If both <code>b</code> and <code>c</code> are scalars and the type of
+<code>true?b:c</code> has the same size as the element type of <code>a</code>, then
+<code>b</code> and <code>c</code> are converted to a vector type whose elements have
+this type and with the same number of elements as <code>a</code>.
+</p>
+<p>In C++, the logic operators <code>!, &amp;&amp;, ||</code> are available for vectors.
+<code>!v</code> is equivalent to <code>v == 0</code>, <code>a &amp;&amp; b</code> is equivalent to
+<code>a!=0 &amp; b!=0</code> and <code>a || b</code> is equivalent to <code>a!=0 | b!=0</code>.
+For mixed operations between a scalar <code>s</code> and a vector <code>v</code>,
+<code>s &amp;&amp; v</code> is equivalent to <code>s?v!=0:0</code> (the evaluation is
+short-circuit) and <code>v &amp;&amp; s</code> is equivalent to <code>v!=0 &amp; (s?-1:0)</code>.
+</p>
+<a name="index-_005f_005fbuiltin_005fshuffle"></a>
+<p>Vector shuffling is available using functions
+<code>__builtin_shuffle (vec, mask)</code> and
+<code>__builtin_shuffle (vec0, vec1, mask)</code>.
+Both functions construct a permutation of elements from one or two
+vectors and return a vector of the same type as the input vector(s).
+The <var>mask</var> is an integral vector with the same width (<var>W</var>)
+and element count (<var>N</var>) as the output vector.
+</p>
+<p>The elements of the input vectors are numbered in memory ordering of
+<var>vec0</var> beginning at 0 and <var>vec1</var> beginning at <var>N</var>.  The
+elements of <var>mask</var> are considered modulo <var>N</var> in the single-operand
+case and modulo <em>2*<var>N</var></em> in the two-operand case.
+</p>
+<p>Consider the following example,
+</p>
+<div class="smallexample">
+<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = {1,2,3,4};
+v4si b = {5,6,7,8};
+v4si mask1 = {0,1,1,3};
+v4si mask2 = {0,4,2,5};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is {1,2,2,4}  */
+res = __builtin_shuffle (a, b, mask2);    /* res is {1,5,3,6}  */
+</pre></div>
+
+<p>Note that <code>__builtin_shuffle</code> is intentionally semantically
+compatible with the OpenCL <code>shuffle</code> and <code>shuffle2</code> functions.
+</p>
+<p>You can declare variables and use them in function calls and returns, as
+well as in assignments and some casts.  You can specify a vector type as
+a return type for a function.  Vector types can also be used as function
+arguments.  It is possible to cast from one vector type to another,
+provided they are of the same size (in fact, you can also cast vectors
+to and from other datatypes of the same size).
+</p>
+<p>You cannot operate between vectors of different lengths or different
+signedness without a cast.
+</p>
+<a name="index-_005f_005fbuiltin_005fshufflevector"></a>
+<p>Vector shuffling is available using the
+<code>__builtin_shufflevector (vec1, vec2, index...)</code>
+function.  <var>vec1</var> and <var>vec2</var> must be expressions with
+vector type with a compatible element type.  The result of
+<code>__builtin_shufflevector</code> is a vector with the same element type
+as <var>vec1</var> and <var>vec2</var> but that has an element count equal to
+the number of indices specified.
+</p>
+<p>The <var>index</var> arguments are a list of integers that specify the
+elements indices of the first two vectors that should be extracted and
+returned in a new vector. These element indices are numbered sequentially
+starting with the first vector, continuing into the second vector.
+An index of -1 can be used to indicate that the corresponding element in
+the returned vector is a don&rsquo;t care and can be freely chosen to optimized
+the generated code sequence performing the shuffle operation.
+</p>
+<p>Consider the following example,
+</p><div class="smallexample">
+<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+
+v8si a = {1,-2,3,-4,5,-6,7,-8};
+v4si b = __builtin_shufflevector (a, a, 0, 2, 4, 6); /* b is {1,3,5,7} */
+v4si c = {-2,-4,-6,-8};
+v8si d = __builtin_shufflevector (c, b, 4, 0, 5, 1, 6, 2, 7, 3); /* d is a */
+</pre></div>
+
+<a name="index-_005f_005fbuiltin_005fconvertvector"></a>
+<p>Vector conversion is available using the
+<code>__builtin_convertvector (vec, vectype)</code>
+function.  <var>vec</var> must be an expression with integral or floating
+vector type and <var>vectype</var> an integral or floating vector type with the
+same number of elements.  The result has <var>vectype</var> type and value of
+a C cast of every element of <var>vec</var> to the element type of <var>vectype</var>.
+</p>
+<p>Consider the following example,
+</p><div class="smallexample">
+<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef unsigned long long v4di __attribute__ ((vector_size (32)));
+
+v4si a = {1,-2,3,-4};
+v4sf b = {1.5f,-2.5f,3.f,7.f};
+v4di c = {1ULL,5ULL,0ULL,10ULL};
+v4sf d = __builtin_convertvector (a, v4sf); /* d is {1.f,-2.f,3.f,-4.f} */
+/* Equivalent of:
+   v4sf d = { (float)a[0], (float)a[1], (float)a[2], (float)a[3] }; */
+v4df e = __builtin_convertvector (a, v4df); /* e is {1.,-2.,3.,-4.} */
+v4df f = __builtin_convertvector (b, v4df); /* f is {1.5,-2.5,3.,7.} */
+v4si g = __builtin_convertvector (f, v4si); /* g is {1,-2,3,7} */
+v4si h = __builtin_convertvector (c, v4si); /* h is {1,5,0,10} */
+</pre></div>
+
+<a name="index-vector-types_002c-using-with-x86-intrinsics"></a>
+<p>Sometimes it is desirable to write code using a mix of generic vector
+operations (for clarity) and machine-specific vector intrinsics (to
+access vector instructions that are not exposed via generic built-ins).
+On x86, intrinsic functions for integer vectors typically use the same
+vector type <code>__m128i</code> irrespective of how they interpret the vector,
+making it necessary to cast their arguments and return values from/to
+other vector types.  In C, you can make use of a <code>union</code> type:
+</p><div class="smallexample">
+<pre class="smallexample">#include &lt;immintrin.h&gt;
+
+typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
+typedef unsigned int  u32x4 __attribute__ ((vector_size (16)));
+
+typedef union {
+        __m128i mm;
+        u8x16   u8;
+        u32x4   u32;
+} v128;
+</pre></div>
+
+<p>for variables that can be used with both built-in operators and x86
+intrinsics:
+</p>
+<div class="smallexample">
+<pre class="smallexample">v128 x, y = { 0 };
+memcpy (&amp;x, ptr, sizeof x);
+y.u8  += 0x80;
+x.mm  = _mm_adds_epu8 (x.mm, y.mm);
+x.u32 &amp;= 0xffffff;
+
+/* Instead of a variable, a compound literal may be used to pass the
+   return value of an intrinsic call to a function expecting the union: */
+v128 foo (v128);
+x = foo ((v128) {_mm_adds_epu8 (x.mm, y.mm)});
+</pre></div>
+
+<hr>
+<div class="header">
+<p>
+Next: <a href="Offsetof.html#Offsetof" accesskey="n" rel="next">Offsetof</a>, Previous: <a href="Return-Address.html#Return-Address" accesskey="p" rel="previous">Return Address</a>, Up: <a href="C-Extensions.html#C-Extensions" accesskey="u" rel="up">C Extensions</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Indices.html#Indices" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>