<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This file documents the use of the GNU compilers.
Copyright (C) 1988-2023 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Funding Free Software", the Front-Cover
Texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below). A copy of the license is included in the section entitled
"GNU Free Documentation License".
(a) The FSF's Front-Cover Text is:
A GNU Manual
(b) The FSF's Back-Cover Text is:
You have freedom to copy and modify this GNU Manual, like GNU
software. Copies published by the Free Software Foundation raise
funds for GNU development. -->
<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Using the GNU Compiler Collection (GCC): Vector Extensions</title>
<meta name="description" content="Using the GNU Compiler Collection (GCC): Vector Extensions">
<meta name="keywords" content="Using the GNU Compiler Collection (GCC): Vector Extensions">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Indices.html#Indices" rel="index" title="Indices">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="C-Extensions.html#C-Extensions" rel="up" title="C Extensions">
<link href="Offsetof.html#Offsetof" rel="next" title="Offsetof">
<link href="Return-Address.html#Return-Address" rel="previous" title="Return Address">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en_US" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Vector-Extensions"></a>
<div class="header">
<p>
Next: <a href="Offsetof.html#Offsetof" accesskey="n" rel="next">Offsetof</a>, Previous: <a href="Return-Address.html#Return-Address" accesskey="p" rel="previous">Return Address</a>, Up: <a href="C-Extensions.html#C-Extensions" accesskey="u" rel="up">C Extensions</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Indices.html#Indices" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Using-Vector-Instructions-through-Built_002din-Functions"></a>
<h3 class="section">6.52 Using Vector Instructions through Built-in Functions</h3>
<p>On some targets, the instruction set contains SIMD vector instructions which
operate on multiple values contained in one large register at the same time.
For example, on the x86 the MMX, 3DNow! and SSE extensions can be used
this way.
</p>
<p>The first step in using these extensions is to provide the necessary data
types. This should be done using an appropriate <code>typedef</code>:
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
</pre></div>
<p>The <code>int</code> type specifies the <em>base type</em>, while the attribute specifies
the vector size for the variable, measured in bytes. For example, the
declaration above causes the compiler to set the mode for the <code>v4si</code>
type to be 16 bytes wide and divided into <code>int</code> sized units. For
a 32-bit <code>int</code> this means a vector of 4 units of 4 bytes, and the
corresponding mode of <code>v4si</code> is <acronym>V4SI</acronym>.
</p>
<p>The <code>vector_size</code> attribute is only applicable to integral and
floating scalars, although arrays, pointers, and function return values
are allowed in conjunction with this construct. Only sizes that are
positive power-of-two multiples of the base type size are currently allowed.
</p>
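<p>As a rough sketch of how the size rule plays out, the declarations below
derive the element count of a few vector types from <code>sizeof</code>. The type
names <code>v8hi</code>, <code>v4sf</code> and <code>v2df</code>, the macro
<code>VEC_NELTS</code> and the helper functions are illustrative assumptions, not
names provided by GCC. The counts in the comments assume a 2-byte
<code>short</code>, a 4-byte <code>float</code> and an 8-byte <code>double</code>.
</p>

```c
#include <stddef.h>

/* Hypothetical type names; each vector is 16 bytes wide.  */
typedef short  v8hi __attribute__ ((vector_size (16)));  /* 8 elements if short is 2 bytes   */
typedef float  v4sf __attribute__ ((vector_size (16)));  /* 4 elements if float is 4 bytes   */
typedef double v2df __attribute__ ((vector_size (16)));  /* 2 elements if double is 8 bytes  */

/* Element count = vector size in bytes / size of one element.  */
#define VEC_NELTS(v) (sizeof (v) / sizeof ((v)[0]))

size_t nelts_v8hi (void) { v8hi v = {0}; return VEC_NELTS (v); }
size_t nelts_v4sf (void) { v4sf v = {0}; return VEC_NELTS (v); }
size_t nelts_v2df (void) { v2df v = {0}; return VEC_NELTS (v); }
```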
<p>All the basic integer types can be used as base types, both as signed
and as unsigned: <code>char</code>, <code>short</code>, <code>int</code>, <code>long</code>,
<code>long long</code>. In addition, <code>float</code> and <code>double</code> can be
used to build floating-point vector types.
</p>
<p>Specifying a combination that is not valid for the current architecture
causes GCC to synthesize the instructions using a narrower mode.
For example, if you specify a variable of type <code>V4SI</code> and your
architecture does not allow for this specific SIMD type, GCC
produces code that uses 4 <code>SIs</code>.
</p>
<p>The types defined in this manner can be used with a subset of normal C
operations. Currently, GCC allows using the following operators
on these types: <code>+, -, *, /, unary minus, ^, |, &, ~, %</code>.
</p>
<p>The operations behave like C++ <code>valarray</code>s. Addition is defined as
the addition of the corresponding elements of the operands. For
example, in the code below, each of the 4 elements in <var>a</var> is
added to the corresponding element in <var>b</var> and the resulting
vector is stored in <var>c</var>.
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
v4si a, b, c;
c = a + b;
</pre></div>
<p>Subtraction, multiplication, division, remainder, and the bitwise operations
operate in a similar manner. Likewise, the result of using the unary
minus or complement operators on a vector type is a vector whose
elements are the negated or complemented values of the corresponding
elements in the operand.
</p>
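<p>The following sketch combines several of these operators in ordinary C
expressions. The function names are hypothetical and chosen only for
illustration.
</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise a*b - a%b: every operator acts on corresponding elements.  */
v4si
mul_minus_rem (v4si a, v4si b)
{
  return a * b - a % b;
}

/* Unary operators also apply per element.  In two's complement,
   ~(-x) equals x - 1, so each element of the result is a[i] - 1.  */
v4si
negate_then_complement (v4si a)
{
  return ~(-a);
}
```

<p>With <code>a = {10,20,30,40}</code> and <code>b = {3,3,7,7}</code>, the first
function yields <code>{29,58,208,275}</code> and the second, applied to
<code>a</code>, yields <code>{9,19,29,39}</code>.
</p>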
<p>The shift operators <code><<</code> and <code>>></code> can be used on
integer-type vectors. The operation is defined as follows: <code>{a0,
a1, …, an} >> {b0, b1, …, bn} == {a0 >> b0, a1 >> b1,
…, an >> bn}</code>. Vector operands must have the same number of
elements.
</p>
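<p>A minimal sketch of per-element shifts follows. The type name
<code>v4su</code> and the function names are assumptions made for this example,
and the shift counts used with it are kept below the element width.
</p>

```c
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Each element of v is shifted left by the matching element of counts.  */
v4su
shift_left_each (v4su v, v4su counts)
{
  return v << counts;
}

/* Each element of v is shifted right by the matching element of counts.  */
v4su
shift_right_each (v4su v, v4su counts)
{
  return v >> counts;
}
```

<p>For instance, <code>{1,2,4,8} << {3,2,1,0}</code> gives <code>{8,8,8,8}</code>.
</p>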
<p>For convenience, a binary vector operation may take a scalar as one
of its operands. In that case the compiler transforms
the scalar operand into a vector in which each element is that scalar.
The transformation happens only if the scalar can be
safely converted to the vector-element type.
Consider the following code.
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
v4si a, b, c;
long l;
a = b + 1; /* a = b + {1,1,1,1}; */
a = 2 * b; /* a = {2,2,2,2} * b; */
a = l + a; /* Error, cannot convert long to int. */
</pre></div>
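<p>One way to handle the rejected <code>long</code> case is to narrow the scalar
explicitly before mixing it with the vector. The sketch below shows this
together with ordinary constant broadcasting. The function names are
hypothetical, and the cast assumes the caller knows the value fits in an
<code>int</code>.
</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Computes 2*v[i] + 1 for every element; both constants are broadcast.  */
v4si
affine (v4si v)
{
  return 2 * v + 1;
}

/* Narrow the long to int explicitly so that it can be broadcast.
   Assumption: the value of l is representable as an int.  */
v4si
add_long (v4si v, long l)
{
  return v + (int) l;
}
```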
<p>Vectors can be subscripted as if the vector were an array with
the same number of elements and base type. Out-of-bounds accesses
invoke undefined behavior at run time. Warnings for out-of-bounds
accesses in vector subscripting can be enabled with
<samp>-Warray-bounds</samp>.
</p>
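<p>As an illustration, subscripting can be used both to read and to write
individual elements. The helpers <code>hsum</code> and <code>set_lane</code> below
are hypothetical names for this sketch; <code>lane</code> is assumed to be in
the range 0 to 3.
</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Horizontal sum: read each element through a subscript.  */
int
hsum (v4si v)
{
  int sum = 0;
  for (int i = 0; i < 4; i++)
    sum += v[i];
  return sum;
}

/* Return a copy of v with one element replaced.
   Assumption: 0 <= lane < 4; anything else is undefined behavior.  */
v4si
set_lane (v4si v, int lane, int value)
{
  v[lane] = value;
  return v;
}
```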
<p>Vector comparison is supported with the standard comparison
operators: <code>==, !=, <, <=, >, >=</code>. Comparison operands can be
vector expressions of integer-type or real-type. Comparison between
integer-type vectors and real-type vectors is not supported. The
result of the comparison is a vector of the same width and number of
elements as the comparison operands, with a signed integral element
type.
</p>
<p>Vectors are compared element-wise producing 0 when comparison is false
and -1 (constant of the appropriate type where all bits are set)
otherwise. Consider the following example.
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
v4si a = {1,2,3,4};
v4si b = {3,2,1,4};
v4si c;
c = a > b; /* The result would be {0, 0,-1, 0} */
c = a == b; /* The result would be {0,-1, 0,-1} */
</pre></div>
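<p>Because a true comparison yields an element with all bits set, the result
can serve directly as a bit mask. The sketch below uses this to build an
element-wise maximum and to count matching elements. The function names
<code>vmax</code> and <code>count_equal</code> are assumptions made for this
example.
</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise maximum: take a[i] where a[i] > b[i], otherwise b[i].  */
v4si
vmax (v4si a, v4si b)
{
  v4si mask = a > b;                 /* -1 where true, 0 where false */
  return (a & mask) | (b & ~mask);   /* blend the two inputs by mask */
}

/* Number of positions at which a and b hold equal values.  */
int
count_equal (v4si a, v4si b)
{
  v4si eq = a == b;                  /* each match contributes -1 */
  return -(eq[0] + eq[1] + eq[2] + eq[3]);
}
```

<p>With the vectors <code>a</code> and <code>b</code> from the example above,
<code>vmax</code> returns <code>{3,2,3,4}</code> and <code>count_equal</code>
returns 2.
</p>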
<p>In C++, the ternary operator <code>?:</code> is available. <code>a?b:c</code>, where
<code>b</code> and <code>c</code> are vectors of the same type and <code>a</code> is an
integer vector with the same number of elements of the same size as <code>b</code>
and <code>c</code>, computes all three arguments and creates a vector
<code>{a[0]?b[0]:c[0], a[1]?b[1]:c[1], …}</code>. Note that unlike in
OpenCL, <code>a</code> is thus interpreted as <code>a != 0</code> and not <code>a < 0</code>.
As in the case of binary operations, this syntax is also accepted when
one of <code>b</code> or <code>c</code> is a scalar that is then transformed into a
vector. If both <code>b</code> and <code>c</code> are scalars and the type of
<code>true?b:c</code> has the same size as the element type of <code>a</code>, then
<code>b</code> and <code>c</code> are converted to a vector type whose elements have
this type and with the same number of elements as <code>a</code>.
</p>
<p>In C++, the logical operators <code>!, &&, ||</code> are available for vectors.
<code>!v</code> is equivalent to <code>v == 0</code>, <code>a && b</code> is equivalent to
<code>a!=0 & b!=0</code> and <code>a || b</code> is equivalent to <code>a!=0 | b!=0</code>.
For mixed operations between a scalar <code>s</code> and a vector <code>v</code>,
<code>s && v</code> is equivalent to <code>s?v!=0:0</code> (the evaluation is
short-circuit) and <code>v && s</code> is equivalent to <code>v!=0 & (s?-1:0)</code>.
</p>
<a name="index-_005f_005fbuiltin_005fshuffle"></a>
<p>Vector shuffling is available using functions
<code>__builtin_shuffle (vec, mask)</code> and
<code>__builtin_shuffle (vec0, vec1, mask)</code>.
Both functions construct a permutation of elements from one or two
vectors and return a vector of the same type as the input vector(s).
The <var>mask</var> is an integral vector with the same width (<var>W</var>)
and element count (<var>N</var>) as the output vector.
</p>
<p>The elements of the input vectors are numbered in memory ordering of
<var>vec0</var> beginning at 0 and <var>vec1</var> beginning at <var>N</var>. The
elements of <var>mask</var> are considered modulo <var>N</var> in the single-operand
case and modulo <em>2*<var>N</var></em> in the two-operand case.
</p>
<p>Consider the following example:
</p>
<div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
v4si a = {1,2,3,4};
v4si b = {5,6,7,8};
v4si mask1 = {0,1,1,3};
v4si mask2 = {0,4,2,5};
v4si res;
res = __builtin_shuffle (a, mask1); /* res is {1,2,2,4} */
res = __builtin_shuffle (a, b, mask2); /* res is {1,5,3,6} */
</pre></div>
<p>Note that <code>__builtin_shuffle</code> is intentionally semantically
compatible with the OpenCL <code>shuffle</code> and <code>shuffle2</code> functions.
</p>
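<p>A few common permutations can be sketched with <code>__builtin_shuffle</code>
as follows. The function names are hypothetical. The last function relies on
the modulo rule described above: its mask elements may exceed 3 and are
reduced modulo 4.
</p>

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Reverse the element order: {v3, v2, v1, v0}.  */
v4si
reverse (v4si v)
{
  const v4si mask = {3, 2, 1, 0};
  return __builtin_shuffle (v, mask);
}

/* Interleave the low halves of a and b: {a0, b0, a1, b1}.
   Indices 0..3 select from a, indices 4..7 select from b.  */
v4si
interleave_low (v4si a, v4si b)
{
  const v4si mask = {0, 4, 1, 5};
  return __builtin_shuffle (a, b, mask);
}

/* Rotate left by n elements using a run-time mask {n, n+1, n+2, n+3}.
   Assumption: n is non-negative; the indices wrap modulo 4.  */
v4si
rotate_left (v4si v, int n)
{
  v4si mask = {0, 1, 2, 3};
  mask += n;
  return __builtin_shuffle (v, mask);
}
```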
<p>You can declare variables of vector types and use them in function calls
and returns, as well as in assignments and some casts. You can specify a vector type as
a return type for a function. Vector types can also be used as function
arguments. It is possible to cast from one vector type to another,
provided they are of the same size (in fact, you can also cast vectors
to and from other datatypes of the same size).
</p>
<p>You cannot operate between vectors of different lengths or different
signedness without a cast.
</p>
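<p>The sketch below shows two such casts. A cast between vector types of the
same size reinterprets the bits and does not convert values. The type and
function names are assumptions made for this example.
</p>

```c
typedef int           v4si  __attribute__ ((vector_size (16)));
typedef unsigned int  v4su  __attribute__ ((vector_size (16)));
typedef unsigned char v16qu __attribute__ ((vector_size (16)));

/* Mixing signedness needs an explicit cast on one operand.  */
v4su
add_mixed (v4si a, v4su b)
{
  return (v4su) a + b;
}

/* View the same 16 bytes as sixteen unsigned chars.  */
v16qu
as_bytes (v4si v)
{
  return (v16qu) v;
}
```

<p>For example, <code>add_mixed</code> applied to <code>{-1,0,1,2}</code> and
<code>{1,1,1,1}</code> gives <code>{0,1,2,3}</code>, because the element -1 is
reinterpreted as the largest unsigned value and the addition wraps around.
</p>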
<a name="index-_005f_005fbuiltin_005fshufflevector"></a>
<p>Vector shuffling is available using the
<code>__builtin_shufflevector (vec1, vec2, index...)</code>
function. <var>vec1</var> and <var>vec2</var> must be expressions with
vector type with a compatible element type. The result of
<code>__builtin_shufflevector</code> is a vector with the same element type
as <var>vec1</var> and <var>vec2</var> but that has an element count equal to
the number of indices specified.
</p>
<p>The <var>index</var> arguments are a list of integers that specify the
element indices of the first two vectors that should be extracted and
returned in a new vector. These element indices are numbered sequentially
starting with the first vector, continuing into the second vector.
An index of -1 can be used to indicate that the corresponding element in
the returned vector is a don’t-care value that can be freely chosen to optimize
the generated code sequence performing the shuffle operation.
</p>
<p>Consider the following example:
</p><div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
typedef int v8si __attribute__ ((vector_size (32)));
v8si a = {1,-2,3,-4,5,-6,7,-8};
v4si b = __builtin_shufflevector (a, a, 0, 2, 4, 6); /* b is {1,3,5,7} */
v4si c = {-2,-4,-6,-8};
v8si d = __builtin_shufflevector (c, b, 4, 0, 5, 1, 6, 2, 7, 3); /* d is a */
</pre></div>
<a name="index-_005f_005fbuiltin_005fconvertvector"></a>
<p>Vector conversion is available using the
<code>__builtin_convertvector (vec, vectype)</code>
function. <var>vec</var> must be an expression with integral or floating
vector type and <var>vectype</var> an integral or floating vector type with the
same number of elements. The result has type <var>vectype</var>, and each of its
elements is the value of a C cast of the corresponding element of <var>vec</var>
to the element type of <var>vectype</var>.
</p>
<p>Consider the following example:
</p><div class="smallexample">
<pre class="smallexample">typedef int v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));
typedef double v4df __attribute__ ((vector_size (32)));
typedef unsigned long long v4di __attribute__ ((vector_size (32)));
v4si a = {1,-2,3,-4};
v4sf b = {1.5f,-2.5f,3.f,7.f};
v4di c = {1ULL,5ULL,0ULL,10ULL};
v4sf d = __builtin_convertvector (a, v4sf); /* d is {1.f,-2.f,3.f,-4.f} */
/* Equivalent of:
v4sf d = { (float)a[0], (float)a[1], (float)a[2], (float)a[3] }; */
v4df e = __builtin_convertvector (a, v4df); /* e is {1.,-2.,3.,-4.} */
v4df f = __builtin_convertvector (b, v4df); /* f is {1.5,-2.5,3.,7.} */
v4si g = __builtin_convertvector (f, v4si); /* g is {1,-2,3,7} */
v4si h = __builtin_convertvector (c, v4si); /* h is {1,5,0,10} */
</pre></div>
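<p>To contrast value conversion with a plain vector cast, which only
reinterprets bits, consider the sketch below. The function names are
hypothetical, and the bit-pattern comment assumes a 32-bit <code>int</code> and
IEEE 754 single-precision <code>float</code>.
</p>

```c
typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Value conversion: each int becomes the float with the same value.  */
v4sf
to_float_values (v4si v)
{
  return __builtin_convertvector (v, v4sf);
}

/* Bit reinterpretation: the 16 bytes are left unchanged.
   Assumption: under IEEE 754, the pattern 0x3f800000 reads as 1.0f.  */
v4sf
to_float_bits (v4si v)
{
  return (v4sf) v;
}

/* Float to int conversion truncates toward zero, as a C cast does.  */
v4si
truncate_to_int (v4sf v)
{
  return __builtin_convertvector (v, v4si);
}
```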
<a name="index-vector-types_002c-using-with-x86-intrinsics"></a>
<p>Sometimes it is desirable to write code using a mix of generic vector
operations (for clarity) and machine-specific vector intrinsics (to
access vector instructions that are not exposed via generic built-ins).
On x86, intrinsic functions for integer vectors typically use the same
vector type <code>__m128i</code> irrespective of how they interpret the vector,
making it necessary to cast their arguments and return values from/to
other vector types. In C, you can make use of a <code>union</code> type:
</p><div class="smallexample">
<pre class="smallexample">#include <immintrin.h>
typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
typedef unsigned int u32x4 __attribute__ ((vector_size (16)));
typedef union {
__m128i mm;
u8x16 u8;
u32x4 u32;
} v128;
</pre></div>
<p>for variables that can be used with both built-in operators and x86
intrinsics:
</p>
<div class="smallexample">
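<pre class="smallexample">v128 x, y = { 0 };
memcpy (&x, ptr, sizeof x);          /* memcpy is declared in string.h; ptr is assumed to point to 16 readable bytes */
y.u8 += 0x80;                        /* add 0x80 to each of the 16 bytes */
x.mm = _mm_adds_epu8 (x.mm, y.mm);   /* unsigned saturating byte addition */
x.u32 &= 0xffffff;                   /* clear the most significant byte of each 32-bit element */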
/* Instead of a variable, a compound literal may be used to pass the
return value of an intrinsic call to a function expecting the union: */
v128 foo (v128);
x = foo ((v128) {_mm_adds_epu8 (x.mm, y.mm)});
</pre></div>
</body>
</html>