author    | alk3pInjection <webmaster@raspii.tech> | 2024-02-04 16:16:35 +0800
committer | alk3pInjection <webmaster@raspii.tech> | 2024-02-04 16:16:35 +0800
commit    | abdaadbcae30fe0c9a66c7516798279fdfd97750 (patch)
tree      | 00a54a6e25601e43876d03c1a4a12a749d4a914c /share/doc/gcc/Optimize-Options.html
https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads
Change-Id: I7303388733328cd98ab9aa3c30236db67f2e9e9c
Diffstat (limited to 'share/doc/gcc/Optimize-Options.html')
-rw-r--r-- | share/doc/gcc/Optimize-Options.html | 5194
1 file changed, 5194 insertions, 0 deletions
diff --git a/share/doc/gcc/Optimize-Options.html b/share/doc/gcc/Optimize-Options.html new file mode 100644 index 0000000..5c50c2e --- /dev/null +++ b/share/doc/gcc/Optimize-Options.html @@ -0,0 +1,5194 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> +<html> +<!-- This file documents the use of the GNU compilers. + +Copyright (C) 1988-2023 Free Software Foundation, Inc. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with the +Invariant Sections being "Funding Free Software", the Front-Cover +Texts being (a) (see below), and with the Back-Cover Texts being (b) +(see below). A copy of the license is included in the section entitled +"GNU Free Documentation License". + +(a) The FSF's Front-Cover Text is: + +A GNU Manual + +(b) The FSF's Back-Cover Text is: + +You have freedom to copy and modify this GNU Manual, like GNU + software. Copies published by the Free Software Foundation raise + funds for GNU development. 
--> +<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ --> +<head> +<title>Using the GNU Compiler Collection (GCC): Optimize Options</title> + +<meta name="description" content="Using the GNU Compiler Collection (GCC): Optimize Options"> +<meta name="keywords" content="Using the GNU Compiler Collection (GCC): Optimize Options"> +<meta name="resource-type" content="document"> +<meta name="distribution" content="global"> +<meta name="Generator" content="makeinfo"> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<link href="index.html#Top" rel="start" title="Top"> +<link href="Indices.html#Indices" rel="index" title="Indices"> +<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> +<link href="Invoking-GCC.html#Invoking-GCC" rel="up" title="Invoking GCC"> +<link href="Instrumentation-Options.html#Instrumentation-Options" rel="next" title="Instrumentation Options"> +<link href="Debugging-Options.html#Debugging-Options" rel="previous" title="Debugging Options"> +<style type="text/css"> +<!-- +a.summary-letter {text-decoration: none} +blockquote.smallquotation {font-size: smaller} +div.display {margin-left: 3.2em} +div.example {margin-left: 3.2em} +div.indentedblock {margin-left: 3.2em} +div.lisp {margin-left: 3.2em} +div.smalldisplay {margin-left: 3.2em} +div.smallexample {margin-left: 3.2em} +div.smallindentedblock {margin-left: 3.2em; font-size: smaller} +div.smalllisp {margin-left: 3.2em} +kbd {font-style:oblique} +pre.display {font-family: inherit} +pre.format {font-family: inherit} +pre.menu-comment {font-family: serif} +pre.menu-preformatted {font-family: serif} +pre.smalldisplay {font-family: inherit; font-size: smaller} +pre.smallexample {font-size: smaller} +pre.smallformat {font-family: inherit; font-size: smaller} +pre.smalllisp {font-size: smaller} +span.nocodebreak {white-space:nowrap} +span.nolinebreak {white-space:nowrap} +span.roman {font-family:serif; font-weight:normal} +span.sansserif 
{font-family:sans-serif; font-weight:normal} +ul.no-bullet {list-style: none} +--> +</style> + + +</head> + +<body lang="en_US" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> +<a name="Optimize-Options"></a> +<div class="header"> +<p> +Next: <a href="Instrumentation-Options.html#Instrumentation-Options" accesskey="n" rel="next">Instrumentation Options</a>, Previous: <a href="Debugging-Options.html#Debugging-Options" accesskey="p" rel="previous">Debugging Options</a>, Up: <a href="Invoking-GCC.html#Invoking-GCC" accesskey="u" rel="up">Invoking GCC</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Indices.html#Indices" title="Index" rel="index">Index</a>]</p> +</div> +<hr> +<a name="Options-That-Control-Optimization"></a> +<h3 class="section">3.11 Options That Control Optimization</h3> +<a name="index-optimize-options"></a> +<a name="index-options_002c-optimization"></a> + +<p>These options control various sorts of optimizations. +</p> +<p>Without any optimization option, the compiler’s goal is to reduce the +cost of compilation and to make debugging produce the expected +results. Statements are independent: if you stop the program with a +breakpoint between statements, you can then assign a new value to any +variable or change the program counter to any other statement in the +function and get exactly the results you expect from the source +code. +</p> +<p>Turning on optimization flags makes the compiler attempt to improve +the performance and/or code size at the expense of compilation time +and possibly the ability to debug the program. +</p> +<p>The compiler performs optimization based on the knowledge it has of the +program. Compiling multiple files at once to a single output file mode allows +the compiler to use information gained from all of the files when compiling +each of them. +</p> +<p>Not all optimizations are controlled directly by a flag. 
Only +optimizations that have a flag are listed in this section. +</p> +<p>Most optimizations are completely disabled at <samp>-O0</samp> or if an +<samp>-O</samp> level is not set on the command line, even if individual +optimization flags are specified. Similarly, <samp>-Og</samp> suppresses +many optimization passes. +</p> +<p>Depending on the target and how GCC was configured, a slightly different +set of optimizations may be enabled at each <samp>-O</samp> level than +those listed here. You can invoke GCC with <samp>-Q --help=optimizers</samp> +to find out the exact set of optimizations that are enabled at each level. +See <a href="Overall-Options.html#Overall-Options">Overall Options</a>, for examples. +</p> +<dl compact="compact"> +<dd><a name="index-O"></a> +<a name="index-O1"></a> +</dd> +<dt><code>-O</code></dt> +<dt><code>-O1</code></dt> +<dd><p>Optimize. Optimizing compilation takes somewhat more time, and a lot +more memory for a large function. +</p> +<p>With <samp>-O</samp>, the compiler tries to reduce code size and execution +time, without performing any optimizations that take a great deal of +compilation time. 
+</p> + +<p><samp>-O</samp> turns on the following optimization flags: +</p> +<div class="smallexample"> +<pre class="smallexample">-fauto-inc-dec +-fbranch-count-reg +-fcombine-stack-adjustments +-fcompare-elim +-fcprop-registers +-fdce +-fdefer-pop +-fdelayed-branch +-fdse +-fforward-propagate +-fguess-branch-probability +-fif-conversion +-fif-conversion2 +-finline-functions-called-once +-fipa-modref +-fipa-profile +-fipa-pure-const +-fipa-reference +-fipa-reference-addressable +-fmerge-constants +-fmove-loop-invariants +-fmove-loop-stores +-fomit-frame-pointer +-freorder-blocks +-fshrink-wrap +-fshrink-wrap-separate +-fsplit-wide-types +-fssa-backprop +-fssa-phiopt +-ftree-bit-ccp +-ftree-ccp +-ftree-ch +-ftree-coalesce-vars +-ftree-copy-prop +-ftree-dce +-ftree-dominator-opts +-ftree-dse +-ftree-forwprop +-ftree-fre +-ftree-phiprop +-ftree-pta +-ftree-scev-cprop +-ftree-sink +-ftree-slsr +-ftree-sra +-ftree-ter +-funit-at-a-time +</pre></div> + +<a name="index-O2"></a> +</dd> +<dt><code>-O2</code></dt> +<dd><p>Optimize even more. GCC performs nearly all supported optimizations +that do not involve a space-speed tradeoff. +As compared to <samp>-O</samp>, this option increases both compilation time +and the performance of the generated code. +</p> +<p><samp>-O2</samp> turns on all optimization flags specified by <samp>-O1</samp>. 
It +also turns on the following optimization flags: +</p> +<div class="smallexample"> +<pre class="smallexample">-falign-functions -falign-jumps +-falign-labels -falign-loops +-fcaller-saves +-fcode-hoisting +-fcrossjumping +-fcse-follow-jumps -fcse-skip-blocks +-fdelete-null-pointer-checks +-fdevirtualize -fdevirtualize-speculatively +-fexpensive-optimizations +-ffinite-loops +-fgcse -fgcse-lm +-fhoist-adjacent-loads +-finline-functions +-finline-small-functions +-findirect-inlining +-fipa-bit-cp -fipa-cp -fipa-icf +-fipa-ra -fipa-sra -fipa-vrp +-fisolate-erroneous-paths-dereference +-flra-remat +-foptimize-sibling-calls +-foptimize-strlen +-fpartial-inlining +-fpeephole2 +-freorder-blocks-algorithm=stc +-freorder-blocks-and-partition -freorder-functions +-frerun-cse-after-loop +-fschedule-insns -fschedule-insns2 +-fsched-interblock -fsched-spec +-fstore-merging +-fstrict-aliasing +-fthread-jumps +-ftree-builtin-call-dce +-ftree-loop-vectorize +-ftree-pre +-ftree-slp-vectorize +-ftree-switch-conversion -ftree-tail-merge +-ftree-vrp +-fvect-cost-model=very-cheap +</pre></div> + +<p>Please note the warning under <samp>-fgcse</samp> about +invoking <samp>-O2</samp> on programs that use computed gotos. +</p> +<a name="index-O3"></a> +</dd> +<dt><code>-O3</code></dt> +<dd><p>Optimize yet more. <samp>-O3</samp> turns on all optimizations specified +by <samp>-O2</samp> and also turns on the following optimization flags: +</p> +<div class="smallexample"> +<pre class="smallexample">-fgcse-after-reload +-fipa-cp-clone +-floop-interchange +-floop-unroll-and-jam +-fpeel-loops +-fpredictive-commoning +-fsplit-loops +-fsplit-paths +-ftree-loop-distribution +-ftree-partial-pre +-funswitch-loops +-fvect-cost-model=dynamic +-fversion-loops-for-strides +</pre></div> + +<a name="index-O0"></a> +</dd> +<dt><code>-O0</code></dt> +<dd><p>Reduce compilation time and make debugging produce the expected +results. This is the default. 
+</p> +<a name="index-Os"></a> +</dd> +<dt><code>-Os</code></dt> +<dd><p>Optimize for size. <samp>-Os</samp> enables all <samp>-O2</samp> optimizations +except those that often increase code size: +</p> +<div class="smallexample"> +<pre class="smallexample">-falign-functions -falign-jumps +-falign-labels -falign-loops +-fprefetch-loop-arrays -freorder-blocks-algorithm=stc +</pre></div> + +<p>It also enables <samp>-finline-functions</samp>, causes the compiler to tune for +code size rather than execution speed, and performs further optimizations +designed to reduce code size. +</p> +<a name="index-Ofast"></a> +</dd> +<dt><code>-Ofast</code></dt> +<dd><p>Disregard strict standards compliance. <samp>-Ofast</samp> enables all +<samp>-O3</samp> optimizations. It also enables optimizations that are not +valid for all standard-compliant programs. +It turns on <samp>-ffast-math</samp>, <samp>-fallow-store-data-races</samp> +and the Fortran-specific <samp>-fstack-arrays</samp>, unless +<samp>-fmax-stack-var-size</samp> is specified, and <samp>-fno-protect-parens</samp>. +It turns off <samp>-fsemantic-interposition</samp>. +</p> +<a name="index-Og"></a> +</dd> +<dt><code>-Og</code></dt> +<dd><p>Optimize debugging experience. <samp>-Og</samp> should be the optimization +level of choice for the standard edit-compile-debug cycle, offering +a reasonable level of optimization while maintaining fast compilation +and a good debugging experience. It is a better choice than <samp>-O0</samp> +for producing debuggable code because some compiler passes +that collect debug information are disabled at <samp>-O0</samp>. +</p> +<p>Like <samp>-O0</samp>, <samp>-Og</samp> completely disables a number of +optimization passes so that individual options controlling them have +no effect. 
Otherwise <samp>-Og</samp> enables all <samp>-O1</samp> +optimization flags except for those that may interfere with debugging: +</p> +<div class="smallexample"> +<pre class="smallexample">-fbranch-count-reg -fdelayed-branch +-fdse -fif-conversion -fif-conversion2 +-finline-functions-called-once +-fmove-loop-invariants -fmove-loop-stores -fssa-phiopt +-ftree-bit-ccp -ftree-dse -ftree-pta -ftree-sra +</pre></div> + +<a name="index-Oz"></a> +</dd> +<dt><code>-Oz</code></dt> +<dd><p>Optimize aggressively for size rather than speed. This may increase +the number of instructions executed if those instructions require +fewer bytes to encode. <samp>-Oz</samp> behaves similarly to <samp>-Os</samp> +including enabling most <samp>-O2</samp> optimizations. +</p> +</dd> +</dl> + +<p>If you use multiple <samp>-O</samp> options, with or without level numbers, +the last such option is the one that is effective. +</p> +<p>Options of the form <samp>-f<var>flag</var></samp> specify machine-independent +flags. Most flags have both positive and negative forms; the negative +form of <samp>-ffoo</samp> is <samp>-fno-foo</samp>. In the table +below, only one of the forms is listed—the one you typically +use. You can figure out the other form by either removing ‘<samp>no-</samp>’ +or adding it. +</p> +<p>The following options control specific optimizations. They are either +activated by <samp>-O</samp> options or are related to ones that are. You +can use the following flags in the rare cases when “fine-tuning” of +optimizations to be performed is desired. +</p> +<dl compact="compact"> +<dd><a name="index-fno_002ddefer_002dpop"></a> +<a name="index-fdefer_002dpop"></a> +</dd> +<dt><code>-fno-defer-pop</code></dt> +<dd><p>For machines that must pop arguments after a function call, always pop +the arguments as soon as each function returns. 
+At levels <samp>-O1</samp> and higher, <samp>-fdefer-pop</samp> is the default; +this allows the compiler to let arguments accumulate on the stack for several +function calls and pop them all at once. +</p> +<a name="index-fforward_002dpropagate"></a> +</dd> +<dt><code>-fforward-propagate</code></dt> +<dd><p>Perform a forward propagation pass on RTL. The pass tries to combine two +instructions and checks if the result can be simplified. If loop unrolling +is active, two passes are performed and the second is scheduled after +loop unrolling. +</p> +<p>This option is enabled by default at optimization levels <samp>-O1</samp>, +<samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-ffp_002dcontract"></a> +</dd> +<dt><code>-ffp-contract=<var>style</var></code></dt> +<dd><p><samp>-ffp-contract=off</samp> disables floating-point expression contraction. +<samp>-ffp-contract=fast</samp> enables floating-point expression contraction +such as forming of fused multiply-add operations if the target has +native support for them. +<samp>-ffp-contract=on</samp> enables floating-point expression contraction +if allowed by the language standard. This is currently not implemented +and treated equal to <samp>-ffp-contract=off</samp>. +</p> +<p>The default is <samp>-ffp-contract=fast</samp>. +</p> +<a name="index-fomit_002dframe_002dpointer"></a> +</dd> +<dt><code>-fomit-frame-pointer</code></dt> +<dd><p>Omit the frame pointer in functions that don’t need one. This avoids the +instructions to save, set up and restore the frame pointer; on many targets +it also makes an extra register available. +</p> +<p>On some targets this flag has no effect because the standard calling sequence +always uses a frame pointer, so it cannot be omitted. +</p> +<p>Note that <samp>-fno-omit-frame-pointer</samp> doesn’t guarantee the frame pointer +is used in all functions. Several targets always omit the frame pointer in +leaf functions. 
+</p> +<p>Enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-foptimize_002dsibling_002dcalls"></a> +</dd> +<dt><code>-foptimize-sibling-calls</code></dt> +<dd><p>Optimize sibling and tail recursive calls. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-foptimize_002dstrlen"></a> +</dd> +<dt><code>-foptimize-strlen</code></dt> +<dd><p>Optimize various standard C string functions (e.g. <code>strlen</code>, +<code>strchr</code> or <code>strcpy</code>) and +their <code>_FORTIFY_SOURCE</code> counterparts into faster alternatives. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. +</p> +<a name="index-fno_002dinline"></a> +<a name="index-finline"></a> +</dd> +<dt><code>-fno-inline</code></dt> +<dd><p>Do not expand any functions inline apart from those marked with +the <code>always_inline</code> attribute. This is the default when not +optimizing. +</p> +<p>Single functions can be exempted from inlining by marking them +with the <code>noinline</code> attribute. +</p> +<a name="index-finline_002dsmall_002dfunctions"></a> +</dd> +<dt><code>-finline-small-functions</code></dt> +<dd><p>Integrate functions into their callers when their body is smaller than expected +function call code (so overall size of program gets smaller). The compiler +heuristically decides which functions are simple enough to be worth integrating +in this way. This inlining applies to all functions, even those not declared +inline. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-findirect_002dinlining"></a> +</dd> +<dt><code>-findirect-inlining</code></dt> +<dd><p>Inline also indirect calls that are discovered to be known at compile +time thanks to previous inlining. This option has any effect only +when inlining itself is turned on by the <samp>-finline-functions</samp> +or <samp>-finline-small-functions</samp> options. 
+</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-finline_002dfunctions"></a> +</dd> +<dt><code>-finline-functions</code></dt> +<dd><p>Consider all functions for inlining, even if they are not declared inline. +The compiler heuristically decides which functions are worth integrating +in this way. +</p> +<p>If all calls to a given function are integrated, and the function is +declared <code>static</code>, then the function is normally not output as +assembler code in its own right. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. Also enabled +by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-finline_002dfunctions_002dcalled_002donce"></a> +</dd> +<dt><code>-finline-functions-called-once</code></dt> +<dd><p>Consider all <code>static</code> functions called once for inlining into their +caller even if they are not marked <code>inline</code>. If a call to a given +function is integrated, then the function is not output as assembler code +in its own right. +</p> +<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp> and <samp>-Os</samp>, +but not <samp>-Og</samp>. +</p> +<a name="index-fearly_002dinlining"></a> +</dd> +<dt><code>-fearly-inlining</code></dt> +<dd><p>Inline functions marked by <code>always_inline</code> and functions whose body seems +smaller than the function call overhead early before doing +<samp>-fprofile-generate</samp> instrumentation and real inlining pass. Doing so +makes profiling significantly cheaper and usually inlining faster on programs +having large chains of nested wrapper functions. +</p> +<p>Enabled by default. +</p> +<a name="index-fipa_002dsra"></a> +</dd> +<dt><code>-fipa-sra</code></dt> +<dd><p>Perform interprocedural scalar replacement of aggregates, removal of +unused parameters and replacement of parameters passed by reference +by parameters passed by value. 
+</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp> and <samp>-Os</samp>. +</p> +<a name="index-finline_002dlimit"></a> +</dd> +<dt><code>-finline-limit=<var>n</var></code></dt> +<dd><p>By default, GCC limits the size of functions that can be inlined. This flag +allows coarse control of this limit. <var>n</var> is the size of functions that +can be inlined in number of pseudo instructions. +</p> +<p>Inlining is actually controlled by a number of parameters, which may be +specified individually by using <samp>--param <var>name</var>=<var>value</var></samp>. +The <samp>-finline-limit=<var>n</var></samp> option sets some of these parameters +as follows: +</p> +<dl compact="compact"> +<dt><code>max-inline-insns-single</code></dt> +<dd><p>is set to <var>n</var>/2. +</p></dd> +<dt><code>max-inline-insns-auto</code></dt> +<dd><p>is set to <var>n</var>/2. +</p></dd> +</dl> + +<p>See below for documentation of the individual +parameters controlling inlining and for the defaults of these parameters. +</p> +<p><em>Note:</em> there may be no value to <samp>-finline-limit</samp> that results +in default behavior. +</p> +<p><em>Note:</em> a pseudo instruction represents, in this particular context, an +abstract measurement of a function’s size. In no way does it represent a count +of assembly instructions and as such its exact meaning might change from one +release to another. +</p> +<a name="index-fno_002dkeep_002dinline_002ddllexport"></a> +<a name="index-fkeep_002dinline_002ddllexport"></a> +</dd> +<dt><code>-fno-keep-inline-dllexport</code></dt> +<dd><p>This is a more fine-grained version of <samp>-fkeep-inline-functions</samp>, +which applies only to functions that are declared using the <code>dllexport</code> +attribute or declspec. See <a href="Function-Attributes.html#Function-Attributes">Declaring Attributes of +Functions</a>. 
+</p> +<a name="index-fkeep_002dinline_002dfunctions"></a> +</dd> +<dt><code>-fkeep-inline-functions</code></dt> +<dd><p>In C, emit <code>static</code> functions that are declared <code>inline</code> +into the object file, even if the function has been inlined into all +of its callers. This switch does not affect functions using the +<code>extern inline</code> extension in GNU C90. In C++, emit any and all +inline functions into the object file. +</p> +<a name="index-fkeep_002dstatic_002dfunctions"></a> +</dd> +<dt><code>-fkeep-static-functions</code></dt> +<dd><p>Emit <code>static</code> functions into the object file, even if the function +is never used. +</p> +<a name="index-fkeep_002dstatic_002dconsts"></a> +</dd> +<dt><code>-fkeep-static-consts</code></dt> +<dd><p>Emit variables declared <code>static const</code> when optimization isn’t turned +on, even if the variables aren’t referenced. +</p> +<p>GCC enables this option by default. If you want to force the compiler to +check if a variable is referenced, regardless of whether or not +optimization is turned on, use the <samp>-fno-keep-static-consts</samp> option. +</p> +<a name="index-fmerge_002dconstants"></a> +</dd> +<dt><code>-fmerge-constants</code></dt> +<dd><p>Attempt to merge identical constants (string constants and floating-point +constants) across compilation units. +</p> +<p>This option is the default for optimized compilation if the assembler and +linker support it. Use <samp>-fno-merge-constants</samp> to inhibit this +behavior. +</p> +<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-fmerge_002dall_002dconstants"></a> +</dd> +<dt><code>-fmerge-all-constants</code></dt> +<dd><p>Attempt to merge identical constants and identical variables. +</p> +<p>This option implies <samp>-fmerge-constants</samp>. In addition to +<samp>-fmerge-constants</samp> this considers e.g. 
even constant initialized +arrays or initialized constant variables with integral or floating-point +types. Languages like C or C++ require each variable, including multiple +instances of the same variable in recursive calls, to have distinct locations, +so using this option results in non-conforming +behavior. +</p> +<a name="index-fmodulo_002dsched"></a> +</dd> +<dt><code>-fmodulo-sched</code></dt> +<dd><p>Perform swing modulo scheduling immediately before the first scheduling +pass. This pass looks at innermost loops and reorders their +instructions by overlapping different iterations. +</p> +<a name="index-fmodulo_002dsched_002dallow_002dregmoves"></a> +</dd> +<dt><code>-fmodulo-sched-allow-regmoves</code></dt> +<dd><p>Perform more aggressive SMS-based modulo scheduling with register moves +allowed. By setting this flag certain anti-dependences edges are +deleted, which triggers the generation of reg-moves based on the +life-range analysis. This option is effective only with +<samp>-fmodulo-sched</samp> enabled. +</p> +<a name="index-fno_002dbranch_002dcount_002dreg"></a> +<a name="index-fbranch_002dcount_002dreg"></a> +</dd> +<dt><code>-fno-branch-count-reg</code></dt> +<dd><p>Disable the optimization pass that scans for opportunities to use +“decrement and branch” instructions on a count register instead of +instruction sequences that decrement a register, compare it against zero, and +then branch based upon the result. This option is only meaningful on +architectures that support such instructions, which include x86, PowerPC, +IA-64 and S/390. Note that the <samp>-fno-branch-count-reg</samp> option +doesn’t remove the decrement and branch instructions from the generated +instruction stream introduced by other optimization passes. +</p> +<p>The default is <samp>-fbranch-count-reg</samp> at <samp>-O1</samp> and higher, +except for <samp>-Og</samp>. 
+</p> +<a name="index-fno_002dfunction_002dcse"></a> +<a name="index-ffunction_002dcse"></a> +</dd> +<dt><code>-fno-function-cse</code></dt> +<dd><p>Do not put function addresses in registers; make each instruction that +calls a constant function contain the function’s address explicitly. +</p> +<p>This option results in less efficient code, but some strange hacks +that alter the assembler output may be confused by the optimizations +performed when this option is not used. +</p> +<p>The default is <samp>-ffunction-cse</samp> +</p> +<a name="index-fno_002dzero_002dinitialized_002din_002dbss"></a> +<a name="index-fzero_002dinitialized_002din_002dbss"></a> +</dd> +<dt><code>-fno-zero-initialized-in-bss</code></dt> +<dd><p>If the target supports a BSS section, GCC by default puts variables that +are initialized to zero into BSS. This can save space in the resulting +code. +</p> +<p>This option turns off this behavior because some programs explicitly +rely on variables going to the data section—e.g., so that the +resulting executable can find the beginning of that section and/or make +assumptions based on that. +</p> +<p>The default is <samp>-fzero-initialized-in-bss</samp>. +</p> +<a name="index-fthread_002djumps"></a> +</dd> +<dt><code>-fthread-jumps</code></dt> +<dd><p>Perform optimizations that check to see if a jump branches to a +location where another comparison subsumed by the first is found. If +so, the first branch is redirected to either the destination of the +second branch or a point immediately following it, depending on whether +the condition is known to be true or false. +</p> +<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-fsplit_002dwide_002dtypes"></a> +</dd> +<dt><code>-fsplit-wide-types</code></dt> +<dd><p>When using a type that occupies multiple registers, such as <code>long +long</code> on a 32-bit system, split the registers apart and allocate them +independently. 
This normally generates better code for those types, +but may make debugging more difficult. +</p> +<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp>, +<samp>-Os</samp>. +</p> +<a name="index-fsplit_002dwide_002dtypes_002dearly"></a> +</dd> +<dt><code>-fsplit-wide-types-early</code></dt> +<dd><p>Fully split wide types early, instead of very late. +This option has no effect unless <samp>-fsplit-wide-types</samp> is turned on. +</p> +<p>This is the default on some targets. +</p> +<a name="index-fcse_002dfollow_002djumps"></a> +</dd> +<dt><code>-fcse-follow-jumps</code></dt> +<dd><p>In common subexpression elimination (CSE), scan through jump instructions +when the target of the jump is not reached by any other path. For +example, when CSE encounters an <code>if</code> statement with an +<code>else</code> clause, CSE follows the jump when the condition +tested is false. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-fcse_002dskip_002dblocks"></a> +</dd> +<dt><code>-fcse-skip-blocks</code></dt> +<dd><p>This is similar to <samp>-fcse-follow-jumps</samp>, but causes CSE to +follow jumps that conditionally skip over blocks. When CSE +encounters a simple <code>if</code> statement with no else clause, +<samp>-fcse-skip-blocks</samp> causes CSE to follow the jump around the +body of the <code>if</code>. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-frerun_002dcse_002dafter_002dloop"></a> +</dd> +<dt><code>-frerun-cse-after-loop</code></dt> +<dd><p>Re-run common subexpression elimination after loop optimizations are +performed. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-fgcse"></a> +</dd> +<dt><code>-fgcse</code></dt> +<dd><p>Perform a global common subexpression elimination pass. +This pass also performs global constant and copy propagation. 
+</p> +<p><em>Note:</em> When compiling a program using computed gotos, a GCC +extension, you may get better run-time performance if you disable +the global common subexpression elimination pass by adding +<samp>-fno-gcse</samp> to the command line. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-fgcse_002dlm"></a> +</dd> +<dt><code>-fgcse-lm</code></dt> +<dd><p>When <samp>-fgcse-lm</samp> is enabled, global common subexpression elimination +attempts to move loads that are only killed by stores into themselves. This +allows a loop containing a load/store sequence to be changed to a load outside +the loop, and a copy/store within the loop. +</p> +<p>Enabled by default when <samp>-fgcse</samp> is enabled. +</p> +<a name="index-fgcse_002dsm"></a> +</dd> +<dt><code>-fgcse-sm</code></dt> +<dd><p>When <samp>-fgcse-sm</samp> is enabled, a store motion pass is run after +global common subexpression elimination. This pass attempts to move +stores out of loops. When used in conjunction with <samp>-fgcse-lm</samp>, +loops containing a load/store sequence can be changed to a load before +the loop and a store after the loop. +</p> +<p>Not enabled at any optimization level. +</p> +<a name="index-fgcse_002dlas"></a> +</dd> +<dt><code>-fgcse-las</code></dt> +<dd><p>When <samp>-fgcse-las</samp> is enabled, the global common subexpression +elimination pass eliminates redundant loads that come after stores to the +same memory location (both partial and full redundancies). +</p> +<p>Not enabled at any optimization level. +</p> +<a name="index-fgcse_002dafter_002dreload"></a> +</dd> +<dt><code>-fgcse-after-reload</code></dt> +<dd><p>When <samp>-fgcse-after-reload</samp> is enabled, a redundant load elimination +pass is performed after reload. The purpose of this pass is to clean up +redundant spilling. +</p> +<p>Enabled by <samp>-O3</samp>, <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. 
+</p> +<a name="index-faggressive_002dloop_002doptimizations"></a> +</dd> +<dt><code>-faggressive-loop-optimizations</code></dt> +<dd><p>This option tells the loop optimizer to use language constraints to +derive bounds for the number of iterations of a loop. This assumes that +loop code does not invoke undefined behavior by for example causing signed +integer overflows or out-of-bound array accesses. The bounds for the +number of iterations of a loop are used to guide loop unrolling and peeling +and loop exit test optimizations. +This option is enabled by default. +</p> +<a name="index-funconstrained_002dcommons"></a> +</dd> +<dt><code>-funconstrained-commons</code></dt> +<dd><p>This option tells the compiler that variables declared in common blocks +(e.g. Fortran) may later be overridden with longer trailing arrays. This +prevents certain optimizations that depend on knowing the array bounds. +</p> +<a name="index-fcrossjumping"></a> +</dd> +<dt><code>-fcrossjumping</code></dt> +<dd><p>Perform cross-jumping transformation. +This transformation unifies equivalent code and saves code size. The +resulting code may or may not perform better than without cross-jumping. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-fauto_002dinc_002ddec"></a> +</dd> +<dt><code>-fauto-inc-dec</code></dt> +<dd><p>Combine increments or decrements of addresses with memory accesses. +This pass is always skipped on architectures that do not have +instructions to support this. Enabled by default at <samp>-O1</samp> and +higher on architectures that support this. +</p> +<a name="index-fdce"></a> +</dd> +<dt><code>-fdce</code></dt> +<dd><p>Perform dead code elimination (DCE) on RTL. +Enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-fdse"></a> +</dd> +<dt><code>-fdse</code></dt> +<dd><p>Perform dead store elimination (DSE) on RTL. +Enabled by default at <samp>-O1</samp> and higher. 
+</p>
+<a name="index-fif_002dconversion"></a>
+</dd>
+<dt><code>-fif-conversion</code></dt>
+<dd><p>Attempt to transform conditional jumps into branch-less equivalents. This
+includes use of conditional moves, min, max, set flags and abs instructions, and
+some tricks doable by standard arithmetic. The use of conditional execution
+on chips where it is available is controlled by <samp>-fif-conversion2</samp>.
+</p>
+<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>, but
+not with <samp>-Og</samp>.
+</p>
+<a name="index-fif_002dconversion2"></a>
+</dd>
+<dt><code>-fif-conversion2</code></dt>
+<dd><p>Use conditional execution (where available) to transform conditional jumps into
+branch-less equivalents.
+</p>
+<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>, but
+not with <samp>-Og</samp>.
+</p>
+<a name="index-fdeclone_002dctor_002ddtor"></a>
+</dd>
+<dt><code>-fdeclone-ctor-dtor</code></dt>
+<dd><p>The C++ ABI requires multiple entry points for constructors and
+destructors: one for a base subobject, one for a complete object, and
+one for a virtual destructor that calls operator delete afterwards.
+For a hierarchy with virtual bases, the base and complete variants are
+clones, which means two copies of the function. With this option, the
+base and complete variants are changed to be thunks that call a common
+implementation.
+</p>
+<p>Enabled by <samp>-Os</samp>.
+</p>
+<a name="index-fdelete_002dnull_002dpointer_002dchecks"></a>
+</dd>
+<dt><code>-fdelete-null-pointer-checks</code></dt>
+<dd><p>Assume that programs cannot safely dereference null pointers, and that
+no code or data element resides at address zero.
+This option enables simple constant
+folding optimizations at all optimization levels.
In addition, other +optimization passes in GCC use this flag to control global dataflow +analyses that eliminate useless checks for null pointers; these assume +that a memory access to address zero always results in a trap, so +that if a pointer is checked after it has already been dereferenced, +it cannot be null. +</p> +<p>Note however that in some environments this assumption is not true. +Use <samp>-fno-delete-null-pointer-checks</samp> to disable this optimization +for programs that depend on that behavior. +</p> +<p>This option is enabled by default on most targets. On Nios II ELF, it +defaults to off. On AVR and MSP430, this option is completely disabled. +</p> +<p>Passes that use the dataflow information +are enabled independently at different optimization levels. +</p> +<a name="index-fdevirtualize"></a> +</dd> +<dt><code>-fdevirtualize</code></dt> +<dd><p>Attempt to convert calls to virtual functions to direct calls. This +is done both within a procedure and interprocedurally as part of +indirect inlining (<samp>-findirect-inlining</samp>) and interprocedural constant +propagation (<samp>-fipa-cp</samp>). +Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-fdevirtualize_002dspeculatively"></a> +</dd> +<dt><code>-fdevirtualize-speculatively</code></dt> +<dd><p>Attempt to convert calls to virtual functions to speculative direct calls. +Based on the analysis of the type inheritance graph, determine for a given call +the set of likely targets. If the set is small, preferably of size 1, change +the call into a conditional deciding between direct and indirect calls. The +speculative calls enable more optimizations, such as inlining. When they seem +useless after further optimization, they are converted back into original form. 
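</p>
<p>The null-pointer-check deletion described above for <samp>-fdelete-null-pointer-checks</samp> can be seen in a small, hypothetical example:</p>

```c
#include <stddef.h>

/* The pointer is dereferenced before it is tested.  Because GCC assumes
   a null dereference always traps, the later NULL check is provably
   dead and may be deleted under -fdelete-null-pointer-checks. */
int first_plus_one(const int *p)
{
    int v = *p;        /* dereference happens first */
    if (p == NULL)     /* dead check: p cannot be null here */
        return -1;
    return v + 1;
}
```

<p>Code that deliberately places an object at address zero must be compiled
with <samp>-fno-delete-null-pointer-checks</samp> to keep such checks.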
+</p> +<a name="index-fdevirtualize_002dat_002dltrans"></a> +</dd> +<dt><code>-fdevirtualize-at-ltrans</code></dt> +<dd><p>Stream extra information needed for aggressive devirtualization when running +the link-time optimizer in local transformation mode. +This option enables more devirtualization but +significantly increases the size of streamed data. For this reason it is +disabled by default. +</p> +<a name="index-fexpensive_002doptimizations"></a> +</dd> +<dt><code>-fexpensive-optimizations</code></dt> +<dd><p>Perform a number of minor optimizations that are relatively expensive. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-free-1"></a> +</dd> +<dt><code>-free</code></dt> +<dd><p>Attempt to remove redundant extension instructions. This is especially +helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit +registers after writing to their lower 32-bit half. +</p> +<p>Enabled for Alpha, AArch64 and x86 at levels <samp>-O2</samp>, +<samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-fno_002dlifetime_002ddse"></a> +<a name="index-flifetime_002ddse"></a> +</dd> +<dt><code>-fno-lifetime-dse</code></dt> +<dd><p>In C++ the value of an object is only affected by changes within its +lifetime: when the constructor begins, the object has an indeterminate +value, and any changes during the lifetime of the object are dead when +the object is destroyed. Normally dead store elimination will take +advantage of this; if your code relies on the value of the object +storage persisting beyond the lifetime of the object, you can use this +flag to disable this optimization. To preserve stores before the +constructor starts (e.g. because your operator new clears the object +storage) but still treat the object as dead after the destructor, you +can use <samp>-flifetime-dse=1</samp>. The default behavior can be +explicitly selected with <samp>-flifetime-dse=2</samp>. 
+<samp>-flifetime-dse=0</samp> is equivalent to <samp>-fno-lifetime-dse</samp>. +</p> +<a name="index-flive_002drange_002dshrinkage"></a> +</dd> +<dt><code>-flive-range-shrinkage</code></dt> +<dd><p>Attempt to decrease register pressure through register live range +shrinkage. This is helpful for fast processors with small or moderate +size register sets. +</p> +<a name="index-fira_002dalgorithm"></a> +</dd> +<dt><code>-fira-algorithm=<var>algorithm</var></code></dt> +<dd><p>Use the specified coloring algorithm for the integrated register +allocator. The <var>algorithm</var> argument can be ‘<samp>priority</samp>’, which +specifies Chow’s priority coloring, or ‘<samp>CB</samp>’, which specifies +Chaitin-Briggs coloring. Chaitin-Briggs coloring is not implemented +for all architectures, but for those targets that do support it, it is +the default because it generates better code. +</p> +<a name="index-fira_002dregion"></a> +</dd> +<dt><code>-fira-region=<var>region</var></code></dt> +<dd><p>Use specified regions for the integrated register allocator. The +<var>region</var> argument should be one of the following: +</p> +<dl compact="compact"> +<dt>‘<samp>all</samp>’</dt> +<dd><p>Use all loops as register allocation regions. +This can give the best results for machines with a small and/or +irregular register set. +</p> +</dd> +<dt>‘<samp>mixed</samp>’</dt> +<dd><p>Use all loops except for loops with small register pressure +as the regions. This value usually gives +the best results in most cases and for most architectures, +and is enabled by default when compiling with optimization for speed +(<samp>-O</samp>, <samp>-O2</samp>, …). +</p> +</dd> +<dt>‘<samp>one</samp>’</dt> +<dd><p>Use all functions as a single region. +This typically results in the smallest code size, and is enabled by default for +<samp>-Os</samp> or <samp>-O0</samp>. 
+</p> +</dd> +</dl> + +<a name="index-fira_002dhoist_002dpressure"></a> +</dd> +<dt><code>-fira-hoist-pressure</code></dt> +<dd><p>Use IRA to evaluate register pressure in the code hoisting pass for +decisions to hoist expressions. This option usually results in smaller +code, but it can slow the compiler down. +</p> +<p>This option is enabled at level <samp>-Os</samp> for all targets. +</p> +<a name="index-fira_002dloop_002dpressure"></a> +</dd> +<dt><code>-fira-loop-pressure</code></dt> +<dd><p>Use IRA to evaluate register pressure in loops for decisions to move +loop invariants. This option usually results in generation +of faster and smaller code on machines with large register files (>= 32 +registers), but it can slow the compiler down. +</p> +<p>This option is enabled at level <samp>-O3</samp> for some targets. +</p> +<a name="index-fno_002dira_002dshare_002dsave_002dslots"></a> +<a name="index-fira_002dshare_002dsave_002dslots"></a> +</dd> +<dt><code>-fno-ira-share-save-slots</code></dt> +<dd><p>Disable sharing of stack slots used for saving call-used hard +registers living through a call. Each hard register gets a +separate stack slot, and as a result function stack frames are +larger. +</p> +<a name="index-fno_002dira_002dshare_002dspill_002dslots"></a> +<a name="index-fira_002dshare_002dspill_002dslots"></a> +</dd> +<dt><code>-fno-ira-share-spill-slots</code></dt> +<dd><p>Disable sharing of stack slots allocated for pseudo-registers. Each +pseudo-register that does not get a hard register gets a separate +stack slot, and as a result function stack frames are larger. +</p> +<a name="index-flra_002dremat"></a> +</dd> +<dt><code>-flra-remat</code></dt> +<dd><p>Enable CFG-sensitive rematerialization in LRA. Instead of loading +values of spilled pseudos, LRA tries to rematerialize (recalculate) +values if it is profitable. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. 
+</p> +<a name="index-fdelayed_002dbranch"></a> +</dd> +<dt><code>-fdelayed-branch</code></dt> +<dd><p>If supported for the target machine, attempt to reorder instructions +to exploit instruction slots available after delayed branch +instructions. +</p> +<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>, +but not at <samp>-Og</samp>. +</p> +<a name="index-fschedule_002dinsns"></a> +</dd> +<dt><code>-fschedule-insns</code></dt> +<dd><p>If supported for the target machine, attempt to reorder instructions to +eliminate execution stalls due to required data being unavailable. This +helps machines that have slow floating point or memory load instructions +by allowing other instructions to be issued until the result of the load +or floating-point instruction is required. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. +</p> +<a name="index-fschedule_002dinsns2"></a> +</dd> +<dt><code>-fschedule-insns2</code></dt> +<dd><p>Similar to <samp>-fschedule-insns</samp>, but requests an additional pass of +instruction scheduling after register allocation has been done. This is +especially useful on machines with a relatively small number of +registers and where memory load instructions take more than one cycle. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. +</p> +<a name="index-fno_002dsched_002dinterblock"></a> +<a name="index-fsched_002dinterblock"></a> +</dd> +<dt><code>-fno-sched-interblock</code></dt> +<dd><p>Disable instruction scheduling across basic blocks, which +is normally enabled when scheduling before register allocation, i.e. +with <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher. +</p> +<a name="index-fno_002dsched_002dspec"></a> +<a name="index-fsched_002dspec"></a> +</dd> +<dt><code>-fno-sched-spec</code></dt> +<dd><p>Disable speculative motion of non-load instructions, which +is normally enabled when scheduling before register allocation, i.e. 
+with <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher. +</p> +<a name="index-fsched_002dpressure"></a> +</dd> +<dt><code>-fsched-pressure</code></dt> +<dd><p>Enable register pressure sensitive insn scheduling before register +allocation. This only makes sense when scheduling before register +allocation is enabled, i.e. with <samp>-fschedule-insns</samp> or at +<samp>-O2</samp> or higher. Usage of this option can improve the +generated code and decrease its size by preventing register pressure +increase above the number of available hard registers and subsequent +spills in register allocation. +</p> +<a name="index-fsched_002dspec_002dload"></a> +</dd> +<dt><code>-fsched-spec-load</code></dt> +<dd><p>Allow speculative motion of some load instructions. This only makes +sense when scheduling before register allocation, i.e. with +<samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher. +</p> +<a name="index-fsched_002dspec_002dload_002ddangerous"></a> +</dd> +<dt><code>-fsched-spec-load-dangerous</code></dt> +<dd><p>Allow speculative motion of more load instructions. This only makes +sense when scheduling before register allocation, i.e. with +<samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher. +</p> +<a name="index-fsched_002dstalled_002dinsns"></a> +</dd> +<dt><code>-fsched-stalled-insns</code></dt> +<dt><code>-fsched-stalled-insns=<var>n</var></code></dt> +<dd><p>Define how many insns (if any) can be moved prematurely from the queue +of stalled insns into the ready list during the second scheduling pass. +<samp>-fno-sched-stalled-insns</samp> means that no insns are moved +prematurely, <samp>-fsched-stalled-insns=0</samp> means there is no limit +on how many queued insns can be moved prematurely. +<samp>-fsched-stalled-insns</samp> without a value is equivalent to +<samp>-fsched-stalled-insns=1</samp>. 
+</p> +<a name="index-fsched_002dstalled_002dinsns_002ddep"></a> +</dd> +<dt><code>-fsched-stalled-insns-dep</code></dt> +<dt><code>-fsched-stalled-insns-dep=<var>n</var></code></dt> +<dd><p>Define how many insn groups (cycles) are examined for a dependency +on a stalled insn that is a candidate for premature removal from the queue +of stalled insns. This has an effect only during the second scheduling pass, +and only if <samp>-fsched-stalled-insns</samp> is used. +<samp>-fno-sched-stalled-insns-dep</samp> is equivalent to +<samp>-fsched-stalled-insns-dep=0</samp>. +<samp>-fsched-stalled-insns-dep</samp> without a value is equivalent to +<samp>-fsched-stalled-insns-dep=1</samp>. +</p> +<a name="index-fsched2_002duse_002dsuperblocks"></a> +</dd> +<dt><code>-fsched2-use-superblocks</code></dt> +<dd><p>When scheduling after register allocation, use superblock scheduling. +This allows motion across basic block boundaries, +resulting in faster schedules. This option is experimental, as not all machine +descriptions used by GCC model the CPU closely enough to avoid unreliable +results from the algorithm. +</p> +<p>This only makes sense when scheduling after register allocation, i.e. with +<samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher. +</p> +<a name="index-fsched_002dgroup_002dheuristic"></a> +</dd> +<dt><code>-fsched-group-heuristic</code></dt> +<dd><p>Enable the group heuristic in the scheduler. This heuristic favors +the instruction that belongs to a schedule group. This is enabled +by default when scheduling is enabled, i.e. with <samp>-fschedule-insns</samp> +or <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher. +</p> +<a name="index-fsched_002dcritical_002dpath_002dheuristic"></a> +</dd> +<dt><code>-fsched-critical-path-heuristic</code></dt> +<dd><p>Enable the critical-path heuristic in the scheduler. This heuristic favors +instructions on the critical path. This is enabled by default when +scheduling is enabled, i.e. 
with <samp>-fschedule-insns</samp> +or <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher. +</p> +<a name="index-fsched_002dspec_002dinsn_002dheuristic"></a> +</dd> +<dt><code>-fsched-spec-insn-heuristic</code></dt> +<dd><p>Enable the speculative instruction heuristic in the scheduler. This +heuristic favors speculative instructions with greater dependency weakness. +This is enabled by default when scheduling is enabled, i.e. +with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> +or at <samp>-O2</samp> or higher. +</p> +<a name="index-fsched_002drank_002dheuristic"></a> +</dd> +<dt><code>-fsched-rank-heuristic</code></dt> +<dd><p>Enable the rank heuristic in the scheduler. This heuristic favors +the instruction belonging to a basic block with greater size or frequency. +This is enabled by default when scheduling is enabled, i.e. +with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or +at <samp>-O2</samp> or higher. +</p> +<a name="index-fsched_002dlast_002dinsn_002dheuristic"></a> +</dd> +<dt><code>-fsched-last-insn-heuristic</code></dt> +<dd><p>Enable the last-instruction heuristic in the scheduler. This heuristic +favors the instruction that is less dependent on the last instruction +scheduled. This is enabled by default when scheduling is enabled, +i.e. with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or +at <samp>-O2</samp> or higher. +</p> +<a name="index-fsched_002ddep_002dcount_002dheuristic"></a> +</dd> +<dt><code>-fsched-dep-count-heuristic</code></dt> +<dd><p>Enable the dependent-count heuristic in the scheduler. This heuristic +favors the instruction that has more instructions depending on it. +This is enabled by default when scheduling is enabled, i.e. +with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or +at <samp>-O2</samp> or higher. 
+</p>
+<a name="index-freschedule_002dmodulo_002dscheduled_002dloops"></a>
+</dd>
+<dt><code>-freschedule-modulo-scheduled-loops</code></dt>
+<dd><p>Modulo scheduling is performed before traditional scheduling. If a loop
+is modulo scheduled, later scheduling passes may change its schedule.
+Use this option to control that behavior.
+</p>
+<a name="index-fselective_002dscheduling"></a>
+</dd>
+<dt><code>-fselective-scheduling</code></dt>
+<dd><p>Schedule instructions using the selective scheduling algorithm. Selective
+scheduling runs instead of the first scheduler pass.
+</p>
+<a name="index-fselective_002dscheduling2"></a>
+</dd>
+<dt><code>-fselective-scheduling2</code></dt>
+<dd><p>Schedule instructions using the selective scheduling algorithm. Selective
+scheduling runs instead of the second scheduler pass.
+</p>
+<a name="index-fsel_002dsched_002dpipelining"></a>
+</dd>
+<dt><code>-fsel-sched-pipelining</code></dt>
+<dd><p>Enable software pipelining of innermost loops during selective scheduling.
+This option has no effect unless one of <samp>-fselective-scheduling</samp> or
+<samp>-fselective-scheduling2</samp> is turned on.
+</p>
+<a name="index-fsel_002dsched_002dpipelining_002douter_002dloops"></a>
+</dd>
+<dt><code>-fsel-sched-pipelining-outer-loops</code></dt>
+<dd><p>When pipelining loops during selective scheduling, also pipeline outer loops.
+This option has no effect unless <samp>-fsel-sched-pipelining</samp> is turned on.
+</p>
+<a name="index-fsemantic_002dinterposition"></a>
+</dd>
+<dt><code>-fsemantic-interposition</code></dt>
+<dd><p>Some object formats, like ELF, allow interposing of symbols by the
+dynamic linker.
+This means that for symbols exported from the DSO, the compiler cannot perform
+interprocedural propagation, inlining and other optimizations in anticipation
+that the function or variable in question may change.
While this feature is
+useful, for example, for replacing memory allocation functions with a debugging
+implementation, it is expensive in terms of code quality.
+With <samp>-fno-semantic-interposition</samp> the compiler assumes that,
+if interposition happens for functions, the overwriting function will have
+precisely the same semantics (and side effects).
+Similarly, if interposition happens
+for variables, the constructor of the variable will be the same. The flag
+has no effect for functions explicitly declared inline
+(where interposition is never allowed to change semantics)
+and for symbols explicitly declared weak.
+</p>
+<a name="index-fshrink_002dwrap"></a>
+</dd>
+<dt><code>-fshrink-wrap</code></dt>
+<dd><p>Emit function prologues only before parts of the function that need it,
+rather than at the top of the function. This flag is enabled by default at
+<samp>-O</samp> and higher.
+</p>
+<a name="index-fshrink_002dwrap_002dseparate"></a>
+</dd>
+<dt><code>-fshrink-wrap-separate</code></dt>
+<dd><p>Shrink-wrap separate parts of the prologue and epilogue individually, so that
+those parts are executed only when needed.
+This option is on by default, but has no effect unless <samp>-fshrink-wrap</samp>
+is also turned on and the target supports this.
+</p>
+<a name="index-fcaller_002dsaves"></a>
+</dd>
+<dt><code>-fcaller-saves</code></dt>
+<dd><p>Enable allocation of values to registers that are clobbered by
+function calls, by emitting extra instructions to save and restore the
+registers around such calls. Such allocation is done only when it
+seems to result in better code.
+</p>
+<p>This option is always enabled by default on certain machines, usually
+those which have no call-preserved registers to use instead.
+</p>
+<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
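</p>
<p>The effect of <samp>-fsemantic-interposition</samp> described above is easiest to see with two exported functions in one translation unit (a hypothetical sketch):</p>

```c
/* When built into a shared object with the default
   -fsemantic-interposition, GCC must emit a real call to twice()
   inside quadruple(), because the dynamic linker could interpose a
   different definition of twice().  With -fno-semantic-interposition
   the call may instead be inlined. */
int twice(int x)     { return 2 * x; }
int quadruple(int x) { return twice(twice(x)); }
```

<p>Declaring <code>twice</code> <code>static</code> or <code>inline</code>, or hiding it with
<samp>-fvisibility=hidden</samp>, has a similar effect, because interposition then
no longer applies to it.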
+</p>
+<a name="index-fcombine_002dstack_002dadjustments"></a>
+</dd>
+<dt><code>-fcombine-stack-adjustments</code></dt>
+<dd><p>Tracks stack adjustments (pushes and pops) and stack memory references
+and then tries to find ways to combine them.
+</p>
+<p>Enabled by default at <samp>-O1</samp> and higher.
+</p>
+<a name="index-fipa_002dra"></a>
+</dd>
+<dt><code>-fipa-ra</code></dt>
+<dd><p>Use caller save registers for allocation if those registers are not used by
+any called function. In that case it is not necessary to save and restore
+them around calls. This is only possible if called functions are part of
+the same compilation unit as the current function and are compiled before it.
+</p>
+<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>; however, the option
+is disabled if generated code will be instrumented for profiling
+(<samp>-p</samp>, or <samp>-pg</samp>) or if the callee’s register usage cannot be known
+exactly (this happens on targets that do not expose prologues
+and epilogues in RTL).
+</p>
+<a name="index-fconserve_002dstack"></a>
+</dd>
+<dt><code>-fconserve-stack</code></dt>
+<dd><p>Attempt to minimize stack usage. The compiler attempts to use less
+stack space, even if that makes the program slower. This option
+implies setting the <samp>large-stack-frame</samp> parameter to 100
+and the <samp>large-stack-frame-growth</samp> parameter to 400.
+</p>
+<a name="index-ftree_002dreassoc"></a>
+</dd>
+<dt><code>-ftree-reassoc</code></dt>
+<dd><p>Perform reassociation on trees. This flag is enabled by default
+at <samp>-O1</samp> and higher.
+</p>
+<a name="index-fcode_002dhoisting"></a>
+</dd>
+<dt><code>-fcode-hoisting</code></dt>
+<dd><p>Perform code hoisting. Code hoisting tries to move the
+evaluation of expressions executed on all paths to the function exit
+as early as possible. This is especially useful as a code size
+optimization, but it often helps for code speed as well.
+This flag is enabled by default at <samp>-O2</samp> and higher. +</p> +<a name="index-ftree_002dpre"></a> +</dd> +<dt><code>-ftree-pre</code></dt> +<dd><p>Perform partial redundancy elimination (PRE) on trees. This flag is +enabled by default at <samp>-O2</samp> and <samp>-O3</samp>. +</p> +<a name="index-ftree_002dpartial_002dpre"></a> +</dd> +<dt><code>-ftree-partial-pre</code></dt> +<dd><p>Make partial redundancy elimination (PRE) more aggressive. This flag is +enabled by default at <samp>-O3</samp>. +</p> +<a name="index-ftree_002dforwprop"></a> +</dd> +<dt><code>-ftree-forwprop</code></dt> +<dd><p>Perform forward propagation on trees. This flag is enabled by default +at <samp>-O1</samp> and higher. +</p> +<a name="index-ftree_002dfre"></a> +</dd> +<dt><code>-ftree-fre</code></dt> +<dd><p>Perform full redundancy elimination (FRE) on trees. The difference +between FRE and PRE is that FRE only considers expressions +that are computed on all paths leading to the redundant computation. +This analysis is faster than PRE, though it exposes fewer redundancies. +This flag is enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-ftree_002dphiprop"></a> +</dd> +<dt><code>-ftree-phiprop</code></dt> +<dd><p>Perform hoisting of loads from conditional pointers on trees. This +pass is enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-fhoist_002dadjacent_002dloads"></a> +</dd> +<dt><code>-fhoist-adjacent-loads</code></dt> +<dd><p>Speculatively hoist loads from both branches of an if-then-else if the +loads are from adjacent locations in the same structure and the target +architecture has a conditional move instruction. This flag is enabled +by default at <samp>-O2</samp> and higher. +</p> +<a name="index-ftree_002dcopy_002dprop"></a> +</dd> +<dt><code>-ftree-copy-prop</code></dt> +<dd><p>Perform copy propagation on trees. This pass eliminates unnecessary +copy operations. 
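</p>
<p>As a minimal, hypothetical illustration of the copies that <samp>-ftree-copy-prop</samp> eliminates:</p>

```c
/* After copy propagation, lo and hi are replaced by direct uses of a
   and b, and the copy assignments disappear from the intermediate
   representation: the code ends up as if it were written
   `return a + (b - a) / 2;`. */
int midpoint(int a, int b)
{
    int lo = a;   /* copy: lo is just another name for a */
    int hi = b;   /* copy: hi is just another name for b */
    return lo + (hi - lo) / 2;
}
```

<p>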
This flag is enabled by default at <samp>-O1</samp> and
+higher.
+</p>
+<a name="index-fipa_002dpure_002dconst"></a>
+</dd>
+<dt><code>-fipa-pure-const</code></dt>
+<dd><p>Discover which functions are pure or constant.
+Enabled by default at <samp>-O1</samp> and higher.
+</p>
+<a name="index-fipa_002dreference"></a>
+</dd>
+<dt><code>-fipa-reference</code></dt>
+<dd><p>Discover which static variables do not escape the
+compilation unit.
+Enabled by default at <samp>-O1</samp> and higher.
+</p>
+<a name="index-fipa_002dreference_002daddressable"></a>
+</dd>
+<dt><code>-fipa-reference-addressable</code></dt>
+<dd><p>Discover read-only, write-only and non-addressable static variables.
+Enabled by default at <samp>-O1</samp> and higher.
+</p>
+<a name="index-fipa_002dstack_002dalignment"></a>
+</dd>
+<dt><code>-fipa-stack-alignment</code></dt>
+<dd><p>Reduce stack alignment on call sites if possible.
+Enabled by default.
+</p>
+<a name="index-fipa_002dpta"></a>
+</dd>
+<dt><code>-fipa-pta</code></dt>
+<dd><p>Perform interprocedural pointer analysis and interprocedural modification
+and reference analysis. This option can cause excessive memory and
+compile-time usage on large compilation units. It is not enabled by
+default at any optimization level.
+</p>
+<a name="index-fipa_002dprofile"></a>
+</dd>
+<dt><code>-fipa-profile</code></dt>
+<dd><p>Perform interprocedural profile propagation. Functions called only from
+cold functions are marked as cold. Functions executed once (such as
+<code>cold</code>, <code>noreturn</code>, static constructors or destructors) are
+also identified. Cold functions and loopless parts of functions executed once are
+then optimized for size.
+Enabled by default at <samp>-O1</samp> and higher.
+</p>
+<a name="index-fipa_002dmodref"></a>
+</dd>
+<dt><code>-fipa-modref</code></dt>
+<dd><p>Perform interprocedural mod/ref analysis.
This optimization analyzes the side
+effects of functions (memory locations that are modified or referenced) and
+enables better optimization across the function call boundary. This flag is
+enabled by default at <samp>-O1</samp> and higher.
+</p>
+<a name="index-fipa_002dcp"></a>
+</dd>
+<dt><code>-fipa-cp</code></dt>
+<dd><p>Perform interprocedural constant propagation.
+This optimization analyzes the program to determine when values passed
+to functions are constants and then optimizes accordingly.
+This optimization can substantially increase performance
+if the application has constants passed to functions.
+This flag is enabled by default at <samp>-O2</samp>, <samp>-Os</samp> and <samp>-O3</samp>.
+It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
+</p>
+<a name="index-fipa_002dcp_002dclone"></a>
+</dd>
+<dt><code>-fipa-cp-clone</code></dt>
+<dd><p>Perform function cloning to make interprocedural constant propagation stronger.
+When enabled, interprocedural constant propagation performs function cloning
+when an externally visible function can be called with constant arguments.
+Because this optimization can create multiple copies of functions,
+it may significantly increase code size
+(see <samp>--param ipa-cp-unit-growth=<var>value</var></samp>).
+This flag is enabled by default at <samp>-O3</samp>.
+It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
+</p>
+<a name="index-fipa_002dbit_002dcp"></a>
+</dd>
+<dt><code>-fipa-bit-cp</code></dt>
+<dd><p>When enabled, perform interprocedural bitwise constant
+propagation. This flag is enabled by default at <samp>-O2</samp> and
+by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
+It requires that <samp>-fipa-cp</samp> is enabled.
+</p>
+<a name="index-fipa_002dvrp"></a>
+</dd>
+<dt><code>-fipa-vrp</code></dt>
+<dd><p>When enabled, perform interprocedural propagation of value
+ranges. This flag is enabled by default at <samp>-O2</samp>.
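</p>
<p>A sketch of the kind of opportunity that <samp>-fipa-cp</samp> (described above) exploits, using invented function names:</p>

```c
/* Every call to scale() in this translation unit passes step == 1.
   Interprocedural constant propagation can specialize scale() for
   that constant argument and fold the multiplication away. */
static int scale(int x, int step) { return x * step; }

int sum_scaled(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += scale(v[i], 1);   /* constant argument visible to -fipa-cp */
    return s;
}
```

<p>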
It requires
+that <samp>-fipa-cp</samp> is enabled.
+</p>
+<a name="index-fipa_002dicf"></a>
+</dd>
+<dt><code>-fipa-icf</code></dt>
+<dd><p>Perform Identical Code Folding for functions and read-only variables.
+The optimization reduces code size and may disturb unwind stacks by replacing
+a function with an equivalent one that has a different name. The optimization works
+more effectively with link-time optimization enabled.
+</p>
+<p>Although the behavior is similar to the Gold Linker’s ICF optimization, GCC ICF
+works on different levels and thus the optimizations are not the same: there are
+equivalences that are found only by GCC and equivalences found only by Gold.
+</p>
+<p>This flag is enabled by default at <samp>-O2</samp> and <samp>-Os</samp>.
+</p>
+<a name="index-flive_002dpatching"></a>
+</dd>
+<dt><code>-flive-patching=<var>level</var></code></dt>
+<dd><p>Control GCC’s optimizations to produce output suitable for live-patching.
+</p>
+<p>If the compiler’s optimization uses a function’s body or information extracted
+from its body to optimize/change another function, the latter is called an
+impacted function of the former. If a function is patched, its impacted
+functions should be patched too.
+</p>
+<p>The impacted functions are determined by the compiler’s interprocedural
+optimizations. For example, a caller is impacted when inlining a function
+into its caller,
+cloning a function and changing its caller to call this new clone,
+or extracting a function’s pureness/constness information to optimize
+its direct or indirect callers, etc.
+</p>
+<p>Usually, the more IPA optimizations that are enabled, the larger the number of
+impacted functions for each function. In order to control the number of
+impacted functions and more easily compute the list of impacted functions,
+IPA optimizations can be partially enabled at two different levels.
+</p> +<p>The <var>level</var> argument should be one of the following: +</p> +<dl compact="compact"> +<dt>‘<samp>inline-clone</samp>’</dt> +<dd> +<p>Only enable inlining and cloning optimizations, which includes inlining, +cloning, interprocedural scalar replacement of aggregates and partial inlining. +As a result, when patching a function, all its callers and its clones’ +callers are impacted, therefore need to be patched as well. +</p> +<p><samp>-flive-patching=inline-clone</samp> disables the following optimization flags: +</p><div class="smallexample"> +<pre class="smallexample">-fwhole-program -fipa-pta -fipa-reference -fipa-ra +-fipa-icf -fipa-icf-functions -fipa-icf-variables +-fipa-bit-cp -fipa-vrp -fipa-pure-const +-fipa-reference-addressable +-fipa-stack-alignment -fipa-modref +</pre></div> + +</dd> +<dt>‘<samp>inline-only-static</samp>’</dt> +<dd> +<p>Only enable inlining of static functions. +As a result, when patching a static function, all its callers are impacted +and so need to be patched as well. +</p> +<p>In addition to all the flags that <samp>-flive-patching=inline-clone</samp> +disables, +<samp>-flive-patching=inline-only-static</samp> disables the following additional +optimization flags: +</p><div class="smallexample"> +<pre class="smallexample">-fipa-cp-clone -fipa-sra -fpartial-inlining -fipa-cp +</pre></div> + +</dd> +</dl> + +<p>When <samp>-flive-patching</samp> is specified without any value, the default value +is <var>inline-clone</var>. +</p> +<p>This flag is disabled by default. +</p> +<p>Note that <samp>-flive-patching</samp> is not supported with link-time optimization +(<samp>-flto</samp>). +</p> +<a name="index-fisolate_002derroneous_002dpaths_002ddereference"></a> +</dd> +<dt><code>-fisolate-erroneous-paths-dereference</code></dt> +<dd><p>Detect paths that trigger erroneous or undefined behavior due to +dereferencing a null pointer. 
Isolate those paths from the main control +flow and turn the statement with erroneous or undefined behavior into a trap. +This flag is enabled by default at <samp>-O2</samp> and higher and depends on +<samp>-fdelete-null-pointer-checks</samp> also being enabled. +</p> +<a name="index-fisolate_002derroneous_002dpaths_002dattribute"></a> +</dd> +<dt><code>-fisolate-erroneous-paths-attribute</code></dt> +<dd><p>Detect paths that trigger erroneous or undefined behavior due to a null value +being used in a way forbidden by a <code>returns_nonnull</code> or <code>nonnull</code> +attribute. Isolate those paths from the main control flow and turn the +statement with erroneous or undefined behavior into a trap. This is not +currently enabled, but may be enabled by <samp>-O2</samp> in the future. +</p> +<a name="index-ftree_002dsink"></a> +</dd> +<dt><code>-ftree-sink</code></dt> +<dd><p>Perform forward store motion on trees. This flag is +enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-ftree_002dbit_002dccp"></a> +</dd> +<dt><code>-ftree-bit-ccp</code></dt> +<dd><p>Perform sparse conditional bit constant propagation on trees and propagate +pointer alignment information. +This pass only operates on local scalar variables and is enabled by default +at <samp>-O1</samp> and higher, except for <samp>-Og</samp>. +It requires that <samp>-ftree-ccp</samp> is enabled. +</p> +<a name="index-ftree_002dccp"></a> +</dd> +<dt><code>-ftree-ccp</code></dt> +<dd><p>Perform sparse conditional constant propagation (CCP) on trees. This +pass only operates on local scalar variables and is enabled by default +at <samp>-O1</samp> and higher. +</p> +<a name="index-fssa_002dbackprop"></a> +</dd> +<dt><code>-fssa-backprop</code></dt> +<dd><p>Propagate information about uses of a value up the definition chain +in order to simplify the definitions. For example, this pass strips +sign operations if the sign of a value never matters. 
The flag is +enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-fssa_002dphiopt"></a> +</dd> +<dt><code>-fssa-phiopt</code></dt> +<dd><p>Perform pattern matching on SSA PHI nodes to optimize conditional +code. This pass is enabled by default at <samp>-O1</samp> and higher, +except for <samp>-Og</samp>. +</p> +<a name="index-ftree_002dswitch_002dconversion"></a> +</dd> +<dt><code>-ftree-switch-conversion</code></dt> +<dd><p>Perform conversion of simple initializations in a switch to +initializations from a scalar array. This flag is enabled by default +at <samp>-O2</samp> and higher. +</p> +<a name="index-ftree_002dtail_002dmerge"></a> +</dd> +<dt><code>-ftree-tail-merge</code></dt> +<dd><p>Look for identical code sequences. When found, replace one with a jump to the +other. This optimization is known as tail merging or cross jumping. This flag +is enabled by default at <samp>-O2</samp> and higher. The compilation time +in this pass can +be limited using <samp>max-tail-merge-comparisons</samp> parameter and +<samp>max-tail-merge-iterations</samp> parameter. +</p> +<a name="index-ftree_002ddce"></a> +</dd> +<dt><code>-ftree-dce</code></dt> +<dd><p>Perform dead code elimination (DCE) on trees. This flag is enabled by +default at <samp>-O1</samp> and higher. +</p> +<a name="index-ftree_002dbuiltin_002dcall_002ddce"></a> +</dd> +<dt><code>-ftree-builtin-call-dce</code></dt> +<dd><p>Perform conditional dead code elimination (DCE) for calls to built-in functions +that may set <code>errno</code> but are otherwise free of side effects. This flag is +enabled by default at <samp>-O2</samp> and higher if <samp>-Os</samp> is not also +specified. +</p> +<a name="index-ffinite_002dloops"></a> +<a name="index-fno_002dfinite_002dloops"></a> +</dd> +<dt><code>-ffinite-loops</code></dt> +<dd><p>Assume that a loop with an exit will eventually take the exit and not loop +indefinitely. 
This allows the compiler to remove loops that otherwise have +no side-effects, not considering eventual endless looping as such. +</p> +<p>This option is enabled by default at <samp>-O2</samp> for C++ with -std=c++11 +or higher. +</p> +<a name="index-ftree_002ddominator_002dopts"></a> +</dd> +<dt><code>-ftree-dominator-opts</code></dt> +<dd><p>Perform a variety of simple scalar cleanups (constant/copy +propagation, redundancy elimination, range propagation and expression +simplification) based on a dominator tree traversal. This also +performs jump threading (to reduce jumps to jumps). This flag is +enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-ftree_002ddse"></a> +</dd> +<dt><code>-ftree-dse</code></dt> +<dd><p>Perform dead store elimination (DSE) on trees. A dead store is a store into +a memory location that is later overwritten by another store without +any intervening loads. In this case the earlier store can be deleted. This +flag is enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-ftree_002dch"></a> +</dd> +<dt><code>-ftree-ch</code></dt> +<dd><p>Perform loop header copying on trees. This is beneficial since it increases +effectiveness of code motion optimizations. It also saves one jump. This flag +is enabled by default at <samp>-O1</samp> and higher. It is not enabled +for <samp>-Os</samp>, since it usually increases code size. +</p> +<a name="index-ftree_002dloop_002doptimize"></a> +</dd> +<dt><code>-ftree-loop-optimize</code></dt> +<dd><p>Perform loop optimizations on trees. This flag is enabled by default +at <samp>-O1</samp> and higher. +</p> +<a name="index-ftree_002dloop_002dlinear"></a> +<a name="index-floop_002dstrip_002dmine"></a> +<a name="index-floop_002dblock"></a> +</dd> +<dt><code>-ftree-loop-linear</code></dt> +<dt><code>-floop-strip-mine</code></dt> +<dt><code>-floop-block</code></dt> +<dd><p>Perform loop nest optimizations. Same as +<samp>-floop-nest-optimize</samp>. 
To use this code transformation, GCC has +to be configured with <samp>--with-isl</samp> to enable the Graphite loop +transformation infrastructure. +</p> +<a name="index-fgraphite_002didentity"></a> +</dd> +<dt><code>-fgraphite-identity</code></dt> +<dd><p>Enable the identity transformation for graphite. For every SCoP we generate +the polyhedral representation and transform it back to gimple. Using +<samp>-fgraphite-identity</samp> we can check the costs or benefits of the +GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations +are also performed by the code generator isl, like index splitting and +dead code elimination in loops. +</p> +<a name="index-floop_002dnest_002doptimize"></a> +</dd> +<dt><code>-floop-nest-optimize</code></dt> +<dd><p>Enable the isl based loop nest optimizer. This is a generic loop nest +optimizer based on the Pluto optimization algorithms. It calculates a loop +structure optimized for data-locality and parallelism. This option +is experimental. +</p> +<a name="index-floop_002dparallelize_002dall"></a> +</dd> +<dt><code>-floop-parallelize-all</code></dt> +<dd><p>Use the Graphite data dependence analysis to identify loops that can +be parallelized. Parallelize all the loops that can be analyzed to +not contain loop carried dependences without checking that it is +profitable to parallelize the loops. +</p> +<a name="index-ftree_002dcoalesce_002dvars"></a> +</dd> +<dt><code>-ftree-coalesce-vars</code></dt> +<dd><p>While transforming the program out of the SSA representation, attempt to +reduce copying by coalescing versions of different user-defined +variables, instead of just compiler temporaries. This may severely +limit the ability to debug an optimized program compiled with +<samp>-fno-var-tracking-assignments</samp>. In the negated form, this flag +prevents SSA coalescing of user variables. This option is enabled by +default if optimization is enabled, and it does very little otherwise. 
+</p> +<a name="index-ftree_002dloop_002dif_002dconvert"></a> +</dd> +<dt><code>-ftree-loop-if-convert</code></dt> +<dd><p>Attempt to transform conditional jumps in the innermost loops to +branch-less equivalents. The intent is to remove control-flow from +the innermost loops in order to improve the ability of the +vectorization pass to handle these loops. This is enabled by default +if vectorization is enabled. +</p> +<a name="index-ftree_002dloop_002ddistribution"></a> +</dd> +<dt><code>-ftree-loop-distribution</code></dt> +<dd><p>Perform loop distribution. This flag can improve cache performance on +big loop bodies and allow further loop optimizations, like +parallelization or vectorization, to take place. For example, the loop +</p><div class="smallexample"> +<pre class="smallexample">DO I = 1, N + A(I) = B(I) + C + D(I) = E(I) * F +ENDDO +</pre></div> +<p>is transformed to +</p><div class="smallexample"> +<pre class="smallexample">DO I = 1, N + A(I) = B(I) + C +ENDDO +DO I = 1, N + D(I) = E(I) * F +ENDDO +</pre></div> +<p>This flag is enabled by default at <samp>-O3</samp>. +It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-ftree_002dloop_002ddistribute_002dpatterns"></a> +</dd> +<dt><code>-ftree-loop-distribute-patterns</code></dt> +<dd><p>Perform loop distribution of patterns that can be code generated with +calls to a library. This flag is enabled by default at <samp>-O2</samp> and +higher, and by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<p>This pass distributes the initialization loops and generates a call to +memset zero. 
For example, the loop +</p><div class="smallexample"> +<pre class="smallexample">DO I = 1, N + A(I) = 0 + B(I) = A(I) + I +ENDDO +</pre></div> +<p>is transformed to +</p><div class="smallexample"> +<pre class="smallexample">DO I = 1, N + A(I) = 0 +ENDDO +DO I = 1, N + B(I) = A(I) + I +ENDDO +</pre></div> +<p>and the initialization loop is transformed into a call to memset zero. +This flag is enabled by default at <samp>-O3</samp>. +It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-floop_002dinterchange"></a> +</dd> +<dt><code>-floop-interchange</code></dt> +<dd><p>Perform loop interchange outside of graphite. This flag can improve cache +performance on loop nest and allow further loop optimizations, like +vectorization, to take place. For example, the loop +</p><div class="smallexample"> +<pre class="smallexample">for (int i = 0; i < N; i++) + for (int j = 0; j < N; j++) + for (int k = 0; k < N; k++) + c[i][j] = c[i][j] + a[i][k]*b[k][j]; +</pre></div> +<p>is transformed to +</p><div class="smallexample"> +<pre class="smallexample">for (int i = 0; i < N; i++) + for (int k = 0; k < N; k++) + for (int j = 0; j < N; j++) + c[i][j] = c[i][j] + a[i][k]*b[k][j]; +</pre></div> +<p>This flag is enabled by default at <samp>-O3</samp>. +It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-floop_002dunroll_002dand_002djam"></a> +</dd> +<dt><code>-floop-unroll-and-jam</code></dt> +<dd><p>Apply unroll and jam transformations on feasible loops. In a loop +nest this unrolls the outer loop by some factor and fuses the resulting +multiple inner loops. This flag is enabled by default at <samp>-O3</samp>. +It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-ftree_002dloop_002dim"></a> +</dd> +<dt><code>-ftree-loop-im</code></dt> +<dd><p>Perform loop invariant motion on trees. 
This pass moves only invariants that +are hard to handle at RTL level (function calls, operations that expand to +nontrivial sequences of insns). With <samp>-funswitch-loops</samp> it also moves +operands of conditions that are invariant out of the loop, so that we can use +just trivial invariantness analysis in loop unswitching. The pass also includes +store motion. +</p> +<a name="index-ftree_002dloop_002divcanon"></a> +</dd> +<dt><code>-ftree-loop-ivcanon</code></dt> +<dd><p>Create a canonical counter for number of iterations in loops for which +determining number of iterations requires complicated analysis. Later +optimizations then may determine the number easily. Useful especially +in connection with unrolling. +</p> +<a name="index-ftree_002dscev_002dcprop"></a> +</dd> +<dt><code>-ftree-scev-cprop</code></dt> +<dd><p>Perform final value replacement. If a variable is modified in a loop +in such a way that its value when exiting the loop can be determined using +only its initial value and the number of loop iterations, replace uses of +the final value by such a computation, provided it is sufficiently cheap. +This reduces data dependencies and may allow further simplifications. +Enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-fivopts"></a> +</dd> +<dt><code>-fivopts</code></dt> +<dd><p>Perform induction variable optimizations (strength reduction, induction +variable merging and induction variable elimination) on trees. +</p> +<a name="index-ftree_002dparallelize_002dloops"></a> +</dd> +<dt><code>-ftree-parallelize-loops=n</code></dt> +<dd><p>Parallelize loops, i.e., split their iteration space to run in n threads. +This is only possible for loops whose iterations are independent +and can be arbitrarily reordered. The optimization is only +profitable on multiprocessor machines, for loops that are CPU-intensive, +rather than constrained e.g. by memory bandwidth. 
This option +implies <samp>-pthread</samp>, and thus is only supported on targets +that have support for <samp>-pthread</samp>. +</p> +<a name="index-ftree_002dpta"></a> +</dd> +<dt><code>-ftree-pta</code></dt> +<dd><p>Perform function-local points-to analysis on trees. This flag is +enabled by default at <samp>-O1</samp> and higher, except for <samp>-Og</samp>. +</p> +<a name="index-ftree_002dsra"></a> +</dd> +<dt><code>-ftree-sra</code></dt> +<dd><p>Perform scalar replacement of aggregates. This pass replaces structure +references with scalars to prevent committing structures to memory too +early. This flag is enabled by default at <samp>-O1</samp> and higher, +except for <samp>-Og</samp>. +</p> +<a name="index-fstore_002dmerging"></a> +</dd> +<dt><code>-fstore-merging</code></dt> +<dd><p>Perform merging of narrow stores to consecutive memory addresses. This pass +merges contiguous stores of immediate values narrower than a word into fewer +wider stores to reduce the number of instructions. This is enabled by default +at <samp>-O2</samp> and higher as well as <samp>-Os</samp>. +</p> +<a name="index-ftree_002dter"></a> +</dd> +<dt><code>-ftree-ter</code></dt> +<dd><p>Perform temporary expression replacement during the SSA->normal phase. Single +use/single def temporaries are replaced at their use location with their +defining expression. This results in non-GIMPLE code, but gives the expanders +much more complex trees to work on resulting in better RTL generation. This is +enabled by default at <samp>-O1</samp> and higher. +</p> +<a name="index-ftree_002dslsr"></a> +</dd> +<dt><code>-ftree-slsr</code></dt> +<dd><p>Perform straight-line strength reduction on trees. This recognizes related +expressions involving multiplications and replaces them by less expensive +calculations when possible. This is enabled by default at <samp>-O1</samp> and +higher. 
+</p>
+<a name="index-ftree_002dvectorize"></a>
+</dd>
+<dt><code>-ftree-vectorize</code></dt>
+<dd><p>Perform vectorization on trees. This flag enables <samp>-ftree-loop-vectorize</samp>
+and <samp>-ftree-slp-vectorize</samp> if not explicitly specified.
+</p>
+<a name="index-ftree_002dloop_002dvectorize"></a>
+</dd>
+<dt><code>-ftree-loop-vectorize</code></dt>
+<dd><p>Perform loop vectorization on trees. This flag is enabled by default at
+<samp>-O2</samp> and by <samp>-ftree-vectorize</samp>, <samp>-fprofile-use</samp>,
+and <samp>-fauto-profile</samp>.
+</p>
+<a name="index-ftree_002dslp_002dvectorize"></a>
+</dd>
+<dt><code>-ftree-slp-vectorize</code></dt>
+<dd><p>Perform basic block vectorization on trees. This flag is enabled by default at
+<samp>-O2</samp> and by <samp>-ftree-vectorize</samp>, <samp>-fprofile-use</samp>,
+and <samp>-fauto-profile</samp>.
+</p>
+<a name="index-ftrivial_002dauto_002dvar_002dinit"></a>
+</dd>
+<dt><code>-ftrivial-auto-var-init=<var>choice</var></code></dt>
+<dd><p>Initialize automatic variables with either a pattern or with zeroes to increase
+the security and predictability of a program by preventing uninitialized memory
+disclosure and use.
+GCC still considers an automatic variable that doesn’t have an explicit
+initializer as uninitialized: <samp>-Wuninitialized</samp> and
+<samp>-Wanalyzer-use-of-uninitialized-value</samp> will still report
+warning messages on such automatic variables, and the compiler will
+perform optimization as if the variable were uninitialized.
+With this option, GCC will also initialize any padding of automatic variables
+that have structure or union types to zeroes.
+However, the current implementation cannot initialize automatic variables that
+are declared between the controlling expression and the first case of a
+<code>switch</code> statement. Use <samp>-Wtrivial-auto-var-init</samp> to report all
+such cases.
+</p>
+<p>The three values of <var>choice</var> are:
+</p>
+<ul>
+<li> ‘<samp>uninitialized</samp>’ doesn’t initialize any automatic variables.
+This is the default for C and C++.

+</li><li> ‘<samp>pattern</samp>’ initializes automatic variables with values that are
+likely to transform logic bugs into crashes down the line, are easily
+recognized in a crash dump, and are not values that programmers can rely on
+for useful program semantics.
+The current value is a byte-repeatable pattern with the byte "0xFE".
+The values used for pattern initialization might be changed in the future.

+</li><li> ‘<samp>zero</samp>’ initializes automatic variables with zeroes.
+</li></ul>

+<p>The default is ‘<samp>uninitialized</samp>’.
+</p>
+<p>You can control this behavior for a specific variable by using the variable
+attribute <code>uninitialized</code> (see <a href="Variable-Attributes.html#Variable-Attributes">Variable Attributes</a>).
+</p>
+<a name="index-fvect_002dcost_002dmodel"></a>
+</dd>
+<dt><code>-fvect-cost-model=<var>model</var></code></dt>
+<dd><p>Alter the cost model used for vectorization. The <var>model</var> argument
+should be one of ‘<samp>unlimited</samp>’, ‘<samp>dynamic</samp>’, ‘<samp>cheap</samp>’ or
+‘<samp>very-cheap</samp>’.
+With the ‘<samp>unlimited</samp>’ model the vectorized code-path is assumed
+to be profitable, while with the ‘<samp>dynamic</samp>’ model a runtime check
+guards the vectorized code-path to enable it only for iteration
+counts that will likely execute faster than when executing the original
+scalar loop. The ‘<samp>cheap</samp>’ model disables vectorization of
+loops where doing so would be cost-prohibitive, for example due to
+required runtime checks for data dependence or alignment, but otherwise
+is equal to the ‘<samp>dynamic</samp>’ model. The ‘<samp>very-cheap</samp>’ model only
+allows vectorization if the vector code would entirely replace the
+scalar code that is being vectorized.
For example, if each iteration +of a vectorized loop would only be able to handle exactly four iterations +of the scalar loop, the ‘<samp>very-cheap</samp>’ model would only allow +vectorization if the scalar iteration count is known to be a multiple +of four. +</p> +<p>The default cost model depends on other optimization flags and is +either ‘<samp>dynamic</samp>’ or ‘<samp>cheap</samp>’. +</p> +<a name="index-fsimd_002dcost_002dmodel"></a> +</dd> +<dt><code>-fsimd-cost-model=<var>model</var></code></dt> +<dd><p>Alter the cost model used for vectorization of loops marked with the OpenMP +simd directive. The <var>model</var> argument should be one of +‘<samp>unlimited</samp>’, ‘<samp>dynamic</samp>’, ‘<samp>cheap</samp>’. All values of <var>model</var> +have the same meaning as described in <samp>-fvect-cost-model</samp> and by +default a cost model defined with <samp>-fvect-cost-model</samp> is used. +</p> +<a name="index-ftree_002dvrp"></a> +</dd> +<dt><code>-ftree-vrp</code></dt> +<dd><p>Perform Value Range Propagation on trees. This is similar to the +constant propagation pass, but instead of values, ranges of values are +propagated. This allows the optimizers to remove unnecessary range +checks like array bound checks and null pointer checks. This is +enabled by default at <samp>-O2</samp> and higher. Null pointer check +elimination is only done if <samp>-fdelete-null-pointer-checks</samp> is +enabled. +</p> +<a name="index-fsplit_002dpaths"></a> +</dd> +<dt><code>-fsplit-paths</code></dt> +<dd><p>Split paths leading to loop backedges. This can improve dead code +elimination and common subexpression elimination. This is enabled by +default at <samp>-O3</samp> and above. +</p> +<a name="index-fsplit_002divs_002din_002dunroller"></a> +</dd> +<dt><code>-fsplit-ivs-in-unroller</code></dt> +<dd><p>Enables expression of values of induction variables in later iterations +of the unrolled loop using the value in the first iteration. 
This breaks
+long dependency chains, thus improving the efficiency of the scheduling passes.
+</p>
+<p>A combination of <samp>-fweb</samp> and CSE is often sufficient to obtain the
+same effect. However, that is not reliable in cases where the loop body
+is more complicated than a single basic block. It also does not work at all
+on some architectures due to restrictions in the CSE pass.
+</p>
+<p>This optimization is enabled by default.
+</p>
+<a name="index-fvariable_002dexpansion_002din_002dunroller"></a>
+</dd>
+<dt><code>-fvariable-expansion-in-unroller</code></dt>
+<dd><p>With this option, the compiler creates multiple copies of some
+local variables when unrolling a loop, which can result in superior code.
+</p>
+<p>This optimization is enabled by default for PowerPC targets, but disabled
+by default otherwise.
+</p>
+<a name="index-fpartial_002dinlining"></a>
+</dd>
+<dt><code>-fpartial-inlining</code></dt>
+<dd><p>Inline parts of functions. This option has an effect only
+when inlining itself is turned on by the <samp>-finline-functions</samp>
+or <samp>-finline-small-functions</samp> options.
+</p>
+<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-fpredictive_002dcommoning"></a>
+</dd>
+<dt><code>-fpredictive-commoning</code></dt>
+<dd><p>Perform predictive commoning optimization, i.e., reusing computations
+(especially memory loads and stores) performed in previous
+iterations of loops.
+</p>
+<p>This option is enabled at level <samp>-O3</samp>.
+It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
+</p>
+<a name="index-fprefetch_002dloop_002darrays"></a>
+</dd>
+<dt><code>-fprefetch-loop-arrays</code></dt>
+<dd><p>If supported by the target machine, generate instructions to prefetch
+memory to improve the performance of loops that access large arrays.
+</p>
+<p>This option may generate better or worse code; results are highly
+dependent on the structure of loops within the source code.
+</p>
+<p>Disabled at level <samp>-Os</samp>.
+</p>
+<a name="index-fno_002dprintf_002dreturn_002dvalue"></a>
+<a name="index-fprintf_002dreturn_002dvalue"></a>
+</dd>
+<dt><code>-fno-printf-return-value</code></dt>
+<dd><p>Do not substitute constants for the known return value of formatted output
+functions such as <code>sprintf</code>, <code>snprintf</code>, <code>vsprintf</code>, and
+<code>vsnprintf</code> (but not <code>printf</code> or <code>fprintf</code>). This
+transformation allows GCC to optimize or even eliminate branches based
+on the known return value of these functions called with arguments that
+are either constant, or whose values are known to be in a range that
+makes determining the exact return value possible. For example, when
+<samp>-fprintf-return-value</samp> is in effect, both the branch and the
+body of the <code>if</code> statement (but not the call to <code>snprintf</code>)
+can be optimized away when <code>i</code> is a 32-bit or smaller integer
+because the return value is guaranteed to be at most 8.
+</p>
+<div class="smallexample">
+<pre class="smallexample">char buf[9];
+if (snprintf (buf, sizeof buf, "%08x", i) >= sizeof buf)
+  …
+</pre></div>

+<p>The <samp>-fprintf-return-value</samp> option relies on other optimizations
+and yields best results with <samp>-O2</samp> and above. It works in tandem
+with the <samp>-Wformat-overflow</samp> and <samp>-Wformat-truncation</samp>
+options. The <samp>-fprintf-return-value</samp> option is enabled by default.
+</p>
+<a name="index-fno_002dpeephole"></a>
+<a name="index-fpeephole"></a>
+<a name="index-fno_002dpeephole2"></a>
+<a name="index-fpeephole2"></a>
+</dd>
+<dt><code>-fno-peephole</code></dt>
+<dt><code>-fno-peephole2</code></dt>
+<dd><p>Disable any machine-specific peephole optimizations.
The difference
+between <samp>-fno-peephole</samp> and <samp>-fno-peephole2</samp> is in how they
+are implemented in the compiler; some targets use one, some use the
+other, a few use both.
+</p>
+<p><samp>-fpeephole</samp> is enabled by default.
+<samp>-fpeephole2</samp> is enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-fno_002dguess_002dbranch_002dprobability"></a>
+<a name="index-fguess_002dbranch_002dprobability"></a>
+</dd>
+<dt><code>-fno-guess-branch-probability</code></dt>
+<dd><p>Do not guess branch probabilities using heuristics.
+</p>
+<p>GCC uses heuristics to guess branch probabilities if they are
+not provided by profiling feedback (<samp>-fprofile-arcs</samp>). These
+heuristics are based on the control flow graph. If some branch probabilities
+are specified by <code>__builtin_expect</code>, then the heuristics are
+used to guess branch probabilities for the rest of the control flow graph,
+taking the <code>__builtin_expect</code> info into account. The interactions
+between the heuristics and <code>__builtin_expect</code> can be complex, and in
+some cases, it may be useful to disable the heuristics so that the effects
+of <code>__builtin_expect</code> are easier to understand.
+</p>
+<p>It is also possible to specify the expected probability of an expression
+with the <code>__builtin_expect_with_probability</code> built-in function.
+</p>
+<p>The default is <samp>-fguess-branch-probability</samp> at levels
+<samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-freorder_002dblocks"></a>
+</dd>
+<dt><code>-freorder-blocks</code></dt>
+<dd><p>Reorder basic blocks in the compiled function in order to reduce the number of
+taken branches and improve code locality.
+</p>
+<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-freorder_002dblocks_002dalgorithm"></a>
+</dd>
+<dt><code>-freorder-blocks-algorithm=<var>algorithm</var></code></dt>
+<dd><p>Use the specified algorithm for basic block reordering. The
+<var>algorithm</var> argument can be ‘<samp>simple</samp>’, which does not increase
+code size (except sometimes due to secondary effects like alignment),
+or ‘<samp>stc</samp>’, the “software trace cache” algorithm, which tries to
+put all often executed code together, minimizing the number of branches
+executed by making extra copies of code.
+</p>
+<p>The default is ‘<samp>simple</samp>’ at levels <samp>-O1</samp>, <samp>-Os</samp>, and
+‘<samp>stc</samp>’ at levels <samp>-O2</samp>, <samp>-O3</samp>.
+</p>
+<a name="index-freorder_002dblocks_002dand_002dpartition"></a>
+</dd>
+<dt><code>-freorder-blocks-and-partition</code></dt>
+<dd><p>In addition to reordering basic blocks in the compiled function, in order
+to reduce the number of taken branches, this option partitions hot and cold
+basic blocks into separate sections of the assembly and <samp>.o</samp> files,
+to improve paging and cache locality performance.
+</p>
+<p>This optimization is automatically turned off in the presence of
+exception handling or unwind tables (on targets using setjmp/longjmp or a
+target-specific scheme), for linkonce sections, for functions with a
+user-defined section attribute, and on any architecture that does not support
+named sections. When <samp>-fsplit-stack</samp> is used this option is not
+enabled by default (to avoid linker errors), but may be enabled
+explicitly (if using a working linker).
+</p>
+<p>Enabled for x86 at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-freorder_002dfunctions"></a>
+</dd>
+<dt><code>-freorder-functions</code></dt>
+<dd><p>Reorder functions in the object file in order to
+improve code locality.
This is implemented by using special
+subsections <code>.text.hot</code> for the most frequently executed functions and
+<code>.text.unlikely</code> for unlikely executed functions. Reordering is done by
+the linker, so the object file format must support named sections and the
+linker must place them in a reasonable way.
+</p>
+<p>This option isn’t effective unless you either provide profile feedback
+(see <samp>-fprofile-arcs</samp> for details) or manually annotate functions with
+<code>hot</code> or <code>cold</code> attributes (see <a href="Common-Function-Attributes.html#Common-Function-Attributes">Common Function Attributes</a>).
+</p>
+<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-fstrict_002daliasing"></a>
+</dd>
+<dt><code>-fstrict-aliasing</code></dt>
+<dd><p>Allow the compiler to assume the strictest aliasing rules applicable to
+the language being compiled. For C (and C++), this activates
+optimizations based on the type of expressions. In particular, an
+object of one type is assumed never to reside at the same address as an
+object of a different type, unless the types are almost the same. For
+example, an <code>unsigned int</code> can alias an <code>int</code>, but not a
+<code>void*</code> or a <code>double</code>. A character type may alias any other
+type.
+</p>
+<a name="Type_002dpunning"></a><p>Pay special attention to code like this:
+</p><div class="smallexample">
+<pre class="smallexample">union a_union {
+  int i;
+  double d;
+};

+int f() {
+  union a_union t;
+  t.d = 3.0;
+  return t.i;
+}
+</pre></div>
+<p>The practice of reading from a different union member than the one most
+recently written to (called “type-punning”) is common. Even with
+<samp>-fstrict-aliasing</samp>, type-punning is allowed, provided the memory
+is accessed through the union type. So, the code above works as
+expected.
See <a href="Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation">Structures unions enumerations and bit-fields implementation</a>. However, this code might not:
+</p><div class="smallexample">
+<pre class="smallexample">int f() {
+  union a_union t;
+  int* ip;
+  t.d = 3.0;
+  ip = &t.i;
+  return *ip;
+}
+</pre></div>

+<p>Similarly, access by taking the address, casting the resulting pointer
+and dereferencing the result has undefined behavior, even if the cast
+uses a union type, e.g.:
+</p><div class="smallexample">
+<pre class="smallexample">int f() {
+  double d = 3.0;
+  return ((union a_union *) &d)->i;
+}
+</pre></div>

+<p>The <samp>-fstrict-aliasing</samp> option is enabled at levels
+<samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-fipa_002dstrict_002daliasing"></a>
+</dd>
+<dt><code>-fipa-strict-aliasing</code></dt>
+<dd><p>Controls whether the rules of <samp>-fstrict-aliasing</samp> are applied across
+function boundaries. Note that if multiple functions get inlined into a
+single function, the memory accesses are no longer considered to be crossing a
+function boundary.
+</p>
+<p>The <samp>-fipa-strict-aliasing</samp> option is enabled by default and is
+effective only in combination with <samp>-fstrict-aliasing</samp>.
+</p>
+<a name="index-falign_002dfunctions"></a>
+</dd>
+<dt><code>-falign-functions</code></dt>
+<dt><code>-falign-functions=<var>n</var></code></dt>
+<dt><code>-falign-functions=<var>n</var>:<var>m</var></code></dt>
+<dt><code>-falign-functions=<var>n</var>:<var>m</var>:<var>n2</var></code></dt>
+<dt><code>-falign-functions=<var>n</var>:<var>m</var>:<var>n2</var>:<var>m2</var></code></dt>
+<dd><p>Align the start of functions to the next power-of-two greater than or
+equal to <var>n</var>, skipping up to <var>m</var>-1 bytes.
This ensures that at +least the first <var>m</var> bytes of the function can be fetched by the CPU +without crossing an <var>n</var>-byte alignment boundary. +</p> +<p>If <var>m</var> is not specified, it defaults to <var>n</var>. +</p> +<p>Examples: <samp>-falign-functions=32</samp> aligns functions to the next +32-byte boundary, <samp>-falign-functions=24</samp> aligns to the next +32-byte boundary only if this can be done by skipping 23 bytes or less, +<samp>-falign-functions=32:7</samp> aligns to the next +32-byte boundary only if this can be done by skipping 6 bytes or less. +</p> +<p>The second pair of <var>n2</var>:<var>m2</var> values allows you to specify +a secondary alignment: <samp>-falign-functions=64:7:32:3</samp> aligns to +the next 64-byte boundary if this can be done by skipping 6 bytes or less, +otherwise aligns to the next 32-byte boundary if this can be done +by skipping 2 bytes or less. +If <var>m2</var> is not specified, it defaults to <var>n2</var>. +</p> +<p>Some assemblers only support this flag when <var>n</var> is a power of two; +in that case, it is rounded up. +</p> +<p><samp>-fno-align-functions</samp> and <samp>-falign-functions=1</samp> are +equivalent and mean that functions are not aligned. +</p> +<p>If <var>n</var> is not specified or is zero, use a machine-dependent default. +The maximum allowed <var>n</var> option value is 65536. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. +</p> +</dd> +<dt><code>-flimit-function-alignment</code></dt> +<dd><p>If this option is enabled, the compiler tries to avoid unnecessarily +overaligning functions. It attempts to instruct the assembler to align +by the amount specified by <samp>-falign-functions</samp>, but not to +skip more bytes than the size of the function. 
+</p> +<a name="index-falign_002dlabels"></a> +</dd> +<dt><code>-falign-labels</code></dt> +<dt><code>-falign-labels=<var>n</var></code></dt> +<dt><code>-falign-labels=<var>n</var>:<var>m</var></code></dt> +<dt><code>-falign-labels=<var>n</var>:<var>m</var>:<var>n2</var></code></dt> +<dt><code>-falign-labels=<var>n</var>:<var>m</var>:<var>n2</var>:<var>m2</var></code></dt> +<dd><p>Align all branch targets to a power-of-two boundary. +</p> +<p>Parameters of this option are analogous to the <samp>-falign-functions</samp> option. +<samp>-fno-align-labels</samp> and <samp>-falign-labels=1</samp> are +equivalent and mean that labels are not aligned. +</p> +<p>If <samp>-falign-loops</samp> or <samp>-falign-jumps</samp> are applicable and +are greater than this value, then their values are used instead. +</p> +<p>If <var>n</var> is not specified or is zero, use a machine-dependent default +which is very likely to be ‘<samp>1</samp>’, meaning no alignment. +The maximum allowed <var>n</var> option value is 65536. +</p> +<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. +</p> +<a name="index-falign_002dloops"></a> +</dd> +<dt><code>-falign-loops</code></dt> +<dt><code>-falign-loops=<var>n</var></code></dt> +<dt><code>-falign-loops=<var>n</var>:<var>m</var></code></dt> +<dt><code>-falign-loops=<var>n</var>:<var>m</var>:<var>n2</var></code></dt> +<dt><code>-falign-loops=<var>n</var>:<var>m</var>:<var>n2</var>:<var>m2</var></code></dt> +<dd><p>Align loops to a power-of-two boundary. If the loops are executed +many times, this makes up for any execution of the dummy padding +instructions. +</p> +<p>If <samp>-falign-labels</samp> is greater than this value, then its value +is used instead. +</p> +<p>Parameters of this option are analogous to the <samp>-falign-functions</samp> option. +<samp>-fno-align-loops</samp> and <samp>-falign-loops=1</samp> are +equivalent and mean that loops are not aligned. +The maximum allowed <var>n</var> option value is 65536. 
+
</p>
+<p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
+</p>
+<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
+</p>
+<a name="index-falign_002djumps"></a>
+</dd>
+<dt><code>-falign-jumps</code></dt>
+<dt><code>-falign-jumps=<var>n</var></code></dt>
+<dt><code>-falign-jumps=<var>n</var>:<var>m</var></code></dt>
+<dt><code>-falign-jumps=<var>n</var>:<var>m</var>:<var>n2</var></code></dt>
+<dt><code>-falign-jumps=<var>n</var>:<var>m</var>:<var>n2</var>:<var>m2</var></code></dt>
+<dd><p>Align branch targets to a power-of-two boundary, for branch targets
+that can only be reached by jumping. In this case,
+no dummy operations need be executed.
+</p>
+<p>If <samp>-falign-labels</samp> is greater than this value, then its value
+is used instead.
+</p>
+<p>Parameters of this option are analogous to the <samp>-falign-functions</samp> option.
+<samp>-fno-align-jumps</samp> and <samp>-falign-jumps=1</samp> are
+equivalent and mean that jumps are not aligned.
+</p>
+<p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
+The maximum allowed <var>n</var> option value is 65536.
+</p>
+<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
+</p>
+<a name="index-fno_002dallocation_002ddce"></a>
+</dd>
+<dt><code>-fno-allocation-dce</code></dt>
+<dd><p>Do not remove unused C++ allocations in dead code elimination.
+</p>
+<a name="index-fallow_002dstore_002ddata_002draces"></a>
+</dd>
+<dt><code>-fallow-store-data-races</code></dt>
+<dd><p>Allow the compiler to perform optimizations that may introduce new data races
+on stores, without proving that the variable cannot be concurrently accessed
+by other threads. Does not affect optimization of local data. It is safe to
+use this option if it is known that global data will not be accessed by
+multiple threads. 
+</p> +<p>Examples of optimizations enabled by <samp>-fallow-store-data-races</samp> include +hoisting or if-conversions that may cause a value that was already in memory +to be re-written with that same value. Such re-writing is safe in a single +threaded context but may be unsafe in a multi-threaded context. Note that on +some processors, if-conversions may be required in order to enable +vectorization. +</p> +<p>Enabled at level <samp>-Ofast</samp>. +</p> +<a name="index-funit_002dat_002da_002dtime"></a> +</dd> +<dt><code>-funit-at-a-time</code></dt> +<dd><p>This option is left for compatibility reasons. <samp>-funit-at-a-time</samp> +has no effect, while <samp>-fno-unit-at-a-time</samp> implies +<samp>-fno-toplevel-reorder</samp> and <samp>-fno-section-anchors</samp>. +</p> +<p>Enabled by default. +</p> +<a name="index-fno_002dtoplevel_002dreorder"></a> +<a name="index-ftoplevel_002dreorder"></a> +</dd> +<dt><code>-fno-toplevel-reorder</code></dt> +<dd><p>Do not reorder top-level functions, variables, and <code>asm</code> +statements. Output them in the same order that they appear in the +input file. When this option is used, unreferenced static variables +are not removed. This option is intended to support existing code +that relies on a particular ordering. For new code, it is better to +use attributes when possible. +</p> +<p><samp>-ftoplevel-reorder</samp> is the default at <samp>-O1</samp> and higher, and +also at <samp>-O0</samp> if <samp>-fsection-anchors</samp> is explicitly requested. +Additionally <samp>-fno-toplevel-reorder</samp> implies +<samp>-fno-section-anchors</samp>. +</p> +<a name="index-funreachable_002dtraps"></a> +</dd> +<dt><code>-funreachable-traps</code></dt> +<dd><p>With this option, the compiler turns calls to +<code>__builtin_unreachable</code> into traps, instead of using them for +optimization. This also affects any such calls implicitly generated +by the compiler. 
+
</p>
+<p>This option has the same effect as <samp>-fsanitize=unreachable
+-fsanitize-trap=unreachable</samp>, but does not affect the values of those
+options. If <samp>-fsanitize=unreachable</samp> is enabled, that option
+takes priority over this one.
+</p>
+<p>This option is enabled by default at <samp>-O0</samp> and <samp>-Og</samp>.
+</p>
+<a name="index-fweb"></a>
+</dd>
+<dt><code>-fweb</code></dt>
+<dd><p>Construct webs as commonly used for register allocation purposes and assign
+each web an individual pseudo register. This allows the register allocation pass
+to operate on pseudos directly, but also strengthens several other optimization
+passes, such as CSE, the loop optimizer and trivial dead-code removal. It can,
+however, make debugging impossible, since variables no longer stay in a
+“home register”.
+</p>
+<p>Enabled by default with <samp>-funroll-loops</samp>.
+</p>
+<a name="index-fwhole_002dprogram"></a>
+</dd>
+<dt><code>-fwhole-program</code></dt>
+<dd><p>Assume that the current compilation unit represents the whole program being
+compiled. All public functions and variables, with the exception of <code>main</code>
+and those marked with attribute <code>externally_visible</code>, become static functions
+and in effect are optimized more aggressively by interprocedural optimizers.
+</p>
+<p>With <samp>-flto</samp> this option has a limited use. In most cases the
+precise list of symbols used or exported from the binary is known from the
+resolution info passed to the link-time optimizer by the linker plugin. It is
+still useful if no linker plugin is used or during an incremental link step when
+final code is produced (with <samp>-flto</samp>
+<samp>-flinker-output=nolto-rel</samp>).
+</p>
+<a name="index-flto"></a>
+</dd>
+<dt><code>-flto[=<var>n</var>]</code></dt>
+<dd><p>This option runs the standard link-time optimizer. 
When invoked +with source code, it generates GIMPLE (one of GCC’s internal +representations) and writes it to special ELF sections in the object +file. When the object files are linked together, all the function +bodies are read from these ELF sections and instantiated as if they +had been part of the same translation unit. +</p> +<p>To use the link-time optimizer, <samp>-flto</samp> and optimization +options should be specified at compile time and during the final link. +It is recommended that you compile all the files participating in the +same link with the same options and also specify those options at +link time. +For example: +</p> +<div class="smallexample"> +<pre class="smallexample">gcc -c -O2 -flto foo.c +gcc -c -O2 -flto bar.c +gcc -o myprog -flto -O2 foo.o bar.o +</pre></div> + +<p>The first two invocations to GCC save a bytecode representation +of GIMPLE into special ELF sections inside <samp>foo.o</samp> and +<samp>bar.o</samp>. The final invocation reads the GIMPLE bytecode from +<samp>foo.o</samp> and <samp>bar.o</samp>, merges the two files into a single +internal image, and compiles the result as usual. Since both +<samp>foo.o</samp> and <samp>bar.o</samp> are merged into a single image, this +causes all the interprocedural analyses and optimizations in GCC to +work across the two files as if they were a single one. This means, +for example, that the inliner is able to inline functions in +<samp>bar.o</samp> into functions in <samp>foo.o</samp> and vice-versa. +</p> +<p>Another (simpler) way to enable link-time optimization is: +</p> +<div class="smallexample"> +<pre class="smallexample">gcc -o myprog -flto -O2 foo.c bar.c +</pre></div> + +<p>The above generates bytecode for <samp>foo.c</samp> and <samp>bar.c</samp>, +merges them together into a single GIMPLE representation and optimizes +them as usual to produce <samp>myprog</samp>. 
+</p> +<p>The important thing to keep in mind is that to enable link-time +optimizations you need to use the GCC driver to perform the link step. +GCC automatically performs link-time optimization if any of the +objects involved were compiled with the <samp>-flto</samp> command-line option. +You can always override +the automatic decision to do link-time optimization +by passing <samp>-fno-lto</samp> to the link command. +</p> +<p>To make whole program optimization effective, it is necessary to make +certain whole program assumptions. The compiler needs to know +what functions and variables can be accessed by libraries and runtime +outside of the link-time optimized unit. When supported by the linker, +the linker plugin (see <samp>-fuse-linker-plugin</samp>) passes information +to the compiler about used and externally visible symbols. When +the linker plugin is not available, <samp>-fwhole-program</samp> should be +used to allow the compiler to make these assumptions, which leads +to more aggressive optimization decisions. +</p> +<p>When a file is compiled with <samp>-flto</samp> without +<samp>-fuse-linker-plugin</samp>, the generated object file is larger than +a regular object file because it contains GIMPLE bytecodes and the usual +final code (see <samp>-ffat-lto-objects</samp>). This means that +object files with LTO information can be linked as normal object +files; if <samp>-fno-lto</samp> is passed to the linker, no +interprocedural optimizations are applied. Note that when +<samp>-fno-fat-lto-objects</samp> is enabled the compile stage is faster +but you cannot perform a regular, non-LTO link on them. +</p> +<p>When producing the final binary, GCC only +applies link-time optimizations to those files that contain bytecode. +Therefore, you can mix and match object files and libraries with +GIMPLE bytecodes and final object code. GCC automatically selects +which files to optimize in LTO mode and which files to link without +further processing. 
+</p> +<p>Generally, options specified at link time override those +specified at compile time, although in some cases GCC attempts to infer +link-time options from the settings used to compile the input files. +</p> +<p>If you do not specify an optimization level option <samp>-O</samp> at +link time, then GCC uses the highest optimization level +used when compiling the object files. Note that it is generally +ineffective to specify an optimization level option only at link time and +not at compile time, for two reasons. First, compiling without +optimization suppresses compiler passes that gather information +needed for effective optimization at link time. Second, some early +optimization passes can be performed only at compile time and +not at link time. +</p> +<p>There are some code generation flags preserved by GCC when +generating bytecodes, as they need to be used during the final link. +Currently, the following options and their settings are taken from +the first object file that explicitly specifies them: +<samp>-fcommon</samp>, <samp>-fexceptions</samp>, <samp>-fnon-call-exceptions</samp>, +<samp>-fgnu-tm</samp> and all the <samp>-m</samp> target flags. +</p> +<p>The following options <samp>-fPIC</samp>, <samp>-fpic</samp>, <samp>-fpie</samp> and +<samp>-fPIE</samp> are combined based on the following scheme: +</p> +<div class="smallexample"> +<pre class="smallexample"><samp>-fPIC</samp> + <samp>-fpic</samp> = <samp>-fpic</samp> +<samp>-fPIC</samp> + <samp>-fno-pic</samp> = <samp>-fno-pic</samp> +<samp>-fpic/-fPIC</samp> + (no option) = (no option) +<samp>-fPIC</samp> + <samp>-fPIE</samp> = <samp>-fPIE</samp> +<samp>-fpic</samp> + <samp>-fPIE</samp> = <samp>-fpie</samp> +<samp>-fPIC/-fpic</samp> + <samp>-fpie</samp> = <samp>-fpie</samp> +</pre></div> + +<p>Certain ABI-changing flags are required to match in all compilation units, +and trying to override this at link time with a conflicting value +is ignored. 
This includes options such as <samp>-freg-struct-return</samp>
+and <samp>-fpcc-struct-return</samp>.
+</p>
+<p>Other options such as <samp>-ffp-contract</samp>, <samp>-fno-strict-overflow</samp>,
+<samp>-fwrapv</samp>, <samp>-fno-trapv</samp> or <samp>-fno-strict-aliasing</samp>
+are passed through to the link stage and merged conservatively for
+conflicting translation units. Specifically
+<samp>-fno-strict-overflow</samp>, <samp>-fwrapv</samp> and <samp>-fno-trapv</samp> take
+precedence; and for example <samp>-ffp-contract=off</samp> takes precedence
+over <samp>-ffp-contract=fast</samp>. You can override them at link time.
+</p>
+<p>Diagnostic options such as <samp>-Wstringop-overflow</samp> are passed
+through to the link stage and their setting matches that of the
+compile step at function granularity. Note that this matters only
+for diagnostics emitted during optimization. Note that code
+transforms such as inlining can lead to warnings being enabled
+or disabled for regions of code that are not consistent with the setting
+at compile time.
+</p>
+<p>When you need to pass options to the assembler via <samp>-Wa</samp> or
+<samp>-Xassembler</samp>, make sure to either compile such translation
+units with <samp>-fno-lto</samp> or consistently use the same assembler
+options on all translation units. Alternatively, you can
+specify assembler options at LTO link time.
+</p>
+<p>To enable debug info generation you need to supply <samp>-g</samp> at
+compile time. If any of the input files at link time were built
+with debug info generation enabled, the link will enable debug info
+generation as well. Any elaborate debug info settings
+like the DWARF level <samp>-gdwarf-5</samp> need to be explicitly repeated
+at the linker command line, and mixing different settings in different
+translation units is discouraged. 
+</p> +<p>If LTO encounters objects with C linkage declared with incompatible +types in separate translation units to be linked together (undefined +behavior according to ISO C99 6.2.7), a non-fatal diagnostic may be +issued. The behavior is still undefined at run time. Similar +diagnostics may be raised for other languages. +</p> +<p>Another feature of LTO is that it is possible to apply interprocedural +optimizations on files written in different languages: +</p> +<div class="smallexample"> +<pre class="smallexample">gcc -c -flto foo.c +g++ -c -flto bar.cc +gfortran -c -flto baz.f90 +g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran +</pre></div> + +<p>Notice that the final link is done with <code>g++</code> to get the C++ +runtime libraries and <samp>-lgfortran</samp> is added to get the Fortran +runtime libraries. In general, when mixing languages in LTO mode, you +should use the same link command options as when mixing languages in a +regular (non-LTO) compilation. +</p> +<p>If object files containing GIMPLE bytecode are stored in a library archive, say +<samp>libfoo.a</samp>, it is possible to extract and use them in an LTO link if you +are using a linker with plugin support. To create static libraries suitable +for LTO, use <code>gcc-ar</code> and <code>gcc-ranlib</code> instead of <code>ar</code> +and <code>ranlib</code>; +to show the symbols of object files with GIMPLE bytecode, use +<code>gcc-nm</code>. Those commands require that <code>ar</code>, <code>ranlib</code> +and <code>nm</code> have been compiled with plugin support. 
At link time, use the +flag <samp>-fuse-linker-plugin</samp> to ensure that the library participates in +the LTO optimization process: +</p> +<div class="smallexample"> +<pre class="smallexample">gcc -o myprog -O2 -flto -fuse-linker-plugin a.o b.o -lfoo +</pre></div> + +<p>With the linker plugin enabled, the linker extracts the needed +GIMPLE files from <samp>libfoo.a</samp> and passes them on to the running GCC +to make them part of the aggregated GIMPLE image to be optimized. +</p> +<p>If you are not using a linker with plugin support and/or do not +enable the linker plugin, then the objects inside <samp>libfoo.a</samp> +are extracted and linked as usual, but they do not participate +in the LTO optimization process. In order to make a static library suitable +for both LTO optimization and usual linkage, compile its object files with +<samp>-flto</samp> <samp>-ffat-lto-objects</samp>. +</p> +<p>Link-time optimizations do not require the presence of the whole program to +operate. If the program does not require any symbols to be exported, it is +possible to combine <samp>-flto</samp> and <samp>-fwhole-program</samp> to allow +the interprocedural optimizers to use more aggressive assumptions which may +lead to improved optimization opportunities. +Use of <samp>-fwhole-program</samp> is not needed when linker plugin is +active (see <samp>-fuse-linker-plugin</samp>). +</p> +<p>The current implementation of LTO makes no +attempt to generate bytecode that is portable between different +types of hosts. The bytecode files are versioned and there is a +strict version check, so bytecode files generated in one version of +GCC do not work with an older or newer version of GCC. +</p> +<p>Link-time optimization does not work well with generation of debugging +information on systems other than those using a combination of ELF and +DWARF. 
+
</p>
+<p>If you specify the optional <var>n</var>, the optimization and code
+generation done at link time is executed in parallel using <var>n</var>
+parallel jobs by utilizing an installed <code>make</code> program. The
+environment variable <code>MAKE</code> may be used to override the program
+used.
+</p>
+<p>You can also specify <samp>-flto=jobserver</samp> to use GNU make’s
+job server mode to determine the number of parallel jobs. This
+is useful when the Makefile calling GCC is already executing in parallel.
+You must prepend a ‘<samp>+</samp>’ to the command recipe in the parent Makefile
+for this to work. This option likely only works if <code>MAKE</code> is
+GNU make. Even without the option value, GCC tries to automatically
+detect a running GNU make’s job server.
+</p>
+<p>Use <samp>-flto=auto</samp> to use GNU make’s job server, if available,
+or otherwise fall back to autodetection of the number of CPU threads
+present in your system.
+</p>
+<a name="index-flto_002dpartition"></a>
+</dd>
+<dt><code>-flto-partition=<var>alg</var></code></dt>
+<dd><p>Specify the partitioning algorithm used by the link-time optimizer.
+The value is either ‘<samp>1to1</samp>’ to specify a partitioning mirroring
+the original source files or ‘<samp>balanced</samp>’ to specify partitioning
+into equally sized chunks (whenever possible) or ‘<samp>max</samp>’ to create
+a new partition for every symbol where possible. Specifying ‘<samp>none</samp>’
+as an algorithm disables partitioning and streaming completely.
+The default value is ‘<samp>balanced</samp>’. While ‘<samp>1to1</samp>’ can be used
+as a workaround for various code ordering issues, the ‘<samp>max</samp>’
+partitioning is intended for internal testing only.
+The value ‘<samp>one</samp>’ specifies that exactly one partition should be
+used, while the value ‘<samp>none</samp>’ bypasses partitioning and executes
+the link-time optimization step directly from the WPA phase. 
+
</p>
+<a name="index-flto_002dcompression_002dlevel"></a>
+</dd>
+<dt><code>-flto-compression-level=<var>n</var></code></dt>
+<dd><p>This option specifies the level of compression used for intermediate
+language written to LTO object files, and is only meaningful in
+conjunction with LTO mode (<samp>-flto</samp>). GCC currently supports two
+LTO compression algorithms. For zstd, valid values are 0 (no compression)
+to 19 (maximum compression), while zlib supports values from 0 to 9.
+Values outside this range are clamped to the minimum or maximum
+of the supported values. If the option is not given,
+a default balanced compression setting is used.
+</p>
+<a name="index-fuse_002dlinker_002dplugin"></a>
+</dd>
+<dt><code>-fuse-linker-plugin</code></dt>
+<dd><p>Enables the use of a linker plugin during link-time optimization. This
+option relies on plugin support in the linker, which is available in gold
+or in GNU ld 2.21 or newer.
+</p>
+<p>This option enables the extraction of object files with GIMPLE bytecode out
+of library archives. This improves the quality of optimization by exposing
+more code to the link-time optimizer. The linker plugin also passes
+information to the compiler about which symbols can be accessed externally
+(by a non-LTO object or during dynamic linking). Resulting code quality
+improvements on binaries (and shared
+libraries that use hidden visibility) are similar to <samp>-fwhole-program</samp>.
+See <samp>-flto</samp> for a description of the effect of this flag and how to
+use it.
+</p>
+<p>This option is enabled by default when LTO support in GCC is enabled
+and GCC was configured for use with
+a linker supporting plugins (GNU ld 2.21 or newer or gold).
+</p>
+<a name="index-ffat_002dlto_002dobjects"></a>
+</dd>
+<dt><code>-ffat-lto-objects</code></dt>
+<dd><p>Fat LTO objects are object files that contain both the intermediate language
+and the object code. This makes them usable for both LTO linking and normal
+linking. 
This option is effective only when compiling with <samp>-flto</samp>
+and is ignored at link time.
+</p>
+<p><samp>-fno-fat-lto-objects</samp> improves compilation time over plain LTO, but
+requires the complete toolchain to be aware of LTO. It requires a linker with
+linker plugin support for basic functionality. Additionally,
+<code>nm</code>, <code>ar</code> and <code>ranlib</code>
+need to support linker plugins to allow a full-featured build environment
+(capable of building static libraries, etc.). GCC provides the <code>gcc-ar</code>,
+<code>gcc-nm</code>, <code>gcc-ranlib</code> wrappers to pass the right options
+to these tools. With non-fat LTO, makefiles need to be modified to use them.
+</p>
+<p>Note that modern binutils provide a plugin auto-load mechanism.
+Installing the linker plugin into <samp>$libdir/bfd-plugins</samp> has the same
+effect as using the command wrappers (<code>gcc-ar</code>, <code>gcc-nm</code> and
+<code>gcc-ranlib</code>).
+</p>
+<p>The default is <samp>-fno-fat-lto-objects</samp> on targets with linker plugin
+support.
+</p>
+<a name="index-fcompare_002delim"></a>
+</dd>
+<dt><code>-fcompare-elim</code></dt>
+<dd><p>After register allocation and post-register allocation instruction splitting,
+identify arithmetic instructions that compute processor flags similar to a
+comparison operation based on that arithmetic. If possible, eliminate the
+explicit comparison operation.
+</p>
+<p>This pass only applies to certain targets that cannot explicitly represent
+the comparison operation before register allocation is complete.
+</p>
+<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-fcprop_002dregisters"></a>
+</dd>
+<dt><code>-fcprop-registers</code></dt>
+<dd><p>After register allocation and post-register allocation instruction splitting,
+perform a copy-propagation pass to try to reduce scheduling dependencies
+and occasionally eliminate the copy. 
+
</p>
+<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-fprofile_002dcorrection"></a>
+</dd>
+<dt><code>-fprofile-correction</code></dt>
+<dd><p>Profiles collected using an instrumented binary for multi-threaded programs may
+be inconsistent due to missed counter updates. When this option is specified,
+GCC uses heuristics to correct or smooth out such inconsistencies. By
+default, GCC emits an error message when an inconsistent profile is detected.
+</p>
+<p>This option is enabled by <samp>-fauto-profile</samp>.
+</p>
+<a name="index-fprofile_002dpartial_002dtraining"></a>
+</dd>
+<dt><code>-fprofile-partial-training</code></dt>
+<dd><p>With <code>-fprofile-use</code> all portions of programs not executed during the
+train run are optimized aggressively for size rather than speed. In some cases
+it is not practical to train all possible hot paths in the program. (For
+example, the program may contain functions specific to particular hardware,
+and training may not cover all hardware configurations the program is run on.)
+With <code>-fprofile-partial-training</code> profile feedback is ignored for all
+functions not executed during the train run, leading them to be optimized as if
+they were compiled without profile feedback. This leads to better performance
+when the train run is not representative but also leads to significantly bigger
+code. 
+</p> +<a name="index-fprofile_002duse"></a> +</dd> +<dt><code>-fprofile-use</code></dt> +<dt><code>-fprofile-use=<var>path</var></code></dt> +<dd><p>Enable profile feedback-directed optimizations, +and the following optimizations, many of which +are generally profitable only with profile feedback available: +</p> +<div class="smallexample"> +<pre class="smallexample">-fbranch-probabilities -fprofile-values +-funroll-loops -fpeel-loops -ftracer -fvpt +-finline-functions -fipa-cp -fipa-cp-clone -fipa-bit-cp +-fpredictive-commoning -fsplit-loops -funswitch-loops +-fgcse-after-reload -ftree-loop-vectorize -ftree-slp-vectorize +-fvect-cost-model=dynamic -ftree-loop-distribute-patterns +-fprofile-reorder-functions +</pre></div> + +<p>Before you can use this option, you must first generate profiling information. +See <a href="Instrumentation-Options.html#Instrumentation-Options">Instrumentation Options</a>, for information about the +<samp>-fprofile-generate</samp> option. +</p> +<p>By default, GCC emits an error message if the feedback profiles do not +match the source code. This error can be turned into a warning by using +<samp>-Wno-error=coverage-mismatch</samp>. Note this may result in poorly +optimized code. Additionally, by default, GCC also emits a warning message if +the feedback profiles do not exist (see <samp>-Wmissing-profile</samp>). +</p> +<p>If <var>path</var> is specified, GCC looks at the <var>path</var> to find +the profile feedback data files. See <samp>-fprofile-dir</samp>. 
+</p> +<a name="index-fauto_002dprofile"></a> +</dd> +<dt><code>-fauto-profile</code></dt> +<dt><code>-fauto-profile=<var>path</var></code></dt> +<dd><p>Enable sampling-based feedback-directed optimizations, +and the following optimizations, +many of which are generally profitable only with profile feedback available: +</p> +<div class="smallexample"> +<pre class="smallexample">-fbranch-probabilities -fprofile-values +-funroll-loops -fpeel-loops -ftracer -fvpt +-finline-functions -fipa-cp -fipa-cp-clone -fipa-bit-cp +-fpredictive-commoning -fsplit-loops -funswitch-loops +-fgcse-after-reload -ftree-loop-vectorize -ftree-slp-vectorize +-fvect-cost-model=dynamic -ftree-loop-distribute-patterns +-fprofile-correction +</pre></div> + +<p><var>path</var> is the name of a file containing AutoFDO profile information. +If omitted, it defaults to <samp>fbdata.afdo</samp> in the current directory. +</p> +<p>Producing an AutoFDO profile data file requires running your program +with the <code>perf</code> utility on a supported GNU/Linux target system. +For more information, see <a href="https://perf.wiki.kernel.org/">https://perf.wiki.kernel.org/</a>. +</p> +<p>E.g. +</p><div class="smallexample"> +<pre class="smallexample">perf record -e br_inst_retired:near_taken -b -o perf.data \ + -- your_program +</pre></div> + +<p>Then use the <code>create_gcov</code> tool to convert the raw profile data +to a format that can be used by GCC. You must also supply the +unstripped binary for your program to this tool. +See <a href="https://github.com/google/autofdo">https://github.com/google/autofdo</a>. +</p> +<p>E.g. +</p><div class="smallexample"> +<pre class="smallexample">create_gcov --binary=your_program.unstripped --profile=perf.data \ + --gcov=profile.afdo +</pre></div> +</dd> +</dl> + +<p>The following options control compiler behavior regarding floating-point +arithmetic. These options trade off between speed and +correctness. All must be specifically enabled. 
+</p> +<dl compact="compact"> +<dd><a name="index-ffloat_002dstore"></a> +</dd> +<dt><code>-ffloat-store</code></dt> +<dd><p>Do not store floating-point variables in registers, and inhibit other +options that might change whether a floating-point value is taken from a +register or memory. +</p> +<a name="index-floating_002dpoint-precision"></a> +<p>This option prevents undesirable excess precision on machines such as +the 68000 where the floating registers (of the 68881) keep more +precision than a <code>double</code> is supposed to have. Similarly for the +x86 architecture. For most programs, the excess precision does only +good, but a few programs rely on the precise definition of IEEE floating +point. Use <samp>-ffloat-store</samp> for such programs, after modifying +them to store all pertinent intermediate computations into variables. +</p> +<a name="index-fexcess_002dprecision"></a> +</dd> +<dt><code>-fexcess-precision=<var>style</var></code></dt> +<dd><p>This option allows further control over excess precision on machines +where floating-point operations occur in a format with more precision or +range than the IEEE standard and interchange floating-point types. By +default, <samp>-fexcess-precision=fast</samp> is in effect; this means that +operations may be carried out in a wider precision than the types specified +in the source if that would result in faster code, and it is unpredictable +when rounding to the types specified in the source code takes place. +When compiling C or C++, if <samp>-fexcess-precision=standard</samp> is specified +then excess precision follows the rules specified in ISO C99 or C++; in particular, +both casts and assignments cause values to be rounded to their +semantic types (whereas <samp>-ffloat-store</samp> only affects +assignments). This option is enabled by default for C or C++ if a strict +conformance option such as <samp>-std=c99</samp> or <samp>-std=c++17</samp> is used. 
+<samp>-ffast-math</samp> enables <samp>-fexcess-precision=fast</samp> by default +regardless of whether a strict conformance option is used. +</p> +<a name="index-mfpmath"></a> +<p><samp>-fexcess-precision=standard</samp> is not implemented for languages +other than C or C++. On the x86, it has no effect if <samp>-mfpmath=sse</samp> +or <samp>-mfpmath=sse+387</samp> is specified; in the former case, IEEE +semantics apply without excess precision, and in the latter, rounding +is unpredictable. +</p> +<a name="index-ffast_002dmath"></a> +</dd> +<dt><code>-ffast-math</code></dt> +<dd><p>Sets the options <samp>-fno-math-errno</samp>, <samp>-funsafe-math-optimizations</samp>, +<samp>-ffinite-math-only</samp>, <samp>-fno-rounding-math</samp>, +<samp>-fno-signaling-nans</samp>, <samp>-fcx-limited-range</samp> and +<samp>-fexcess-precision=fast</samp>. +</p> +<p>This option causes the preprocessor macro <code>__FAST_MATH__</code> to be defined. +</p> +<p>This option is not turned on by any <samp>-O</samp> option besides +<samp>-Ofast</samp> since it can result in incorrect output for programs +that depend on an exact implementation of IEEE or ISO rules/specifications +for math functions. It may, however, yield faster code for programs +that do not require the guarantees of these specifications. +</p> +<a name="index-fno_002dmath_002derrno"></a> +<a name="index-fmath_002derrno"></a> +</dd> +<dt><code>-fno-math-errno</code></dt> +<dd><p>Do not set <code>errno</code> after calling math functions that are executed +with a single instruction, e.g., <code>sqrt</code>. A program that relies on +IEEE exceptions for math error handling may want to use this flag +for speed while maintaining IEEE arithmetic compatibility. +</p> +<p>This option is not turned on by any <samp>-O</samp> option since +it can result in incorrect output for programs that depend on +an exact implementation of IEEE or ISO rules/specifications for +math functions. 
It may, however, yield faster code for programs
+that do not require the guarantees of these specifications.
+</p>
+<p>The default is <samp>-fmath-errno</samp>.
+</p>
+<p>On Darwin systems, the math library never sets <code>errno</code>. There is
+therefore no reason for the compiler to consider the possibility that
+it might, and <samp>-fno-math-errno</samp> is the default.
+</p>
+<a name="index-funsafe_002dmath_002doptimizations"></a>
+</dd>
+<dt><code>-funsafe-math-optimizations</code></dt>
+<dd>
+<p>Allow optimizations for floating-point arithmetic that (a) assume
+that arguments and results are valid and (b) may violate IEEE or
+ANSI standards. When used at link time, it may include libraries
+or startup files that change the default FPU control word or other
+similar optimizations.
+</p>
+<p>This option is not turned on by any <samp>-O</samp> option since
+it can result in incorrect output for programs that depend on
+an exact implementation of IEEE or ISO rules/specifications for
+math functions. It may, however, yield faster code for programs
+that do not require the guarantees of these specifications.
+Enables <samp>-fno-signed-zeros</samp>, <samp>-fno-trapping-math</samp>,
+<samp>-fassociative-math</samp> and <samp>-freciprocal-math</samp>.
+</p>
+<p>The default is <samp>-fno-unsafe-math-optimizations</samp>.
+</p>
+<a name="index-fassociative_002dmath"></a>
+</dd>
+<dt><code>-fassociative-math</code></dt>
+<dd>
+<p>Allow re-association of operands in a series of floating-point operations.
+This violates the ISO C and C++ language standard by possibly changing
+the computation result. NOTE: re-ordering may change the sign of zero as
+well as ignore NaNs and inhibit or create underflow or overflow (and
+thus cannot be used on code that relies on rounding behavior like
+<code>(x + 2**52) - 2**52</code>). May also reorder floating-point comparisons
+and thus may not be used when ordered comparisons are required.
+This option requires that both <samp>-fno-signed-zeros</samp> and +<samp>-fno-trapping-math</samp> be in effect. Moreover, it doesn’t make +much sense with <samp>-frounding-math</samp>. For Fortran the option +is automatically enabled when both <samp>-fno-signed-zeros</samp> and +<samp>-fno-trapping-math</samp> are in effect. +</p> +<p>The default is <samp>-fno-associative-math</samp>. +</p> +<a name="index-freciprocal_002dmath"></a> +</dd> +<dt><code>-freciprocal-math</code></dt> +<dd> +<p>Allow the reciprocal of a value to be used instead of dividing by +the value if this enables optimizations. For example <code>x / y</code> +can be replaced with <code>x * (1/y)</code>, which is useful if <code>(1/y)</code> +is subject to common subexpression elimination. Note that this loses +precision and increases the number of flops operating on the value. +</p> +<p>The default is <samp>-fno-reciprocal-math</samp>. +</p> +<a name="index-ffinite_002dmath_002donly"></a> +</dd> +<dt><code>-ffinite-math-only</code></dt> +<dd><p>Allow optimizations for floating-point arithmetic that assume +that arguments and results are not NaNs or +-Infs. +</p> +<p>This option is not turned on by any <samp>-O</samp> option since +it can result in incorrect output for programs that depend on +an exact implementation of IEEE or ISO rules/specifications for +math functions. It may, however, yield faster code for programs +that do not require the guarantees of these specifications. +</p> +<p>The default is <samp>-fno-finite-math-only</samp>. +</p> +<a name="index-fno_002dsigned_002dzeros"></a> +<a name="index-fsigned_002dzeros"></a> +</dd> +<dt><code>-fno-signed-zeros</code></dt> +<dd><p>Allow optimizations for floating-point arithmetic that ignore the +signedness of zero. IEEE arithmetic specifies the behavior of +distinct +0.0 and -0.0 values, which then prohibits simplification +of expressions such as x+0.0 or 0.0*x (even with <samp>-ffinite-math-only</samp>). 
+This option implies that the sign of a zero result isn’t significant. +</p> +<p>The default is <samp>-fsigned-zeros</samp>. +</p> +<a name="index-fno_002dtrapping_002dmath"></a> +<a name="index-ftrapping_002dmath"></a> +</dd> +<dt><code>-fno-trapping-math</code></dt> +<dd><p>Compile code assuming that floating-point operations cannot generate +user-visible traps. These traps include division by zero, overflow, +underflow, inexact result and invalid operation. This option requires +that <samp>-fno-signaling-nans</samp> be in effect. Setting this option may +allow faster code if one relies on “non-stop” IEEE arithmetic, for example. +</p> +<p>This option should never be turned on by any <samp>-O</samp> option since +it can result in incorrect output for programs that depend on +an exact implementation of IEEE or ISO rules/specifications for +math functions. +</p> +<p>The default is <samp>-ftrapping-math</samp>. +</p> +<p>Future versions of GCC may provide finer control of this setting +using C99’s <code>FENV_ACCESS</code> pragma. This command-line option +will be used along with <samp>-frounding-math</samp> to specify the +default state for <code>FENV_ACCESS</code>. +</p> +<a name="index-frounding_002dmath"></a> +</dd> +<dt><code>-frounding-math</code></dt> +<dd><p>Disable transformations and optimizations that assume default floating-point +rounding behavior. This is round-to-zero for all floating point +to integer conversions, and round-to-nearest for all other arithmetic +truncations. This option should be specified for programs that change +the FP rounding mode dynamically, or that may be executed with a +non-default rounding mode. This option disables constant folding of +floating-point expressions at compile time (which may be affected by +rounding mode) and arithmetic transformations that are unsafe in the +presence of sign-dependent rounding modes. +</p> +<p>The default is <samp>-fno-rounding-math</samp>. 
+</p> +<p>This option is experimental and does not currently guarantee to +disable all GCC optimizations that are affected by rounding mode. +Future versions of GCC may provide finer control of this setting +using C99’s <code>FENV_ACCESS</code> pragma. This command-line option +will be used along with <samp>-ftrapping-math</samp> to specify the +default state for <code>FENV_ACCESS</code>. +</p> +<a name="index-fsignaling_002dnans"></a> +</dd> +<dt><code>-fsignaling-nans</code></dt> +<dd><p>Compile code assuming that IEEE signaling NaNs may generate user-visible +traps during floating-point operations. Setting this option disables +optimizations that may change the number of exceptions visible with +signaling NaNs. This option implies <samp>-ftrapping-math</samp>. +</p> +<p>This option causes the preprocessor macro <code>__SUPPORT_SNAN__</code> to +be defined. +</p> +<p>The default is <samp>-fno-signaling-nans</samp>. +</p> +<p>This option is experimental and does not currently guarantee to +disable all GCC optimizations that affect signaling NaN behavior. +</p> +<a name="index-fno_002dfp_002dint_002dbuiltin_002dinexact"></a> +<a name="index-ffp_002dint_002dbuiltin_002dinexact"></a> +</dd> +<dt><code>-fno-fp-int-builtin-inexact</code></dt> +<dd><p>Do not allow the built-in functions <code>ceil</code>, <code>floor</code>, +<code>round</code> and <code>trunc</code>, and their <code>float</code> and <code>long +double</code> variants, to generate code that raises the “inexact” +floating-point exception for noninteger arguments. ISO C99 and C11 +allow these functions to raise the “inexact” exception, but ISO/IEC +TS 18661-1:2014, the C bindings to IEEE 754-2008, as integrated into +ISO C2X, does not allow these functions to do so. +</p> +<p>The default is <samp>-ffp-int-builtin-inexact</samp>, allowing the +exception to be raised, unless C2X or a later C standard is selected. +This option does nothing unless <samp>-ftrapping-math</samp> is in effect. 
+</p> +<p>Even if <samp>-fno-fp-int-builtin-inexact</samp> is used, if the functions +generate a call to a library function then the “inexact” exception +may be raised if the library implementation does not follow TS 18661. +</p> +<a name="index-fsingle_002dprecision_002dconstant"></a> +</dd> +<dt><code>-fsingle-precision-constant</code></dt> +<dd><p>Treat floating-point constants as single precision instead of +implicitly converting them to double-precision constants. +</p> +<a name="index-fcx_002dlimited_002drange"></a> +</dd> +<dt><code>-fcx-limited-range</code></dt> +<dd><p>When enabled, this option states that a range reduction step is not +needed when performing complex division. Also, there is no checking +whether the result of a complex multiplication or division is <code>NaN ++ I*NaN</code>, with an attempt to rescue the situation in that case. The +default is <samp>-fno-cx-limited-range</samp>, but is enabled by +<samp>-ffast-math</samp>. +</p> +<p>This option controls the default setting of the ISO C99 +<code>CX_LIMITED_RANGE</code> pragma. Nevertheless, the option applies to +all languages. +</p> +<a name="index-fcx_002dfortran_002drules"></a> +</dd> +<dt><code>-fcx-fortran-rules</code></dt> +<dd><p>Complex multiplication and division follow Fortran rules. Range +reduction is done as part of complex division, but there is no checking +whether the result of a complex multiplication or division is <code>NaN ++ I*NaN</code>, with an attempt to rescue the situation in that case. +</p> +<p>The default is <samp>-fno-cx-fortran-rules</samp>. +</p> +</dd> +</dl> + +<p>The following options control optimizations that may improve +performance, but are not enabled by any <samp>-O</samp> options. This +section includes experimental options that may produce broken code. 
+</p> +<dl compact="compact"> +<dd><a name="index-fbranch_002dprobabilities"></a> +</dd> +<dt><code>-fbranch-probabilities</code></dt> +<dd><p>After running a program compiled with <samp>-fprofile-arcs</samp> +(see <a href="Instrumentation-Options.html#Instrumentation-Options">Instrumentation Options</a>), +you can compile it a second time using +<samp>-fbranch-probabilities</samp>, to improve optimizations based on +the number of times each branch was taken. When a program +compiled with <samp>-fprofile-arcs</samp> exits, it saves arc execution +counts to a file called <samp><var>sourcename</var>.gcda</samp> for each source +file. The information in this data file is very dependent on the +structure of the generated code, so you must use the same source code +and the same optimization options for both compilations. +See details about the file naming in <samp>-fprofile-arcs</samp>. +</p> +<p>With <samp>-fbranch-probabilities</samp>, GCC puts a +‘<samp>REG_BR_PROB</samp>’ note on each ‘<samp>JUMP_INSN</samp>’ and ‘<samp>CALL_INSN</samp>’. +These can be used to improve optimization. Currently, they are only +used in one place: in <samp>reorg.cc</samp>, instead of guessing which path a +branch is most likely to take, the ‘<samp>REG_BR_PROB</samp>’ values are used to +exactly determine which path is taken more often. +</p> +<p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-fprofile_002dvalues"></a> +</dd> +<dt><code>-fprofile-values</code></dt> +<dd><p>If combined with <samp>-fprofile-arcs</samp>, it adds code so that some +data about values of expressions in the program is gathered. +</p> +<p>With <samp>-fbranch-probabilities</samp>, it reads back the data gathered +from profiling values of expressions for usage in optimizations. +</p> +<p>Enabled by <samp>-fprofile-generate</samp>, <samp>-fprofile-use</samp>, and +<samp>-fauto-profile</samp>. 
+</p>
+<a name="index-fprofile_002dreorder_002dfunctions"></a>
+</dd>
+<dt><code>-fprofile-reorder-functions</code></dt>
+<dd><p>Function reordering based on profile instrumentation collects
+the time of first execution of each function and orders these functions
+in ascending order of that time.
+</p>
+<p>Enabled with <samp>-fprofile-use</samp>.
+</p>
+<a name="index-fvpt"></a>
+</dd>
+<dt><code>-fvpt</code></dt>
+<dd><p>If combined with <samp>-fprofile-arcs</samp>, this option instructs the compiler
+to add code to gather information about values of expressions.
+</p>
+<p>With <samp>-fbranch-probabilities</samp>, it reads back the data gathered
+and actually performs the optimizations based on them.
+Currently the optimizations include specialization of division operations
+using the knowledge about the value of the denominator.
+</p>
+<p>Enabled with <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
+</p>
+<a name="index-frename_002dregisters"></a>
+</dd>
+<dt><code>-frename-registers</code></dt>
+<dd><p>Attempt to avoid false dependencies in scheduled code by making use
+of registers left over after register allocation. This optimization
+most benefits processors with lots of registers. Depending on the
+debug information format adopted by the target, however, it can
+make debugging impossible, since variables no longer stay in
+a “home register”.
+</p>
+<p>Enabled by default with <samp>-funroll-loops</samp>.
+</p>
+<a name="index-fschedule_002dfusion"></a>
+</dd>
+<dt><code>-fschedule-fusion</code></dt>
+<dd><p>Performs a target-dependent pass over the instruction stream to schedule
+instructions of the same type together, because the target machine can execute them
+more efficiently if they are adjacent to each other in the instruction flow.
+</p>
+<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
+</p>
+<a name="index-ftracer"></a>
+</dd>
+<dt><code>-ftracer</code></dt>
+<dd><p>Perform tail duplication to enlarge superblock size.
This transformation +simplifies the control flow of the function allowing other optimizations to do +a better job. +</p> +<p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-funroll_002dloops"></a> +</dd> +<dt><code>-funroll-loops</code></dt> +<dd><p>Unroll loops whose number of iterations can be determined at compile time or +upon entry to the loop. <samp>-funroll-loops</samp> implies +<samp>-frerun-cse-after-loop</samp>, <samp>-fweb</samp> and <samp>-frename-registers</samp>. +It also turns on complete loop peeling (i.e. complete removal of loops with +a small constant number of iterations). This option makes code larger, and may +or may not make it run faster. +</p> +<p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-funroll_002dall_002dloops"></a> +</dd> +<dt><code>-funroll-all-loops</code></dt> +<dd><p>Unroll all loops, even if their number of iterations is uncertain when +the loop is entered. This usually makes programs run more slowly. +<samp>-funroll-all-loops</samp> implies the same options as +<samp>-funroll-loops</samp>. +</p> +<a name="index-fpeel_002dloops"></a> +</dd> +<dt><code>-fpeel-loops</code></dt> +<dd><p>Peels loops for which there is enough information that they do not +roll much (from profile feedback or static analysis). It also turns on +complete loop peeling (i.e. complete removal of loops with small constant +number of iterations). +</p> +<p>Enabled by <samp>-O3</samp>, <samp>-fprofile-use</samp>, and <samp>-fauto-profile</samp>. +</p> +<a name="index-fmove_002dloop_002dinvariants"></a> +</dd> +<dt><code>-fmove-loop-invariants</code></dt> +<dd><p>Enables the loop invariant motion pass in the RTL loop optimizer. Enabled +at level <samp>-O1</samp> and higher, except for <samp>-Og</samp>. 
+</p> +<a name="index-fmove_002dloop_002dstores"></a> +</dd> +<dt><code>-fmove-loop-stores</code></dt> +<dd><p>Enables the loop store motion pass in the GIMPLE loop optimizer. This +moves invariant stores to after the end of the loop in exchange for +carrying the stored value in a register across the iteration. +Note for this option to have an effect <samp>-ftree-loop-im</samp> has to +be enabled as well. Enabled at level <samp>-O1</samp> and higher, except +for <samp>-Og</samp>. +</p> +<a name="index-fsplit_002dloops"></a> +</dd> +<dt><code>-fsplit-loops</code></dt> +<dd><p>Split a loop into two if it contains a condition that’s always true +for one side of the iteration space and false for the other. +</p> +<p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-funswitch_002dloops"></a> +</dd> +<dt><code>-funswitch-loops</code></dt> +<dd><p>Move branches with loop invariant conditions out of the loop, with duplicates +of the loop on both branches (modified according to result of the condition). +</p> +<p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>. +</p> +<a name="index-fversion_002dloops_002dfor_002dstrides"></a> +</dd> +<dt><code>-fversion-loops-for-strides</code></dt> +<dd><p>If a loop iterates over an array with a variable stride, create another +version of the loop that assumes the stride is always one. For example: +</p> +<div class="smallexample"> +<pre class="smallexample">for (int i = 0; i < n; ++i) + x[i * stride] = …; +</pre></div> + +<p>becomes: +</p> +<div class="smallexample"> +<pre class="smallexample">if (stride == 1) + for (int i = 0; i < n; ++i) + x[i] = …; +else + for (int i = 0; i < n; ++i) + x[i * stride] = …; +</pre></div> + +<p>This is particularly useful for assumed-shape arrays in Fortran where +(for example) it allows better vectorization assuming contiguous accesses. +This flag is enabled by default at <samp>-O3</samp>. 
+It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
+</p>
+<a name="index-ffunction_002dsections"></a>
+<a name="index-fdata_002dsections"></a>
+</dd>
+<dt><code>-ffunction-sections</code></dt>
+<dt><code>-fdata-sections</code></dt>
+<dd><p>Place each function or data item into its own section in the output
+file if the target supports arbitrary sections. The name of the
+function or the name of the data item determines the section’s name
+in the output file.
+</p>
+<p>Use these options on systems where the linker can perform optimizations to
+improve locality of reference in the instruction space. Most systems using the
+ELF object format have linkers with such optimizations. On AIX, the linker
+rearranges sections (CSECTs) based on the call graph. The performance impact
+varies.
+</p>
+<p>Together with linker garbage collection (the linker <samp>--gc-sections</samp>
+option), these options may lead to smaller statically-linked executables (after
+stripping).
+</p>
+<p>On ELF/DWARF systems these options do not degrade the quality of the debug
+information. There could be issues with other object files/debug info formats.
+</p>
+<p>Only use these options when there are significant benefits from doing so. When
+you specify these options, the assembler and linker create larger object and
+executable files and are also slower. These options affect code generation.
+They prevent optimizations by the compiler and assembler using relative
+locations inside a translation unit since the locations are unknown until
+link time. An example of such an optimization is relaxing calls to short call
+instructions.
+</p>
+<a name="index-fstdarg_002dopt"></a>
+</dd>
+<dt><code>-fstdarg-opt</code></dt>
+<dd><p>Optimize the prologue of variadic argument functions with respect to usage of
+those arguments.
+</p> +<a name="index-fsection_002danchors"></a> +</dd> +<dt><code>-fsection-anchors</code></dt> +<dd><p>Try to reduce the number of symbolic address calculations by using +shared “anchor” symbols to address nearby objects. This transformation +can help to reduce the number of GOT entries and GOT accesses on some +targets. +</p> +<p>For example, the implementation of the following function <code>foo</code>: +</p> +<div class="smallexample"> +<pre class="smallexample">static int a, b, c; +int foo (void) { return a + b + c; } +</pre></div> + +<p>usually calculates the addresses of all three variables, but if you +compile it with <samp>-fsection-anchors</samp>, it accesses the variables +from a common anchor point instead. The effect is similar to the +following pseudocode (which isn’t valid C): +</p> +<div class="smallexample"> +<pre class="smallexample">int foo (void) +{ + register int *xr = &x; + return xr[&a - &x] + xr[&b - &x] + xr[&c - &x]; +} +</pre></div> + +<p>Not all targets support this option. +</p> +<a name="index-fzero_002dcall_002dused_002dregs"></a> +</dd> +<dt><code>-fzero-call-used-regs=<var>choice</var></code></dt> +<dd><p>Zero call-used registers at function return to increase program +security by either mitigating Return-Oriented Programming (ROP) +attacks or preventing information leakage through registers. +</p> +<p>The possible values of <var>choice</var> are the same as for the +<code>zero_call_used_regs</code> attribute (see <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>). +The default is ‘<samp>skip</samp>’. +</p> +<p>You can control this behavior for a specific function by using the function +attribute <code>zero_call_used_regs</code> (see <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>). 
+</p>
+<a name="index-param"></a>
+</dd>
+<dt><code>--param <var>name</var>=<var>value</var></code></dt>
+<dd><p>In some places, GCC uses various constants to control the amount of
+optimization that is done. For example, GCC does not inline functions
+that contain more than a certain number of instructions. You can
+control some of these constants on the command line using the
+<samp>--param</samp> option.
+</p>
+<p>The names of specific parameters, and the meaning of the values, are
+tied to the internals of the compiler, and are subject to change
+without notice in future releases.
+</p>
+<p>In order to get the minimal, maximal and default values of a parameter,
+use the <samp>--help=param -Q</samp> options.
+</p>
+<p>In each case, the <var>value</var> is an integer. The following choices
+of <var>name</var> are recognized for all targets:
+</p>
+<dl compact="compact">
+<dt><code>predictable-branch-outcome</code></dt>
+<dd><p>When a branch is predicted to be taken with a probability lower than this
+threshold (in percent), it is considered well predictable.
+</p>
+</dd>
+<dt><code>max-rtl-if-conversion-insns</code></dt>
+<dd><p>RTL if-conversion tries to remove conditional branches around a block and
+replace them with conditionally executed instructions. This parameter
+gives the maximum number of instructions in a block which should be
+considered for if-conversion. The compiler will
+also use other heuristics to decide whether if-conversion is likely to be
+profitable.
+</p>
+</dd>
+<dt><code>max-rtl-if-conversion-predictable-cost</code></dt>
+<dd><p>RTL if-conversion will try to remove conditional branches around a block
+and replace them with conditionally executed instructions. These parameters
+give the maximum permissible cost for the sequence that would be generated
+by if-conversion depending on whether the branch is statically determined
+to be predictable or not.
The units for this parameter are the same as
+those for the GCC internal seq_cost metric. The compiler will try to
+provide a reasonable default for this parameter using the BRANCH_COST
+target macro.
+</p>
+</dd>
+<dt><code>max-crossjump-edges</code></dt>
+<dd><p>The maximum number of incoming edges to consider for cross-jumping.
+The algorithm used by <samp>-fcrossjumping</samp> is <em>O(N^2)</em> in
+the number of edges incoming to each block. Increasing values mean
+more aggressive optimization, making the compilation time increase with
+probably small improvement in executable size.
+</p>
+</dd>
+<dt><code>min-crossjump-insns</code></dt>
+<dd><p>The minimum number of instructions that must be matched at the end
+of two blocks before cross-jumping is performed on them. This
+value is ignored in the case where all instructions in the block being
+cross-jumped from are matched.
+</p>
+</dd>
+<dt><code>max-grow-copy-bb-insns</code></dt>
+<dd><p>The maximum code size expansion factor when copying basic blocks
+instead of jumping. The expansion is relative to a jump instruction.
+</p>
+</dd>
+<dt><code>max-goto-duplication-insns</code></dt>
+<dd><p>The maximum number of instructions to duplicate to a block that jumps
+to a computed goto. To avoid <em>O(N^2)</em> behavior in a number of
+passes, GCC factors computed gotos early in the compilation process,
+and unfactors them as late as possible. Only computed jumps at the
+end of basic blocks with no more than max-goto-duplication-insns
+instructions are unfactored.
+</p>
+</dd>
+<dt><code>max-delay-slot-insn-search</code></dt>
+<dd><p>The maximum number of instructions to consider when looking for an
+instruction to fill a delay slot. If more than this arbitrary number of
+instructions are searched, the time savings from filling the delay slot
+are minimal, so stop searching. Increasing values mean more
+aggressive optimization, making the compilation time increase with probably
+small improvement in execution time.
+</p> +</dd> +<dt><code>max-delay-slot-live-search</code></dt> +<dd><p>When trying to fill delay slots, the maximum number of instructions to +consider when searching for a block with valid live register +information. Increasing this arbitrarily chosen value means more +aggressive optimization, increasing the compilation time. This parameter +should be removed when the delay slot code is rewritten to maintain the +control-flow graph. +</p> +</dd> +<dt><code>max-gcse-memory</code></dt> +<dd><p>The approximate maximum amount of memory in <code>kB</code> that can be allocated in +order to perform the global common subexpression elimination +optimization. If more memory than specified is required, the +optimization is not done. +</p> +</dd> +<dt><code>max-gcse-insertion-ratio</code></dt> +<dd><p>If the ratio of expression insertions to deletions is larger than this value +for any expression, then RTL PRE inserts or removes the expression and thus +leaves partially redundant computations in the instruction stream. +</p> +</dd> +<dt><code>max-pending-list-length</code></dt> +<dd><p>The maximum number of pending dependencies scheduling allows +before flushing the current state and starting over. Large functions +with few branches or calls can create excessively large lists which +needlessly consume memory and resources. +</p> +</dd> +<dt><code>max-modulo-backtrack-attempts</code></dt> +<dd><p>The maximum number of backtrack attempts the scheduler should make +when modulo scheduling a loop. Larger values can exponentially increase +compilation time. +</p> +</dd> +<dt><code>max-inline-functions-called-once-loop-depth</code></dt> +<dd><p>Maximal loop depth of a call considered by inline heuristics that tries to +inline all functions called once. +</p> +</dd> +<dt><code>max-inline-functions-called-once-insns</code></dt> +<dd><p>Maximal estimated size of functions produced while inlining functions called +once. 
+</p>
+</dd>
+<dt><code>max-inline-insns-single</code></dt>
+<dd><p>Several parameters control the tree inliner used in GCC. This number sets the
+maximum number of instructions (counted in GCC’s internal representation) in a
+single function that the tree inliner considers for inlining. This only
+affects functions declared inline and methods implemented in a class
+declaration (C++).
+</p>
+
+</dd>
+<dt><code>max-inline-insns-auto</code></dt>
+<dd><p>When you use <samp>-finline-functions</samp> (included in <samp>-O3</samp>),
+a lot of functions that would otherwise not be considered for inlining
+by the compiler are investigated. To those functions, a different
+(more restrictive) limit compared to functions declared inline can
+be applied (<samp>--param max-inline-insns-auto</samp>).
+</p>
+</dd>
+<dt><code>max-inline-insns-small</code></dt>
+<dd><p>This is the bound applied to calls that are considered relevant with
+<samp>-finline-small-functions</samp>.
+</p>
+</dd>
+<dt><code>max-inline-insns-size</code></dt>
+<dd><p>This is the bound applied to calls that are optimized for size. Small growth
+may be desirable to anticipate optimization opportunities exposed by inlining.
+</p>
+</dd>
+<dt><code>uninlined-function-insns</code></dt>
+<dd><p>Number of instructions accounted by the inliner for function overhead such as
+function prologue and epilogue.
+</p>
+</dd>
+<dt><code>uninlined-function-time</code></dt>
+<dd><p>Extra time accounted by the inliner for function overhead such as time needed to
+execute function prologue and epilogue.
+</p>
+</dd>
+<dt><code>inline-heuristics-hint-percent</code></dt>
+<dd><p>The scale (in percent) applied to <samp>inline-insns-single</samp>,
+<samp>inline-insns-single-O2</samp>, <samp>inline-insns-auto</samp>
+when inline heuristics hint that inlining is
+very profitable (will enable later optimizations).
+</p>
+</dd>
+<dt><code>uninlined-thunk-insns</code></dt>
+<dt><code>uninlined-thunk-time</code></dt>
+<dd><p>Same as <samp>--param uninlined-function-insns</samp> and
+<samp>--param uninlined-function-time</samp> but applied to function thunks.
+</p>
+</dd>
+<dt><code>inline-min-speedup</code></dt>
+<dd><p>When estimated performance improvement of caller + callee runtime exceeds this
+threshold (in percent), the function can be inlined regardless of the limit on
+<samp>--param max-inline-insns-single</samp> and <samp>--param
+max-inline-insns-auto</samp>.
+</p>
+</dd>
+<dt><code>large-function-insns</code></dt>
+<dd><p>The limit specifying really large functions. For functions larger than this
+limit after inlining, inlining is constrained by
+<samp>--param large-function-growth</samp>. This parameter is useful primarily
+to avoid extreme compilation time caused by non-linear algorithms used by the
+back end.
+</p>
+</dd>
+<dt><code>large-function-growth</code></dt>
+<dd><p>Specifies the maximal growth of a large function caused by inlining, in percent.
+For example, parameter value 100 limits large function growth to 2.0 times
+the original size.
+</p>
+</dd>
+<dt><code>large-unit-insns</code></dt>
+<dd><p>The limit specifying a large translation unit. Growth caused by inlining of
+units larger than this limit is limited by <samp>--param inline-unit-growth</samp>.
+For small units this might be too tight.
+For example, consider a unit consisting of function A
+that is inline and B that just calls A three times. If B is small relative to
+A, the growth of the unit is 300% and yet such inlining is very sane. For very
+large units consisting of small inlineable functions, however, the overall unit
+growth limit is needed to avoid exponential explosion of code size. Thus for
+smaller units, the size is increased to <samp>--param large-unit-insns</samp>
+before applying <samp>--param inline-unit-growth</samp>.
+</p>
+</dd>
+<dt><code>lazy-modules</code></dt>
+<dd><p>Maximum number of concurrently open C++ module files when lazy loading.
+</p>
+</dd>
+<dt><code>inline-unit-growth</code></dt>
+<dd><p>Specifies the maximal overall growth of the compilation unit caused by inlining.
+For example, parameter value 20 limits unit growth to 1.2 times the original
+size. Cold functions (either marked cold via an attribute or by profile
+feedback) are not accounted for in the unit size.
+</p>
+</dd>
+<dt><code>ipa-cp-unit-growth</code></dt>
+<dd><p>Specifies the maximal overall growth of the compilation unit caused by
+interprocedural constant propagation. For example, parameter value 10 limits
+unit growth to 1.1 times the original size.
+</p>
+</dd>
+<dt><code>ipa-cp-large-unit-insns</code></dt>
+<dd><p>The size of a translation unit that the IPA-CP pass considers large.
+</p>
+</dd>
+<dt><code>large-stack-frame</code></dt>
+<dd><p>The limit specifying large stack frames. While inlining, the algorithm tries
+not to grow past this limit too much.
+</p>
+</dd>
+<dt><code>large-stack-frame-growth</code></dt>
+<dd><p>Specifies the maximal growth of large stack frames caused by inlining, in percents.
+For example, parameter value 1000 limits large stack frame growth to 11 times
+the original size.
+</p>
+</dd>
+<dt><code>max-inline-insns-recursive</code></dt>
+<dt><code>max-inline-insns-recursive-auto</code></dt>
+<dd><p>Specifies the maximum number of instructions an out-of-line copy of a
+self-recursive inline
+function can grow into by performing recursive inlining.
+</p>
+<p><samp>--param max-inline-insns-recursive</samp> applies to functions
+declared inline.
+For functions not declared inline, recursive inlining
+happens only when <samp>-finline-functions</samp> (included in <samp>-O3</samp>) is
+enabled; <samp>--param max-inline-insns-recursive-auto</samp> applies instead.
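How a size cap like <samp>max-inline-insns-recursive</samp> bounds recursive inlining can be sketched with a toy model. This is a deliberately simplified illustration (each inlined level is assumed to add one full copy of the body); it is not GCC's cost model:

```python
# Toy model: keep inlining a self-recursive body into its out-of-line
# copy while the size cap permits another copy. Illustrative only.
def recursive_inline_size(body_insns, max_insns):
    size = body_insns
    depth = 0
    while size + body_insns <= max_insns:
        size += body_insns  # one more inlined level of the recursion
        depth += 1
    return size, depth

# A 100-insn body under a 450-insn cap gets three extra inlined levels.
assert recursive_inline_size(100, 450) == (400, 3)
```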
+</p>
+</dd>
+<dt><code>max-inline-recursive-depth</code></dt>
+<dt><code>max-inline-recursive-depth-auto</code></dt>
+<dd><p>Specifies the maximum recursion depth used for recursive inlining.
+</p>
+<p><samp>--param max-inline-recursive-depth</samp> applies to functions
+declared inline. For functions not declared inline, recursive inlining
+happens only when <samp>-finline-functions</samp> (included in <samp>-O3</samp>) is
+enabled; <samp>--param max-inline-recursive-depth-auto</samp> applies instead.
+</p>
+</dd>
+<dt><code>min-inline-recursive-probability</code></dt>
+<dd><p>Recursive inlining is profitable only for functions that recurse deeply on
+average, and can hurt functions with shallow recursion by increasing the
+prologue size or the complexity of the function body as seen by other
+optimizers.
+</p>
+<p>When profile feedback is available (see <samp>-fprofile-generate</samp>) the actual
+recursion depth can be guessed from the probability that the function recurses
+via a given call expression. This parameter limits inlining only to call
+expressions whose probability exceeds the given threshold (in percents).
+</p>
+</dd>
+<dt><code>early-inlining-insns</code></dt>
+<dd><p>Specifies the growth that the early inliner can make. In effect it increases
+the amount of inlining for code having a large abstraction penalty.
+</p>
+</dd>
+<dt><code>max-early-inliner-iterations</code></dt>
+<dd><p>The limit on the number of iterations of the early inliner. This basically bounds
+the number of nested indirect calls the early inliner can resolve.
+Deeper chains are still handled by late inlining.
+</p>
+</dd>
+<dt><code>comdat-sharing-probability</code></dt>
+<dd><p>Probability (in percent) that C++ inline functions with comdat visibility
+are shared across multiple compilation units.
+</p>
+</dd>
+<dt><code>modref-max-bases</code></dt>
+<dt><code>modref-max-refs</code></dt>
+<dt><code>modref-max-accesses</code></dt>
+<dd><p>Specifies the maximal number of base pointers, references and accesses stored
+for a single function by mod/ref analysis.
+</p>
+</dd>
+<dt><code>modref-max-tests</code></dt>
+<dd><p>Specifies the maximal number of tests the alias oracle can perform to disambiguate
+memory locations using the mod/ref information. This parameter ought to be
+bigger than <samp>--param modref-max-bases</samp> and <samp>--param
+modref-max-refs</samp>.
+</p>
+</dd>
+<dt><code>modref-max-depth</code></dt>
+<dd><p>Specifies the maximum depth of the DFS walk used by modref escape analysis.
+Setting this to 0 disables the analysis completely.
+</p>
+</dd>
+<dt><code>modref-max-escape-points</code></dt>
+<dd><p>Specifies the maximum number of escape points tracked by modref per SSA-name.
+</p>
+</dd>
+<dt><code>modref-max-adjustments</code></dt>
+<dd><p>Specifies the maximum number of times the access range is enlarged during modref
+dataflow analysis.
+</p>
+</dd>
+<dt><code>profile-func-internal-id</code></dt>
+<dd><p>A parameter to control whether to use the function internal id in profile
+database lookup. If the value is 0, the compiler uses an id that
+is based on the function's assembler name and filename, which makes old profile
+data more tolerant to source changes such as function reordering.
+</p>
+</dd>
+<dt><code>min-vect-loop-bound</code></dt>
+<dd><p>The minimum number of iterations under which loops are not vectorized
+when <samp>-ftree-vectorize</samp> is used. The number of iterations after
+vectorization needs to be greater than the value specified by this option
+to allow vectorization.
+</p>
+</dd>
+<dt><code>gcse-cost-distance-ratio</code></dt>
+<dd><p>Scaling factor in the calculation of the maximum distance an expression
+can be moved by GCSE optimizations. This is currently supported only in the
+code hoisting pass.
The bigger the ratio, the more aggressive code hoisting
+is with simple expressions, i.e., the expressions that have cost
+less than <samp>gcse-unrestricted-cost</samp>. Specifying 0 disables
+hoisting of simple expressions.
+</p>
+</dd>
+<dt><code>gcse-unrestricted-cost</code></dt>
+<dd><p>Cost, roughly measured as the cost of a single typical machine
+instruction, at which GCSE optimizations do not constrain
+the distance an expression can travel. This is currently
+supported only in the code hoisting pass. The lower the cost,
+the more aggressive code hoisting is. Specifying 0
+allows all expressions to travel unrestricted distances.
+</p>
+</dd>
+<dt><code>max-hoist-depth</code></dt>
+<dd><p>The depth of search in the dominator tree for expressions to hoist.
+This is used to avoid quadratic behavior in the hoisting algorithm.
+A value of 0 does not limit the search, but may slow down compilation
+of huge functions.
+</p>
+</dd>
+<dt><code>max-tail-merge-comparisons</code></dt>
+<dd><p>The maximum number of similar basic blocks to compare a basic block with.
+This is used to avoid quadratic behavior in tree tail merging.
+</p>
+</dd>
+<dt><code>max-tail-merge-iterations</code></dt>
+<dd><p>The maximum number of iterations of the pass over the function. This is used to
+limit compilation time in tree tail merging.
+</p>
+</dd>
+<dt><code>store-merging-allow-unaligned</code></dt>
+<dd><p>Allow the store merging pass to introduce unaligned stores if it is legal to
+do so.
+</p>
+</dd>
+<dt><code>max-stores-to-merge</code></dt>
+<dd><p>The maximum number of stores to attempt to merge into wider stores in the store
+merging pass.
+</p>
+</dd>
+<dt><code>max-store-chains-to-track</code></dt>
+<dd><p>The maximum number of store chains to track at the same time in the attempt
+to merge them into wider stores in the store merging pass.
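What the store merging pass does conceptually can be illustrated with a small sketch: several adjacent narrow stores are replaced by one wider store writing the same bytes. This is only a model of the transformation's effect, not compiler code:

```python
import struct

# Sketch: four adjacent byte stores become one 4-byte store. On a
# little-endian target the wider store writes the bytes in this order.
def merge_byte_stores(byte_values):
    return struct.unpack('<I', bytes(byte_values))[0]

# Storing 0x01, 0x02, 0x03, 0x04 to consecutive bytes is equivalent to
# one 32-bit store of 0x04030201 on a little-endian machine.
assert merge_byte_stores([0x01, 0x02, 0x03, 0x04]) == 0x04030201
```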
+</p>
+</dd>
+<dt><code>max-stores-to-track</code></dt>
+<dd><p>The maximum number of stores to track at the same time in the attempt to
+merge them into wider stores in the store merging pass.
+</p>
+</dd>
+<dt><code>max-unrolled-insns</code></dt>
+<dd><p>The maximum number of instructions that a loop may have if it is to be unrolled.
+If a loop is unrolled, this parameter also determines how many times
+the loop code is unrolled.
+</p>
+</dd>
+<dt><code>max-average-unrolled-insns</code></dt>
+<dd><p>The maximum number of instructions, weighted by the probabilities of their
+execution, that a loop may have if it is to be unrolled. If a loop is unrolled,
+this parameter also determines how many times the loop code is unrolled.
+</p>
+</dd>
+<dt><code>max-unroll-times</code></dt>
+<dd><p>The maximum number of unrollings of a single loop.
+</p>
+</dd>
+<dt><code>max-peeled-insns</code></dt>
+<dd><p>The maximum number of instructions that a loop may have if it is to be peeled.
+If a loop is peeled, this parameter also determines how many times
+the loop code is peeled.
+</p>
+</dd>
+<dt><code>max-peel-times</code></dt>
+<dd><p>The maximum number of peelings of a single loop.
+</p>
+</dd>
+<dt><code>max-peel-branches</code></dt>
+<dd><p>The maximum number of branches on the hot path through the peeled sequence.
+</p>
+</dd>
+<dt><code>max-completely-peeled-insns</code></dt>
+<dd><p>The maximum number of insns of a completely peeled loop.
+</p>
+</dd>
+<dt><code>max-completely-peel-times</code></dt>
+<dd><p>The maximum number of iterations of a loop to be suitable for complete peeling.
+</p>
+</dd>
+<dt><code>max-completely-peel-loop-nest-depth</code></dt>
+<dd><p>The maximum depth of a loop nest suitable for complete peeling.
+</p>
+</dd>
+<dt><code>max-unswitch-insns</code></dt>
+<dd><p>The maximum number of insns of an unswitched loop.
+</p>
+</dd>
+<dt><code>max-unswitch-depth</code></dt>
+<dd><p>The maximum depth of a loop nest to be unswitched.
+</p>
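The way the unrolling caps above interact can be sketched as follows. This is an illustrative model, not GCC's actual cost model; the parameter defaults used here are invented for the example:

```python
# Sketch: the unroll factor is bounded both by the instruction budget
# (max-unrolled-insns) and by max-unroll-times. Illustrative values.
def unroll_factor(body_insns, max_unrolled_insns=200, max_unroll_times=8):
    by_size = max(max_unrolled_insns // body_insns, 1)
    return min(by_size, max_unroll_times)

# A small 10-insn body is capped by max-unroll-times...
assert unroll_factor(10) == 8
# ...while a 60-insn body is capped by the instruction budget.
assert unroll_factor(60) == 3
```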
+</p> +</dd> +<dt><code>lim-expensive</code></dt> +<dd><p>The minimum cost of an expensive expression in the loop invariant motion. +</p> +</dd> +<dt><code>min-loop-cond-split-prob</code></dt> +<dd><p>When FDO profile information is available, <samp>min-loop-cond-split-prob</samp> +specifies minimum threshold for probability of semi-invariant condition +statement to trigger loop split. +</p> +</dd> +<dt><code>iv-consider-all-candidates-bound</code></dt> +<dd><p>Bound on number of candidates for induction variables, below which +all candidates are considered for each use in induction variable +optimizations. If there are more candidates than this, +only the most relevant ones are considered to avoid quadratic time complexity. +</p> +</dd> +<dt><code>iv-max-considered-uses</code></dt> +<dd><p>The induction variable optimizations give up on loops that contain more +induction variable uses. +</p> +</dd> +<dt><code>iv-always-prune-cand-set-bound</code></dt> +<dd><p>If the number of candidates in the set is smaller than this value, +always try to remove unnecessary ivs from the set +when adding a new one. +</p> +</dd> +<dt><code>avg-loop-niter</code></dt> +<dd><p>Average number of iterations of a loop. +</p> +</dd> +<dt><code>dse-max-object-size</code></dt> +<dd><p>Maximum size (in bytes) of objects tracked bytewise by dead store elimination. +Larger values may result in larger compilation times. +</p> +</dd> +<dt><code>dse-max-alias-queries-per-store</code></dt> +<dd><p>Maximum number of queries into the alias oracle per store. +Larger values result in larger compilation times and may result in more +removed dead stores. +</p> +</dd> +<dt><code>scev-max-expr-size</code></dt> +<dd><p>Bound on size of expressions used in the scalar evolutions analyzer. +Large expressions slow the analyzer. +</p> +</dd> +<dt><code>scev-max-expr-complexity</code></dt> +<dd><p>Bound on the complexity of the expressions in the scalar evolutions analyzer. 
+Complex expressions slow the analyzer. +</p> +</dd> +<dt><code>max-tree-if-conversion-phi-args</code></dt> +<dd><p>Maximum number of arguments in a PHI supported by TREE if conversion +unless the loop is marked with simd pragma. +</p> +</dd> +<dt><code>vect-max-layout-candidates</code></dt> +<dd><p>The maximum number of possible vector layouts (such as permutations) +to consider when optimizing to-be-vectorized code. +</p> +</dd> +<dt><code>vect-max-version-for-alignment-checks</code></dt> +<dd><p>The maximum number of run-time checks that can be performed when +doing loop versioning for alignment in the vectorizer. +</p> +</dd> +<dt><code>vect-max-version-for-alias-checks</code></dt> +<dd><p>The maximum number of run-time checks that can be performed when +doing loop versioning for alias in the vectorizer. +</p> +</dd> +<dt><code>vect-max-peeling-for-alignment</code></dt> +<dd><p>The maximum number of loop peels to enhance access alignment +for vectorizer. Value -1 means no limit. +</p> +</dd> +<dt><code>max-iterations-to-track</code></dt> +<dd><p>The maximum number of iterations of a loop the brute-force algorithm +for analysis of the number of iterations of the loop tries to evaluate. +</p> +</dd> +<dt><code>hot-bb-count-fraction</code></dt> +<dd><p>The denominator n of fraction 1/n of the maximal execution count of a +basic block in the entire program that a basic block needs to at least +have in order to be considered hot. The default is 10000, which means +that a basic block is considered hot if its execution count is greater +than 1/10000 of the maximal execution count. 0 means that it is never +considered hot. Used in non-LTO mode. +</p> +</dd> +<dt><code>hot-bb-count-ws-permille</code></dt> +<dd><p>The number of most executed permilles, ranging from 0 to 1000, of the +profiled execution of the entire program to which the execution count +of a basic block must be part of in order to be considered hot. 
The
+default is 990, which means that a basic block is considered hot if
+its execution count contributes to the upper 990 permilles, or 99.0%,
+of the profiled execution of the entire program. 0 means that it is
+never considered hot. Used in LTO mode.
+</p>
+</dd>
+<dt><code>hot-bb-frequency-fraction</code></dt>
+<dd><p>The denominator n of fraction 1/n of the execution frequency of the
+entry block of a function that a basic block of this function needs
+to at least have in order to be considered hot. The default is 1000,
+which means that a basic block is considered hot in a function if it
+is executed more frequently than 1/1000 of the frequency of the entry
+block of the function. 0 means that it is never considered hot.
+</p>
+</dd>
+<dt><code>unlikely-bb-count-fraction</code></dt>
+<dd><p>The denominator n of fraction 1/n of the number of profiled runs of
+the entire program below which the execution count of a basic block
+must be in order for the basic block to be considered unlikely executed.
+The default is 20, which means that a basic block is considered unlikely
+executed if it is executed in fewer than 1/20, or 5%, of the runs of
+the program. 0 means that it is always considered unlikely executed.
+</p>
+</dd>
+<dt><code>max-predicted-iterations</code></dt>
+<dd><p>The maximum number of loop iterations we predict statically. This is useful
+in cases where a function contains a single loop with a known bound and
+another loop with an unknown bound.
+The known number of iterations is predicted correctly, while
+the unknown number of iterations averages to roughly 10. This means that the
+loop without bounds appears artificially cold relative to the other one.
+</p>
+</dd>
+<dt><code>builtin-expect-probability</code></dt>
+<dd><p>Control the probability that the expression has the specified value. This
+parameter takes a percentage (i.e. 0 ... 100) as input.
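The 1/n-style hotness thresholds described above (<samp>hot-bb-count-fraction</samp> and relatives) all follow the same pattern, sketched here as a rough illustration rather than GCC's source:

```python
# Sketch of a 1/n hotness test: a block is hot if its count exceeds
# 1/n of the program's maximal block count. n=10000 mirrors the stated
# default for hot-bb-count-fraction; 0 means "never hot".
def is_hot(count, max_count, hot_bb_count_fraction=10000):
    if hot_bb_count_fraction == 0:
        return False
    return count > max_count / hot_bb_count_fraction

# With the hottest block at 1,000,000 executions, the threshold is 100.
assert is_hot(150, 1_000_000) is True
assert is_hot(99, 1_000_000) is False
```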
+</p>
+</dd>
+<dt><code>builtin-string-cmp-inline-length</code></dt>
+<dd><p>The maximum length of a constant string for a builtin string cmp call
+eligible for inlining.
+</p>
+</dd>
+<dt><code>align-threshold</code></dt>
+<dd>
+<p>Select the fraction of the maximal frequency of executions of a basic block in
+a function to align the basic block.
+</p>
+</dd>
+<dt><code>align-loop-iterations</code></dt>
+<dd>
+<p>A loop expected to iterate at least the selected number of iterations is
+aligned.
+</p>
+</dd>
+<dt><code>tracer-dynamic-coverage</code></dt>
+<dt><code>tracer-dynamic-coverage-feedback</code></dt>
+<dd>
+<p>This value is used to limit superblock formation once the given percentage of
+executed instructions is covered. This limits unnecessary code size
+expansion.
+</p>
+<p>The <samp>tracer-dynamic-coverage-feedback</samp> parameter
+is used only when profile
+feedback is available. The real profiles (as opposed to statically estimated
+ones) are much less balanced, allowing the threshold to be a larger value.
+</p>
+</dd>
+<dt><code>tracer-max-code-growth</code></dt>
+<dd><p>Stop tail duplication once code growth has reached the given percentage. This is
+a rather artificial limit, as most of the duplicates are eliminated later in
+cross jumping, so it may be set to much higher values than the desired code
+growth.
+</p>
+</dd>
+<dt><code>tracer-min-branch-ratio</code></dt>
+<dd>
+<p>Stop reverse growth when the reverse probability of the best edge is less than this
+threshold (in percent).
+</p>
+</dd>
+<dt><code>tracer-min-branch-probability</code></dt>
+<dt><code>tracer-min-branch-probability-feedback</code></dt>
+<dd>
+<p>Stop forward growth if the best edge has probability lower than this
+threshold.
+</p>
+<p>Similarly to <samp>tracer-dynamic-coverage</samp>, two parameters are
+provided. <samp>tracer-min-branch-probability-feedback</samp> is used for
+compilation with profile feedback and <samp>tracer-min-branch-probability</samp>
+for compilation without.
The value for compilation with profile feedback +needs to be more conservative (higher) in order to make tracer +effective. +</p> +</dd> +<dt><code>stack-clash-protection-guard-size</code></dt> +<dd><p>Specify the size of the operating system provided stack guard as +2 raised to <var>num</var> bytes. Higher values may reduce the +number of explicit probes, but a value larger than the operating system +provided guard will leave code vulnerable to stack clash style attacks. +</p> +</dd> +<dt><code>stack-clash-protection-probe-interval</code></dt> +<dd><p>Stack clash protection involves probing stack space as it is allocated. This +param controls the maximum distance between probes into the stack as 2 raised +to <var>num</var> bytes. Higher values may reduce the number of explicit probes, but a value +larger than the operating system provided guard will leave code vulnerable to +stack clash style attacks. +</p> +</dd> +<dt><code>max-cse-path-length</code></dt> +<dd> +<p>The maximum number of basic blocks on path that CSE considers. +</p> +</dd> +<dt><code>max-cse-insns</code></dt> +<dd><p>The maximum number of instructions CSE processes before flushing. +</p> +</dd> +<dt><code>ggc-min-expand</code></dt> +<dd> +<p>GCC uses a garbage collector to manage its own memory allocation. This +parameter specifies the minimum percentage by which the garbage +collector’s heap should be allowed to expand between collections. +Tuning this may improve compilation speed; it has no effect on code +generation. +</p> +<p>The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when +RAM >= 1GB. If <code>getrlimit</code> is available, the notion of “RAM” is +the smallest of actual RAM and <code>RLIMIT_DATA</code> or <code>RLIMIT_AS</code>. If +GCC is not able to calculate RAM on a particular platform, the lower +bound of 30% is used. Setting this parameter and +<samp>ggc-min-heapsize</samp> to zero causes a full collection to occur at +every opportunity. 
This is extremely slow, but can be useful for
+debugging.
+</p>
+</dd>
+<dt><code>ggc-min-heapsize</code></dt>
+<dd>
+<p>Minimum size of the garbage collector’s heap before it begins bothering
+to collect garbage. The first collection occurs after the heap expands
+by <samp>ggc-min-expand</samp>% beyond <samp>ggc-min-heapsize</samp>. Again,
+tuning this may improve compilation speed, and has no effect on code
+generation.
+</p>
+<p>The default is the smaller of RAM/8, RLIMIT_RSS, or a limit that
+tries to ensure that RLIMIT_DATA or RLIMIT_AS are not exceeded, but
+with a lower bound of 4096 (four megabytes) and an upper bound of
+131072 (128 megabytes). If GCC is not able to calculate RAM on a
+particular platform, the lower bound is used. Setting this parameter
+very large effectively disables garbage collection. Setting this
+parameter and <samp>ggc-min-expand</samp> to zero causes a full collection
+to occur at every opportunity.
+</p>
+</dd>
+<dt><code>max-reload-search-insns</code></dt>
+<dd><p>The maximum number of instructions that reload should look backward for an
+equivalent register. Increasing values mean more aggressive optimization, making the
+compilation time increase with probably slightly better performance.
+</p>
+</dd>
+<dt><code>max-cselib-memory-locations</code></dt>
+<dd><p>The maximum number of memory locations cselib should take into account.
+Increasing values mean more aggressive optimization, making the compilation time
+increase with probably slightly better performance.
+</p>
+</dd>
+<dt><code>max-sched-ready-insns</code></dt>
+<dd><p>The maximum number of instructions ready to be issued that the scheduler should
+consider at any given time during the first scheduling pass. Increasing
+values mean more thorough searches, making the compilation time increase
+with probably little benefit.
+</p>
+</dd>
+<dt><code>max-sched-region-blocks</code></dt>
+<dd><p>The maximum number of blocks in a region to be considered for
+interblock scheduling.
+</p> +</dd> +<dt><code>max-pipeline-region-blocks</code></dt> +<dd><p>The maximum number of blocks in a region to be considered for +pipelining in the selective scheduler. +</p> +</dd> +<dt><code>max-sched-region-insns</code></dt> +<dd><p>The maximum number of insns in a region to be considered for +interblock scheduling. +</p> +</dd> +<dt><code>max-pipeline-region-insns</code></dt> +<dd><p>The maximum number of insns in a region to be considered for +pipelining in the selective scheduler. +</p> +</dd> +<dt><code>min-spec-prob</code></dt> +<dd><p>The minimum probability (in percents) of reaching a source block +for interblock speculative scheduling. +</p> +</dd> +<dt><code>max-sched-extend-regions-iters</code></dt> +<dd><p>The maximum number of iterations through CFG to extend regions. +A value of 0 disables region extensions. +</p> +</dd> +<dt><code>max-sched-insn-conflict-delay</code></dt> +<dd><p>The maximum conflict delay for an insn to be considered for speculative motion. +</p> +</dd> +<dt><code>sched-spec-prob-cutoff</code></dt> +<dd><p>The minimal probability of speculation success (in percents), so that +speculative insns are scheduled. +</p> +</dd> +<dt><code>sched-state-edge-prob-cutoff</code></dt> +<dd><p>The minimum probability an edge must have for the scheduler to save its +state across it. +</p> +</dd> +<dt><code>sched-mem-true-dep-cost</code></dt> +<dd><p>Minimal distance (in CPU cycles) between store and load targeting same +memory locations. +</p> +</dd> +<dt><code>selsched-max-lookahead</code></dt> +<dd><p>The maximum size of the lookahead window of selective scheduling. It is a +depth of search for available instructions. +</p> +</dd> +<dt><code>selsched-max-sched-times</code></dt> +<dd><p>The maximum number of times that an instruction is scheduled during +selective scheduling. This is the limit on the number of iterations +through which the instruction may be pipelined. 
+</p> +</dd> +<dt><code>selsched-insns-to-rename</code></dt> +<dd><p>The maximum number of best instructions in the ready list that are considered +for renaming in the selective scheduler. +</p> +</dd> +<dt><code>sms-min-sc</code></dt> +<dd><p>The minimum value of stage count that swing modulo scheduler +generates. +</p> +</dd> +<dt><code>max-last-value-rtl</code></dt> +<dd><p>The maximum size measured as number of RTLs that can be recorded in an expression +in combiner for a pseudo register as last known value of that register. +</p> +</dd> +<dt><code>max-combine-insns</code></dt> +<dd><p>The maximum number of instructions the RTL combiner tries to combine. +</p> +</dd> +<dt><code>integer-share-limit</code></dt> +<dd><p>Small integer constants can use a shared data structure, reducing the +compiler’s memory usage and increasing its speed. This sets the maximum +value of a shared integer constant. +</p> +</dd> +<dt><code>ssp-buffer-size</code></dt> +<dd><p>The minimum size of buffers (i.e. arrays) that receive stack smashing +protection when <samp>-fstack-protector</samp> is used. +</p> +</dd> +<dt><code>min-size-for-stack-sharing</code></dt> +<dd><p>The minimum size of variables taking part in stack slot sharing when not +optimizing. +</p> +</dd> +<dt><code>max-jump-thread-duplication-stmts</code></dt> +<dd><p>Maximum number of statements allowed in a block that needs to be +duplicated when threading jumps. +</p> +</dd> +<dt><code>max-jump-thread-paths</code></dt> +<dd><p>The maximum number of paths to consider when searching for jump threading +opportunities. When arriving at a block, incoming edges are only considered +if the number of paths to be searched so far multiplied by the number of +incoming edges does not exhaust the specified maximum number of paths to +consider. +</p> +</dd> +<dt><code>max-fields-for-field-sensitive</code></dt> +<dd><p>Maximum number of fields in a structure treated in +a field sensitive manner during pointer analysis. 
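The path-search budget described under <samp>max-jump-thread-paths</samp> above can be sketched as a simple guard. This is an illustrative model of the stated rule, not GCC's implementation, and the default of 64 is invented for the example:

```python
# Sketch: when arriving at a block, incoming edges are only considered
# if the projected number of paths stays within the budget.
def may_search_block(paths_so_far, incoming_edges, max_paths=64):
    return paths_so_far * incoming_edges <= max_paths

assert may_search_block(8, 4) is True    # 32 projected paths: within budget
assert may_search_block(20, 4) is False  # 80 projected paths: over budget
```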
+</p>
+</dd>
+<dt><code>prefetch-latency</code></dt>
+<dd><p>An estimate of the average number of instructions that are executed before
+a prefetch finishes. The distance prefetched ahead is proportional
+to this constant. Increasing this number may also lead to fewer
+streams being prefetched (see <samp>simultaneous-prefetches</samp>).
+</p>
+</dd>
+<dt><code>simultaneous-prefetches</code></dt>
+<dd><p>Maximum number of prefetches that can run at the same time.
+</p>
+</dd>
+<dt><code>l1-cache-line-size</code></dt>
+<dd><p>The size of a cache line in the L1 data cache, in bytes.
+</p>
+</dd>
+<dt><code>l1-cache-size</code></dt>
+<dd><p>The size of the L1 data cache, in kilobytes.
+</p>
+</dd>
+<dt><code>l2-cache-size</code></dt>
+<dd><p>The size of the L2 data cache, in kilobytes.
+</p>
+</dd>
+<dt><code>prefetch-dynamic-strides</code></dt>
+<dd><p>Whether the loop array prefetch pass should issue software prefetch hints
+for strides that are non-constant. In some cases this may be
+beneficial, though the fact that the stride is non-constant may make it
+hard to predict when there is clear benefit to issuing these hints.
+</p>
+<p>Set to 1 if the prefetch hints should be issued for non-constant
+strides. Set to 0 if prefetch hints should be issued only for strides that
+are known to be constant and below <samp>prefetch-minimum-stride</samp>.
+</p>
+</dd>
+<dt><code>prefetch-minimum-stride</code></dt>
+<dd><p>Minimum constant stride, in bytes, to start using prefetch hints for. If
+the stride is less than this threshold, prefetch hints will not be issued.
+</p>
+<p>This setting is useful for processors that have hardware prefetchers, in
+which case there may be conflicts between the hardware prefetchers and
+the software prefetchers. If the hardware prefetchers have a maximum
+stride they can handle, it should be used here to improve the use of
+software prefetchers.
+</p>
+<p>A value of -1 means we don’t have a threshold and therefore
+prefetch hints can be issued for any constant stride.
+</p> +<p>This setting is only useful for strides that are known and constant. +</p> +</dd> +<dt><code>destructive-interference-size</code></dt> +<dt><code>constructive-interference-size</code></dt> +<dd><p>The values for the C++17 variables +<code>std::hardware_destructive_interference_size</code> and +<code>std::hardware_constructive_interference_size</code>. The destructive +interference size is the minimum recommended offset between two +independent concurrently-accessed objects; the constructive +interference size is the maximum recommended size of contiguous memory +accessed together. Typically both will be the size of an L1 cache +line for the target, in bytes. For a generic target covering a range of L1 +cache line sizes, typically the constructive interference size will be +the small end of the range and the destructive size will be the large +end. +</p> +<p>The destructive interference size is intended to be used for layout, +and thus has ABI impact. The default value is not expected to be +stable, and on some targets varies with <samp>-mtune</samp>, so use of +this variable in a context where ABI stability is important, such as +the public interface of a library, is strongly discouraged; if it is +used in that context, users can stabilize the value using this +option. +</p> +<p>The constructive interference size is less sensitive, as it is +typically only used in a ‘<samp>static_assert</samp>’ to make sure that a type +fits within a cache line. +</p> +<p>See also <samp>-Winterference-size</samp>. +</p> +</dd> +<dt><code>loop-interchange-max-num-stmts</code></dt> +<dd><p>The maximum number of stmts in a loop to be interchanged. +</p> +</dd> +<dt><code>loop-interchange-stride-ratio</code></dt> +<dd><p>The minimum ratio between stride of two loops for interchange to be profitable. 
+</p>
+</dd>
+<dt><code>min-insn-to-prefetch-ratio</code></dt>
+<dd><p>The minimum ratio between the number of instructions and the
+number of prefetches to enable prefetching in a loop.
+</p>
+</dd>
+<dt><code>prefetch-min-insn-to-mem-ratio</code></dt>
+<dd><p>The minimum ratio between the number of instructions and the
+number of memory references to enable prefetching in a loop.
+</p>
+</dd>
+<dt><code>use-canonical-types</code></dt>
+<dd><p>Whether the compiler should use the “canonical” type system.
+This should always be 1, which uses a more efficient internal
+mechanism for comparing types in C++ and Objective-C++. However, if
+bugs in the canonical type system are causing compilation failures,
+set this value to 0 to disable canonical types.
+</p>
+</dd>
+<dt><code>switch-conversion-max-branch-ratio</code></dt>
+<dd><p>Switch initialization conversion refuses to create arrays that are
+bigger than <samp>switch-conversion-max-branch-ratio</samp> times the number of
+branches in the switch.
+</p>
+</dd>
+<dt><code>max-partial-antic-length</code></dt>
+<dd><p>Maximum length of the partial antic set computed during the tree
+partial redundancy elimination optimization (<samp>-ftree-pre</samp>) when
+optimizing at <samp>-O3</samp> and above. For some sorts of source code
+the enhanced partial redundancy elimination optimization can run away,
+consuming all of the memory available on the host machine. This
+parameter sets a limit on the length of the sets that are computed,
+which prevents the runaway behavior. Setting a value of 0 for
+this parameter allows an unlimited set length.
+</p>
+</dd>
+<dt><code>rpo-vn-max-loop-depth</code></dt>
+<dd><p>Maximum loop depth that is value-numbered optimistically.
+When the limit is hit, the innermost
+<var>rpo-vn-max-loop-depth</var> loops and the outermost loop in the
+loop nest are value-numbered optimistically and the remaining ones are not.
+</p>
+</dd>
+<dt><code>sccvn-max-alias-queries-per-access</code></dt>
+<dd><p>Maximum number of alias-oracle queries we perform when looking for
+redundancies for loads and stores. If this limit is hit the search
+is aborted and the load or store is not considered redundant. The
+number of queries is algorithmically limited to the number of
+stores on all paths from the load to the function entry.
+</p>
+</dd>
+<dt><code>ira-max-loops-num</code></dt>
+<dd><p>IRA uses regional register allocation by default. If a function
+contains more loops than the number given by this parameter, only at most
+the given number of the most frequently-executed loops form regions
+for regional register allocation.
+</p>
+</dd>
+<dt><code>ira-max-conflict-table-size</code></dt>
+<dd><p>Although IRA uses a sophisticated algorithm to compress the conflict
+table, the table can still require excessive amounts of memory for
+huge functions. If the conflict table for a function could be more
+than the size in MB given by this parameter, the register allocator
+instead uses a faster, simpler, and lower-quality
+algorithm that does not require building a pseudo-register conflict table.
+</p>
+</dd>
+<dt><code>ira-loop-reserved-regs</code></dt>
+<dd><p>IRA can be used to evaluate more accurate register pressure in loops
+for decisions to move loop invariants (see <samp>-O3</samp>). The number
+of available registers reserved for some other purposes is given
+by this parameter. The default value of the parameter
+is the best one found in numerous experiments.
+</p>
+</dd>
+<dt><code>ira-consider-dup-in-all-alts</code></dt>
+<dd><p>Make IRA consider the matching constraint (duplicated operand number)
+heavily in all available alternatives for the preferred register class.
+If it is set to zero, IRA only respects the matching
+constraint when it is in the only available alternative with an
+appropriate register class.
Otherwise, IRA checks all
+available alternatives for the preferred register class even if it has
+already found some choice with an appropriate register class, and respects
+any qualified matching constraint it finds.
+</p>
+</dd>
+<dt><code>ira-simple-lra-insn-threshold</code></dt>
+<dd><p>Approximate function insn number in 1K units triggering simple local RA.
+</p>
+</dd>
+<dt><code>lra-inheritance-ebb-probability-cutoff</code></dt>
+<dd><p>LRA tries to reuse values reloaded in registers in subsequent insns.
+This optimization is called inheritance. EBB is used as a region to
+do this optimization. The parameter defines a minimal fall-through
+edge probability in percentage used to add BB to inheritance EBB in
+LRA. The default value was chosen
+from numerous runs of SPEC2000 on x86-64.
+</p>
+</dd>
+<dt><code>loop-invariant-max-bbs-in-loop</code></dt>
+<dd><p>Loop invariant motion can be very expensive, both in compilation time and
+in amount of needed compile-time memory, with very large loops. Loops
+with more basic blocks than this parameter won&rsquo;t have loop invariant
+motion optimization performed on them.
+</p>
+</dd>
+<dt><code>loop-max-datarefs-for-datadeps</code></dt>
+<dd><p>Building data dependencies is expensive for very large loops. This
+parameter limits the number of data references in loops that are
+considered for data dependence analysis. These large loops are not
+handled by the optimizations using loop data dependencies.
+</p>
+</dd>
+<dt><code>max-vartrack-size</code></dt>
+<dd><p>Sets a maximum number of hash table slots to use during variable
+tracking dataflow analysis of any function. If this limit is exceeded
+with variable tracking at assignments enabled, analysis for that
+function is retried without it, after removing all debug insns from
+the function. If the limit is exceeded even without debug insns, var
+tracking analysis is completely disabled for the function. Setting
+the parameter to zero makes it unlimited.
+</p>
+</dd>
+<dt><code>max-vartrack-expr-depth</code></dt>
+<dd><p>Sets a maximum number of recursion levels when attempting to map
+variable names or debug temporaries to value expressions. This trades
+compilation time for more complete debug information. If this is set too
+low, value expressions that are available and could be represented in
+debug information may end up not being used; setting this higher may
+enable the compiler to find more complex debug expressions, but compile
+time and memory use may grow.
+</p>
+</dd>
+<dt><code>max-debug-marker-count</code></dt>
+<dd><p>Sets a threshold on the number of debug markers (e.g. begin stmt
+markers) to avoid complexity explosion at inlining or expanding to RTL.
+If a function has more such gimple stmts than the set limit, such stmts
+will be dropped from the inlined copy of a function, and from its RTL
+expansion.
+</p>
+</dd>
+<dt><code>min-nondebug-insn-uid</code></dt>
+<dd><p>Use uids starting at this parameter for nondebug insns. The range below
+the parameter is reserved exclusively for debug insns created by
+<samp>-fvar-tracking-assignments</samp>, but debug insns may get
+(non-overlapping) uids above it if the reserved range is exhausted.
+</p>
+</dd>
+<dt><code>ipa-sra-deref-prob-threshold</code></dt>
+<dd><p>IPA-SRA replaces a pointer which is known not to be NULL with one or more
+new parameters only when the probability (in percent, relative to
+function entry) of it being dereferenced is higher than this parameter.
+</p>
+</dd>
+<dt><code>ipa-sra-ptr-growth-factor</code></dt>
+<dd><p>IPA-SRA replaces a pointer to an aggregate with one or more new
+parameters only when their cumulative size is less than or equal to
+<samp>ipa-sra-ptr-growth-factor</samp> times the size of the original
+pointer parameter.
+</p>
+</dd>
+<dt><code>ipa-sra-ptrwrap-growth-factor</code></dt>
+<dd><p>Additional maximum allowed growth of total size of new parameters
+that ipa-sra replaces a pointer to an aggregate with,
+if it points to a local variable that the caller only writes to and
+passes it as an argument to other functions.
+</p>
+</dd>
+<dt><code>ipa-sra-max-replacements</code></dt>
+<dd><p>The maximum number of pieces of an aggregate that IPA-SRA tracks. As a
+consequence, it is also the maximum number of replacements of a formal
+parameter.
+</p>
+</dd>
+<dt><code>sra-max-scalarization-size-Ospeed</code></dt>
+<dt><code>sra-max-scalarization-size-Osize</code></dt>
+<dd><p>The two Scalar Replacement of Aggregates passes (SRA and IPA-SRA) aim to
+replace scalar parts of aggregates with uses of independent scalar
+variables. These parameters control the maximum size, in storage units,
+of an aggregate that is considered for replacement when compiling for
+speed
+(<samp>sra-max-scalarization-size-Ospeed</samp>) or size
+(<samp>sra-max-scalarization-size-Osize</samp>) respectively.
+</p>
+</dd>
+<dt><code>sra-max-propagations</code></dt>
+<dd><p>The maximum number of artificial accesses that Scalar Replacement of
+Aggregates (SRA) will track, per one local variable, in order to
+facilitate copy propagation.
+</p>
+</dd>
+<dt><code>tm-max-aggregate-size</code></dt>
+<dd><p>When making copies of thread-local variables in a transaction, this
+parameter specifies the size in bytes after which variables are
+saved with the logging functions as opposed to save/restore code
+sequence pairs. This option only applies when using
+<samp>-fgnu-tm</samp>.
+</p>
+</dd>
+<dt><code>graphite-max-nb-scop-params</code></dt>
+<dd><p>To avoid exponential effects in the Graphite loop transforms, the
+number of parameters in a Static Control Part (SCoP) is bounded.
+A value of zero can be used to lift
+the bound. A variable whose value is unknown at compilation time and
+defined outside a SCoP is a parameter of the SCoP.
+</p>
+</dd>
+<dt><code>loop-block-tile-size</code></dt>
+<dd><p>Loop blocking or strip mining transforms, enabled with
+<samp>-floop-block</samp> or <samp>-floop-strip-mine</samp>, strip mine each
+loop in the loop nest by a given number of iterations. The strip
+length can be changed using the <samp>loop-block-tile-size</samp>
+parameter.
+</p>
+</dd>
+<dt><code>ipa-jump-function-lookups</code></dt>
+<dd><p>Specifies the number of statements visited during jump function offset discovery.
+</p>
+</dd>
+<dt><code>ipa-cp-value-list-size</code></dt>
+<dd><p>IPA-CP attempts to track all possible values and types passed to a function&rsquo;s
+parameter in order to propagate them and perform devirtualization.
+<samp>ipa-cp-value-list-size</samp> is the maximum number of values and types it
+stores per one formal parameter of a function.
+</p>
+</dd>
+<dt><code>ipa-cp-eval-threshold</code></dt>
+<dd><p>IPA-CP calculates its own score of cloning profitability heuristics
+and performs those cloning opportunities with scores that exceed
+<samp>ipa-cp-eval-threshold</samp>.
+</p>
+</dd>
+<dt><code>ipa-cp-max-recursive-depth</code></dt>
+<dd><p>Maximum depth of recursive cloning for a self-recursive function.
+</p>
+</dd>
+<dt><code>ipa-cp-min-recursive-probability</code></dt>
+<dd><p>Recursive cloning is performed only when the probability of the call being
+executed exceeds this parameter.
+</p>
+</dd>
+<dt><code>ipa-cp-profile-count-base</code></dt>
+<dd><p>When using the <samp>-fprofile-use</samp> option, IPA-CP considers the measured
+execution count of a call graph edge at this percentage position in its
+histogram as the basis for its heuristics calculation.
+</p>
+</dd>
+<dt><code>ipa-cp-recursive-freq-factor</code></dt>
+<dd><p>The number of times interprocedural copy propagation expects recursive
+functions to call themselves.
+</p>
+</dd>
+<dt><code>ipa-cp-recursion-penalty</code></dt>
+<dd><p>The percentage penalty recursive functions receive when they
+are evaluated for cloning.
+</p>
+</dd>
+<dt><code>ipa-cp-single-call-penalty</code></dt>
+<dd><p>The percentage penalty functions containing a single call to another
+function receive when they are evaluated for cloning.
+</p>
+</dd>
+<dt><code>ipa-max-agg-items</code></dt>
+<dd><p>IPA-CP is also capable of propagating a number of scalar values passed
+in an aggregate. <samp>ipa-max-agg-items</samp> controls the maximum
+number of such values per one parameter.
+</p>
+</dd>
+<dt><code>ipa-cp-loop-hint-bonus</code></dt>
+<dd><p>When IPA-CP determines that a cloning candidate would make the number
+of iterations of a loop known, it adds a bonus of
+<samp>ipa-cp-loop-hint-bonus</samp> to the profitability score of
+the candidate.
+</p>
+</dd>
+<dt><code>ipa-max-loop-predicates</code></dt>
+<dd><p>The maximum number of different predicates IPA will use to describe when
+loops in a function have known properties.
+</p>
+</dd>
+<dt><code>ipa-max-aa-steps</code></dt>
+<dd><p>During its analysis of function bodies, IPA-CP employs alias analysis
+in order to track values pointed to by function parameters. In order
+not to spend too much time analyzing huge functions, it gives up and
+considers all memory clobbered after examining
+<samp>ipa-max-aa-steps</samp> statements modifying memory.
+</p>
+</dd>
+<dt><code>ipa-max-switch-predicate-bounds</code></dt>
+<dd><p>The maximal number of boundary endpoints of case ranges of a switch statement.
+For a switch exceeding this limit, IPA-CP does not construct the cloning cost
+predicate, which is used to estimate the cloning benefit, for the default case
+of the switch statement.
+</p>
+</dd>
+<dt><code>ipa-max-param-expr-ops</code></dt>
+<dd><p>IPA-CP analyzes a conditional statement that references some function
+parameter to estimate the benefit of cloning for a certain constant value.
+But if the number of operations in a parameter expression exceeds
+<samp>ipa-max-param-expr-ops</samp>, the expression is treated as too
+complicated and is not handled by the IPA analysis.
+</p>
+</dd>
+<dt><code>lto-partitions</code></dt>
+<dd><p>Specify the desired number of partitions produced during WHOPR compilation.
+The number of partitions should exceed the number of CPUs used for compilation.
+</p>
+</dd>
+<dt><code>lto-min-partition</code></dt>
+<dd><p>The minimal partition size for WHOPR (in estimated instructions).
+This prevents the expense of splitting very small programs into too many
+partitions.
+</p>
+</dd>
+<dt><code>lto-max-partition</code></dt>
+<dd><p>The maximal partition size for WHOPR (in estimated instructions), used
+to provide an upper bound for the individual size of a partition.
+Meant to be used only with balanced partitioning.
+</p>
+</dd>
+<dt><code>lto-max-streaming-parallelism</code></dt>
+<dd><p>Maximal number of parallel processes used for LTO streaming.
+</p>
+</dd>
+<dt><code>cxx-max-namespaces-for-diagnostic-help</code></dt>
+<dd><p>The maximum number of namespaces to consult for suggestions when C++
+name lookup fails for an identifier.
+</p>
+</dd>
+<dt><code>sink-frequency-threshold</code></dt>
+<dd><p>The maximum execution frequency (in percent) of the target block
+relative to a statement&rsquo;s original block for which sinking the
+statement is allowed. Larger numbers result in more aggressive statement sinking.
+A small positive adjustment is applied for
+statements with memory operands as those are even more profitable to sink.
+</p>
+</dd>
+<dt><code>max-stores-to-sink</code></dt>
+<dd><p>The maximum number of conditional store pairs that can be sunk. Set to 0
+if either vectorization (<samp>-ftree-vectorize</samp>) or if-conversion
+(<samp>-ftree-loop-if-convert</samp>) is disabled.
+</p>
+</dd>
+<dt><code>case-values-threshold</code></dt>
+<dd><p>The smallest number of different values for which it is best to use a
+jump-table instead of a tree of conditional branches. If the value is
+0, use the default for the machine.
+</p>
+</dd>
+<dt><code>jump-table-max-growth-ratio-for-size</code></dt>
+<dd><p>The maximum code size growth ratio when expanding
+into a jump table (in percent). The parameter is used when
+optimizing for size.
+</p>
+</dd>
+<dt><code>jump-table-max-growth-ratio-for-speed</code></dt>
+<dd><p>The maximum code size growth ratio when expanding
+into a jump table (in percent). The parameter is used when
+optimizing for speed.
+</p>
+</dd>
+<dt><code>tree-reassoc-width</code></dt>
+<dd><p>Set the maximum number of instructions executed in parallel in a
+reassociated tree. This parameter overrides the target-dependent
+heuristics used by default if it has a non-zero value.
+</p>
+</dd>
+<dt><code>sched-pressure-algorithm</code></dt>
+<dd><p>Choose between the two available implementations of
+<samp>-fsched-pressure</samp>. Algorithm 1 is the original implementation
+and is the more likely to prevent instructions from being reordered.
+Algorithm 2 was designed to be a compromise between the relatively
+conservative approach taken by algorithm 1 and the rather aggressive
+approach taken by the default scheduler. It relies more heavily on
+having a regular register file and accurate register pressure classes.
+See <samp>haifa-sched.cc</samp> in the GCC sources for more details.
+</p>
+<p>The default choice depends on the target.
+</p>
+</dd>
+<dt><code>max-slsr-cand-scan</code></dt>
+<dd><p>Set the maximum number of existing candidates that are considered when
+seeking a basis for a new straight-line strength reduction candidate.
+</p>
+</dd>
+<dt><code>asan-globals</code></dt>
+<dd><p>Enable buffer overflow detection for global objects. This kind
+of protection is enabled by default if you are using the
+<samp>-fsanitize=address</samp> option.
+To disable protection of global objects use <samp>--param asan-globals=0</samp>.
+</p>
+</dd>
+<dt><code>asan-stack</code></dt>
+<dd><p>Enable buffer overflow detection for stack objects.
This kind of
+protection is enabled by default when using <samp>-fsanitize=address</samp>.
+To disable stack protection use the <samp>--param asan-stack=0</samp> option.
+</p>
+</dd>
+<dt><code>asan-instrument-reads</code></dt>
+<dd><p>Enable buffer overflow detection for memory reads. This kind of
+protection is enabled by default when using <samp>-fsanitize=address</samp>.
+To disable memory reads protection use
+<samp>--param asan-instrument-reads=0</samp>.
+</p>
+</dd>
+<dt><code>asan-instrument-writes</code></dt>
+<dd><p>Enable buffer overflow detection for memory writes. This kind of
+protection is enabled by default when using <samp>-fsanitize=address</samp>.
+To disable memory writes protection use the
+<samp>--param asan-instrument-writes=0</samp> option.
+</p>
+</dd>
+<dt><code>asan-memintrin</code></dt>
+<dd><p>Enable detection for built-in functions. This kind of protection
+is enabled by default when using <samp>-fsanitize=address</samp>.
+To disable built-in functions protection use
+<samp>--param asan-memintrin=0</samp>.
+</p>
+</dd>
+<dt><code>asan-use-after-return</code></dt>
+<dd><p>Enable detection of use-after-return. This kind of protection
+is enabled by default when using the <samp>-fsanitize=address</samp> option.
+To disable it use <samp>--param asan-use-after-return=0</samp>.
+</p>
+<p>Note: By default the check is disabled at run time. To enable it,
+add <code>detect_stack_use_after_return=1</code> to the environment variable
+<code>ASAN_OPTIONS</code>.
+</p>
+</dd>
+<dt><code>asan-instrumentation-with-call-threshold</code></dt>
+<dd><p>If the number of memory accesses in the function being instrumented
+is greater than or equal to this number, use callbacks instead of inline checks.
+E.g. to disable inline code use
+<samp>--param asan-instrumentation-with-call-threshold=0</samp>.
+</p> +</dd> +<dt><code>asan-kernel-mem-intrinsic-prefix</code></dt> +<dd><p>If nonzero, prefix calls to <code>memcpy</code>, <code>memset</code> and <code>memmove</code> +with ‘<samp>__asan_</samp>’ or ‘<samp>__hwasan_</samp>’ +for <samp>-fsanitize=kernel-address</samp> or ‘<samp>-fsanitize=kernel-hwaddress</samp>’, +respectively. +</p> +</dd> +<dt><code>hwasan-instrument-stack</code></dt> +<dd><p>Enable hwasan instrumentation of statically sized stack-allocated variables. +This kind of instrumentation is enabled by default when using +<samp>-fsanitize=hwaddress</samp> and disabled by default when using +<samp>-fsanitize=kernel-hwaddress</samp>. +To disable stack instrumentation use +<samp>--param hwasan-instrument-stack=0</samp>, and to enable it use +<samp>--param hwasan-instrument-stack=1</samp>. +</p> +</dd> +<dt><code>hwasan-random-frame-tag</code></dt> +<dd><p>When using stack instrumentation, decide tags for stack variables using a +deterministic sequence beginning at a random tag for each frame. With this +parameter unset tags are chosen using the same sequence but beginning from 1. +This is enabled by default for <samp>-fsanitize=hwaddress</samp> and unavailable +for <samp>-fsanitize=kernel-hwaddress</samp>. +To disable it use <samp>--param hwasan-random-frame-tag=0</samp>. +</p> +</dd> +<dt><code>hwasan-instrument-allocas</code></dt> +<dd><p>Enable hwasan instrumentation of dynamically sized stack-allocated variables. +This kind of instrumentation is enabled by default when using +<samp>-fsanitize=hwaddress</samp> and disabled by default when using +<samp>-fsanitize=kernel-hwaddress</samp>. +To disable instrumentation of such variables use +<samp>--param hwasan-instrument-allocas=0</samp>, and to enable it use +<samp>--param hwasan-instrument-allocas=1</samp>. +</p> +</dd> +<dt><code>hwasan-instrument-reads</code></dt> +<dd><p>Enable hwasan checks on memory reads. 
Instrumentation of reads is enabled by +default for both <samp>-fsanitize=hwaddress</samp> and +<samp>-fsanitize=kernel-hwaddress</samp>. +To disable checking memory reads use +<samp>--param hwasan-instrument-reads=0</samp>. +</p> +</dd> +<dt><code>hwasan-instrument-writes</code></dt> +<dd><p>Enable hwasan checks on memory writes. Instrumentation of writes is enabled by +default for both <samp>-fsanitize=hwaddress</samp> and +<samp>-fsanitize=kernel-hwaddress</samp>. +To disable checking memory writes use +<samp>--param hwasan-instrument-writes=0</samp>. +</p> +</dd> +<dt><code>hwasan-instrument-mem-intrinsics</code></dt> +<dd><p>Enable hwasan instrumentation of builtin functions. Instrumentation of these +builtin functions is enabled by default for both <samp>-fsanitize=hwaddress</samp> +and <samp>-fsanitize=kernel-hwaddress</samp>. +To disable instrumentation of builtin functions use +<samp>--param hwasan-instrument-mem-intrinsics=0</samp>. +</p> +</dd> +<dt><code>use-after-scope-direct-emission-threshold</code></dt> +<dd><p>If the size of a local variable in bytes is smaller or equal to this +number, directly poison (or unpoison) shadow memory instead of using +run-time callbacks. +</p> +</dd> +<dt><code>tsan-distinguish-volatile</code></dt> +<dd><p>Emit special instrumentation for accesses to volatiles. +</p> +</dd> +<dt><code>tsan-instrument-func-entry-exit</code></dt> +<dd><p>Emit instrumentation calls to __tsan_func_entry() and __tsan_func_exit(). +</p> +</dd> +<dt><code>max-fsm-thread-path-insns</code></dt> +<dd><p>Maximum number of instructions to copy when duplicating blocks on a +finite state automaton jump thread path. +</p> +</dd> +<dt><code>threader-debug</code></dt> +<dd><p>threader-debug=[none|all] Enables verbose dumping of the threader solver. +</p> +</dd> +<dt><code>parloops-chunk-size</code></dt> +<dd><p>Chunk size of omp schedule for loops parallelized by parloops. 
+</p> +</dd> +<dt><code>parloops-schedule</code></dt> +<dd><p>Schedule type of omp schedule for loops parallelized by parloops (static, +dynamic, guided, auto, runtime). +</p> +</dd> +<dt><code>parloops-min-per-thread</code></dt> +<dd><p>The minimum number of iterations per thread of an innermost parallelized +loop for which the parallelized variant is preferred over the single threaded +one. Note that for a parallelized loop nest the +minimum number of iterations of the outermost loop per thread is two. +</p> +</dd> +<dt><code>max-ssa-name-query-depth</code></dt> +<dd><p>Maximum depth of recursion when querying properties of SSA names in things +like fold routines. One level of recursion corresponds to following a +use-def chain. +</p> +</dd> +<dt><code>max-speculative-devirt-maydefs</code></dt> +<dd><p>The maximum number of may-defs we analyze when looking for a must-def +specifying the dynamic type of an object that invokes a virtual call +we may be able to devirtualize speculatively. +</p> +</dd> +<dt><code>evrp-sparse-threshold</code></dt> +<dd><p>Maximum number of basic blocks before EVRP uses a sparse cache. +</p> +</dd> +<dt><code>ranger-debug</code></dt> +<dd><p>Specifies the type of debug output to be issued for ranges. +</p> +</dd> +<dt><code>evrp-switch-limit</code></dt> +<dd><p>Specifies the maximum number of switch cases before EVRP ignores a switch. +</p> +</dd> +<dt><code>unroll-jam-min-percent</code></dt> +<dd><p>The minimum percentage of memory references that must be optimized +away for the unroll-and-jam transformation to be considered profitable. +</p> +</dd> +<dt><code>unroll-jam-max-unroll</code></dt> +<dd><p>The maximum number of times the outer loop should be unrolled by +the unroll-and-jam transformation. +</p> +</dd> +<dt><code>max-rtl-if-conversion-unpredictable-cost</code></dt> +<dd><p>Maximum permissible cost for the sequence that would be generated +by the RTL if-conversion pass for a branch that is considered unpredictable. 
+</p> +</dd> +<dt><code>max-variable-expansions-in-unroller</code></dt> +<dd><p>If <samp>-fvariable-expansion-in-unroller</samp> is used, the maximum number +of times that an individual variable will be expanded during loop unrolling. +</p> +</dd> +<dt><code>partial-inlining-entry-probability</code></dt> +<dd><p>Maximum probability of the entry BB of split region +(in percent relative to entry BB of the function) +to make partial inlining happen. +</p> +</dd> +<dt><code>max-tracked-strlens</code></dt> +<dd><p>Maximum number of strings for which strlen optimization pass will +track string lengths. +</p> +</dd> +<dt><code>gcse-after-reload-partial-fraction</code></dt> +<dd><p>The threshold ratio for performing partial redundancy +elimination after reload. +</p> +</dd> +<dt><code>gcse-after-reload-critical-fraction</code></dt> +<dd><p>The threshold ratio of critical edges execution count that +permit performing redundancy elimination after reload. +</p> +</dd> +<dt><code>max-loop-header-insns</code></dt> +<dd><p>The maximum number of insns in loop header duplicated +by the copy loop headers pass. +</p> +</dd> +<dt><code>vect-epilogues-nomask</code></dt> +<dd><p>Enable loop epilogue vectorization using smaller vector size. +</p> +</dd> +<dt><code>vect-partial-vector-usage</code></dt> +<dd><p>Controls when the loop vectorizer considers using partial vector loads +and stores as an alternative to falling back to scalar code. 0 stops +the vectorizer from ever using partial vector loads and stores. 1 allows +partial vector loads and stores if vectorization removes the need for the +code to iterate. 2 allows partial vector loads and stores in all loops. +The parameter only has an effect on targets that support partial +vector loads and stores. +</p> +</dd> +<dt><code>vect-inner-loop-cost-factor</code></dt> +<dd><p>The maximum factor which the loop vectorizer applies to the cost of statements +in an inner loop relative to the loop being vectorized. 
The factor applied +is the maximum of the estimated number of iterations of the inner loop and +this parameter. The default value of this parameter is 50. +</p> +</dd> +<dt><code>vect-induction-float</code></dt> +<dd><p>Enable loop vectorization of floating point inductions. +</p> +</dd> +<dt><code>avoid-fma-max-bits</code></dt> +<dd><p>Maximum number of bits for which we avoid creating FMAs. +</p> +</dd> +<dt><code>sms-loop-average-count-threshold</code></dt> +<dd><p>A threshold on the average loop count considered by the swing modulo scheduler. +</p> +</dd> +<dt><code>sms-dfa-history</code></dt> +<dd><p>The number of cycles the swing modulo scheduler considers when checking +conflicts using DFA. +</p> +</dd> +<dt><code>graphite-allow-codegen-errors</code></dt> +<dd><p>Whether codegen errors should be ICEs when <samp>-fchecking</samp>. +</p> +</dd> +<dt><code>sms-max-ii-factor</code></dt> +<dd><p>A factor for tuning the upper bound that swing modulo scheduler +uses for scheduling a loop. +</p> +</dd> +<dt><code>lra-max-considered-reload-pseudos</code></dt> +<dd><p>The max number of reload pseudos which are considered during +spilling a non-reload pseudo. +</p> +</dd> +<dt><code>max-pow-sqrt-depth</code></dt> +<dd><p>Maximum depth of sqrt chains to use when synthesizing exponentiation +by a real constant. +</p> +</dd> +<dt><code>max-dse-active-local-stores</code></dt> +<dd><p>Maximum number of active local stores in RTL dead store elimination. +</p> +</dd> +<dt><code>asan-instrument-allocas</code></dt> +<dd><p>Enable asan allocas/VLAs protection. +</p> +</dd> +<dt><code>max-iterations-computation-cost</code></dt> +<dd><p>Bound on the cost of an expression to compute the number of iterations. +</p> +</dd> +<dt><code>max-isl-operations</code></dt> +<dd><p>Maximum number of isl operations, 0 means unlimited. +</p> +</dd> +<dt><code>graphite-max-arrays-per-scop</code></dt> +<dd><p>Maximum number of arrays per scop. 
+</p>
+</dd>
+<dt><code>max-vartrack-reverse-op-size</code></dt>
+<dd><p>Max. size of loc list for which reverse ops should be added.
+</p>
+</dd>
+<dt><code>fsm-scale-path-stmts</code></dt>
+<dd><p>Scale factor to apply to the number of statements in a threading path
+crossing a loop backedge when comparing to
+<samp>--param=max-jump-thread-duplication-stmts</samp>.
+</p>
+</dd>
+<dt><code>uninit-control-dep-attempts</code></dt>
+<dd><p>Maximum number of nested calls to search for control dependencies
+during uninitialized variable analysis.
+</p>
+</dd>
+<dt><code>sched-autopref-queue-depth</code></dt>
+<dd><p>Hardware autoprefetcher scheduler model control flag.
+Number of lookahead cycles the model looks into; at &lsquo;0&rsquo;
+only enable the instruction sorting heuristic.
+</p>
+</dd>
+<dt><code>loop-versioning-max-inner-insns</code></dt>
+<dd><p>The maximum number of instructions that an inner loop can have
+before the loop versioning pass considers it too big to copy.
+</p>
+</dd>
+<dt><code>loop-versioning-max-outer-insns</code></dt>
+<dd><p>The maximum number of instructions that an outer loop can have
+before the loop versioning pass considers it too big to copy,
+discounting any instructions in inner loops that directly benefit
+from versioning.
+</p>
+</dd>
+<dt><code>ssa-name-def-chain-limit</code></dt>
+<dd><p>The maximum number of SSA_NAME assignments to follow in determining
+a property of a variable such as its value. This limits the number
+of iterations or recursive calls GCC performs when optimizing certain
+statements or when determining their validity prior to issuing
+diagnostics.
+</p>
+</dd>
+<dt><code>store-merging-max-size</code></dt>
+<dd><p>Maximum size of a single store merging region in bytes.
+</p>
+</dd>
+<dt><code>hash-table-verification-limit</code></dt>
+<dd><p>The number of elements for which hash table verification is done
+for each searched element.
+</p> +</dd> +<dt><code>max-find-base-term-values</code></dt> +<dd><p>Maximum number of VALUEs handled during a single find_base_term call. +</p> +</dd> +<dt><code>analyzer-max-enodes-per-program-point</code></dt> +<dd><p>The maximum number of exploded nodes per program point within +the analyzer, before terminating analysis of that point. +</p> +</dd> +<dt><code>analyzer-max-constraints</code></dt> +<dd><p>The maximum number of constraints per state. +</p> +</dd> +<dt><code>analyzer-min-snodes-for-call-summary</code></dt> +<dd><p>The minimum number of supernodes within a function for the +analyzer to consider summarizing its effects at call sites. +</p> +</dd> +<dt><code>analyzer-max-enodes-for-full-dump</code></dt> +<dd><p>The maximum depth of exploded nodes that should appear in a dot dump +before switching to a less verbose format. +</p> +</dd> +<dt><code>analyzer-max-recursion-depth</code></dt> +<dd><p>The maximum number of times a callsite can appear in a call stack +within the analyzer, before terminating analysis of a call that would +recurse deeper. +</p> +</dd> +<dt><code>analyzer-max-svalue-depth</code></dt> +<dd><p>The maximum depth of a symbolic value, before approximating +the value as unknown. +</p> +</dd> +<dt><code>analyzer-max-infeasible-edges</code></dt> +<dd><p>The maximum number of infeasible edges to reject before declaring +a diagnostic as infeasible. +</p> +</dd> +<dt><code>gimple-fe-computed-hot-bb-threshold</code></dt> +<dd><p>The number of executions of a basic block which is considered hot. +The parameter is used only in GIMPLE FE. +</p> +</dd> +<dt><code>analyzer-bb-explosion-factor</code></dt> +<dd><p>The maximum number of ’after supernode’ exploded nodes within the analyzer +per supernode, before terminating analysis. +</p> +</dd> +<dt><code>ranger-logical-depth</code></dt> +<dd><p>Maximum depth of logical expression evaluation ranger will look through +when evaluating outgoing edge ranges. 
+</p> +</dd> +<dt><code>ranger-recompute-depth</code></dt> +<dd><p>Maximum depth of instruction chains to consider for recomputation +in the outgoing range calculator. +</p> +</dd> +<dt><code>relation-block-limit</code></dt> +<dd><p>Maximum number of relations the oracle will register in a basic block. +</p> +</dd> +<dt><code>min-pagesize</code></dt> +<dd><p>Minimum page size for warning purposes. +</p> +</dd> +<dt><code>openacc-kernels</code></dt> +<dd><p>Specify mode of OpenACC ‘kernels’ constructs handling. +With <samp>--param=openacc-kernels=decompose</samp>, OpenACC ‘kernels’ +constructs are decomposed into parts, a sequence of compute +constructs, each then handled individually. +This is work in progress. +With <samp>--param=openacc-kernels=parloops</samp>, OpenACC ‘kernels’ +constructs are handled by the ‘<samp>parloops</samp>’ pass, en bloc. +This is the current default. +</p> +</dd> +<dt><code>openacc-privatization</code></dt> +<dd><p>Control whether the <samp>-fopt-info-omp-note</samp> and applicable +<samp>-fdump-tree-*-details</samp> options emit OpenACC privatization diagnostics. +With <samp>--param=openacc-privatization=quiet</samp>, don’t diagnose. +This is the current default. +With <samp>--param=openacc-privatization=noisy</samp>, do diagnose. +</p> +</dd> +</dl> + +<p>The following choices of <var>name</var> are available on AArch64 targets: +</p> +<dl compact="compact"> +<dt><code>aarch64-sve-compare-costs</code></dt> +<dd><p>When vectorizing for SVE, consider using “unpacked” vectors for +smaller elements and use the cost model to pick the cheapest approach. +Also use the cost model to choose between SVE and Advanced SIMD vectorization. +</p> +<p>Using unpacked vectors includes storing smaller elements in larger +containers and accessing elements with extending loads and truncating +stores. +</p> +</dd> +<dt><code>aarch64-float-recp-precision</code></dt> +<dd><p>The number of Newton iterations for calculating the reciprocal for float type. 
+The precision of division is proportional to this parameter when division
+approximation is enabled. The default value is 1.
+</p>
+</dd>
+<dt><code>aarch64-double-recp-precision</code></dt>
+<dd><p>The number of Newton iterations for calculating the reciprocal for double type.
+The precision of division is proportional to this parameter when division
+approximation is enabled. The default value is 2.
+</p>
+</dd>
+<dt><code>aarch64-autovec-preference</code></dt>
+<dd><p>Force an ISA selection strategy for auto-vectorization. Accepts values from
+0 to 4, inclusive.
+</p><dl compact="compact">
+<dt>&lsquo;<samp>0</samp>&rsquo;</dt>
+<dd><p>Use the default heuristics.
+</p></dd>
+<dt>&lsquo;<samp>1</samp>&rsquo;</dt>
+<dd><p>Use only Advanced SIMD for auto-vectorization.
+</p></dd>
+<dt>&lsquo;<samp>2</samp>&rsquo;</dt>
+<dd><p>Use only SVE for auto-vectorization.
+</p></dd>
+<dt>&lsquo;<samp>3</samp>&rsquo;</dt>
+<dd><p>Use both Advanced SIMD and SVE. Prefer Advanced SIMD when the costs are
+deemed equal.
+</p></dd>
+<dt>&lsquo;<samp>4</samp>&rsquo;</dt>
+<dd><p>Use both Advanced SIMD and SVE. Prefer SVE when the costs are deemed equal.
+</p></dd>
+</dl>
+<p>The default value is 0.
+</p>
+</dd>
+<dt><code>aarch64-loop-vect-issue-rate-niters</code></dt>
+<dd><p>The tuning for some AArch64 CPUs tries to take both latencies and issue
+rates into account when deciding whether a loop should be vectorized
+using SVE, vectorized using Advanced SIMD, or not vectorized at all.
+If this parameter is set to <var>n</var>, GCC will not use this heuristic
+for loops that are known to execute in fewer than <var>n</var> Advanced
+SIMD iterations.
+</p>
+</dd>
+<dt><code>aarch64-vect-unroll-limit</code></dt>
+<dd><p>The vectorizer will use available tuning information to determine whether it
+would be beneficial to unroll the main vectorized loop and by how much. This
+parameter sets the upper bound of how much the vectorizer will unroll the main
+loop. The default value is four.
+</p>
+</dd>
+</dl>
+
+<p>The following choices of <var>name</var> are available on i386 and x86_64 targets:
+</p>
+<dl compact="compact">
+<dt><code>x86-stlf-window-ninsns</code></dt>
+<dd><p>The number of instructions above which the STLF (store-to-load forwarding)
+stall penalty can be compensated.
+</p>
+</dd>
+<dt><code>x86-stv-max-visits</code></dt>
+<dd><p>The maximum number of use and def visits when discovering a STV chain before
+the discovery is aborted.
+</p>
+</dd>
+</dl>
+
+</dd>
+</dl>
+
+<hr>
+<div class="header">
+<p>
+Next: <a href="Instrumentation-Options.html#Instrumentation-Options" accesskey="n" rel="next">Instrumentation Options</a>, Previous: <a href="Debugging-Options.html#Debugging-Options" accesskey="p" rel="previous">Debugging Options</a>, Up: <a href="Invoking-GCC.html#Invoking-GCC" accesskey="u" rel="up">Invoking GCC</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Indices.html#Indices" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>