Diffstat (limited to 'share/doc/gccint/RTL-passes.html')
-rw-r--r-- | share/doc/gccint/RTL-passes.html | 356 |
1 files changed, 356 insertions, 0 deletions
diff --git a/share/doc/gccint/RTL-passes.html b/share/doc/gccint/RTL-passes.html new file mode 100644 index 0000000..0c1cba9 --- /dev/null +++ b/share/doc/gccint/RTL-passes.html @@ -0,0 +1,356 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> +<html> +<!-- Copyright (C) 1988-2023 Free Software Foundation, Inc. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with the +Invariant Sections being "Funding Free Software", the Front-Cover +Texts being (a) (see below), and with the Back-Cover Texts being (b) +(see below). A copy of the license is included in the section entitled +"GNU Free Documentation License". + +(a) The FSF's Front-Cover Text is: + +A GNU Manual + +(b) The FSF's Back-Cover Text is: + +You have freedom to copy and modify this GNU Manual, like GNU + software. Copies published by the Free Software Foundation raise + funds for GNU development. 
--> +<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ --> +<head> +<title>GNU Compiler Collection (GCC) Internals: RTL passes</title> + +<meta name="description" content="GNU Compiler Collection (GCC) Internals: RTL passes"> +<meta name="keywords" content="GNU Compiler Collection (GCC) Internals: RTL passes"> +<meta name="resource-type" content="document"> +<meta name="distribution" content="global"> +<meta name="Generator" content="makeinfo"> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<link href="index.html#Top" rel="start" title="Top"> +<link href="Option-Index.html#Option-Index" rel="index" title="Option Index"> +<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> +<link href="Passes.html#Passes" rel="up" title="Passes"> +<link href="Optimization-info.html#Optimization-info" rel="next" title="Optimization info"> +<link href="Tree-SSA-passes.html#Tree-SSA-passes" rel="previous" title="Tree SSA passes"> +<style type="text/css"> +<!-- +a.summary-letter {text-decoration: none} +blockquote.smallquotation {font-size: smaller} +div.display {margin-left: 3.2em} +div.example {margin-left: 3.2em} +div.indentedblock {margin-left: 3.2em} +div.lisp {margin-left: 3.2em} +div.smalldisplay {margin-left: 3.2em} +div.smallexample {margin-left: 3.2em} +div.smallindentedblock {margin-left: 3.2em; font-size: smaller} +div.smalllisp {margin-left: 3.2em} +kbd {font-style:oblique} +pre.display {font-family: inherit} +pre.format {font-family: inherit} +pre.menu-comment {font-family: serif} +pre.menu-preformatted {font-family: serif} +pre.smalldisplay {font-family: inherit; font-size: smaller} +pre.smallexample {font-size: smaller} +pre.smallformat {font-family: inherit; font-size: smaller} +pre.smalllisp {font-size: smaller} +span.nocodebreak {white-space:nowrap} +span.nolinebreak {white-space:nowrap} +span.roman {font-family:serif; font-weight:normal} +span.sansserif {font-family:sans-serif; font-weight:normal} 
+ul.no-bullet {list-style: none} +--> +</style> + + +</head> + +<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> +<a name="RTL-passes"></a> +<div class="header"> +<p> +Next: <a href="Optimization-info.html#Optimization-info" accesskey="n" rel="next">Optimization info</a>, Previous: <a href="Tree-SSA-passes.html#Tree-SSA-passes" accesskey="p" rel="previous">Tree SSA passes</a>, Up: <a href="Passes.html#Passes" accesskey="u" rel="up">Passes</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p> +</div> +<hr> +<a name="RTL-passes-1"></a> +<h3 class="section">9.6 RTL passes</h3> + +<p>The following briefly describes the RTL generation and optimization +passes that are run after the Tree optimization passes. +</p> +<ul> +<li> RTL generation + +<p>The source files for RTL generation include +<samp>stmt.cc</samp>, +<samp>calls.cc</samp>, +<samp>expr.cc</samp>, +<samp>explow.cc</samp>, +<samp>expmed.cc</samp>, +<samp>function.cc</samp>, +<samp>optabs.cc</samp> +and <samp>emit-rtl.cc</samp>. +Also, the file +<samp>insn-emit.cc</samp>, generated from the machine description by the +program <code>genemit</code>, is used in this pass. The header file +<samp>expr.h</samp> is used for communication within this pass. +</p> +<a name="index-genflags"></a> +<a name="index-gencodes"></a> +<p>The header files <samp>insn-flags.h</samp> and <samp>insn-codes.h</samp>, +generated from the machine description by the programs <code>genflags</code> +and <code>gencodes</code>, tell this pass which standard names are available +for use and which patterns correspond to them. +</p> +</li><li> Generation of exception landing pads + +<p>This pass generates the glue that handles communication between the +exception handling library routines and the exception handlers within +the function. 
Entry points in the function that are invoked by the +exception handling library are called <em>landing pads</em>. The code +for this pass is located in <samp>except.cc</samp>. +</p> +</li><li> Control flow graph cleanup + +<p>This pass removes unreachable code, simplifies jumps to next, jumps to +jump, jumps across jumps, etc. The pass is run multiple times. +For historical reasons, it is occasionally referred to as the “jump +optimization pass”. The bulk of the code for this pass is in +<samp>cfgcleanup.cc</samp>, and there are support routines in <samp>cfgrtl.cc</samp> +and <samp>jump.cc</samp>. +</p> +</li><li> Forward propagation of single-def values + +<p>This pass attempts to remove redundant computation by substituting +variables that come from a single definition, and +seeing if the result can be simplified. It performs copy propagation +and addressing mode selection. The pass is run twice, with values +being propagated into loops only on the second run. The code is +located in <samp>fwprop.cc</samp>. +</p> +</li><li> Common subexpression elimination + +<p>This pass removes redundant computation within basic blocks, and +optimizes addressing modes based on cost. The pass is run twice. +The code for this pass is located in <samp>cse.cc</samp>. +</p> +</li><li> Global common subexpression elimination + +<p>This pass performs two +different types of GCSE depending on whether you are optimizing for +size or not (LCM based GCSE tends to increase code size for a gain in +speed, while Morel-Renvoise based GCSE does not). +When optimizing for size, GCSE is done using Morel-Renvoise Partial +Redundancy Elimination, with the exception that it does not try to move +invariants out of loops—that is left to the loop optimization pass. +If MR PRE GCSE is done, code hoisting (aka unification) is also done, as +well as load motion. +If you are optimizing for speed, LCM (lazy code motion) based GCSE is +done. LCM is based on the work of Knoop, Ruthing, and Steffen. 
LCM +based GCSE also does loop invariant code motion. We also perform load +and store motion when optimizing for speed. +Regardless of which type of GCSE is used, the GCSE pass also performs +global constant and copy propagation. +The source file for this pass is <samp>gcse.cc</samp>, and the LCM routines +are in <samp>lcm.cc</samp>. +</p> +</li><li> Loop optimization + +<p>This pass performs several loop-related optimizations. +The source files <samp>cfgloopanal.cc</samp> and <samp>cfgloopmanip.cc</samp> contain +generic loop analysis and manipulation code. Initialization and finalization +of loop structures is handled by <samp>loop-init.cc</samp>. +A loop invariant motion pass is implemented in <samp>loop-invariant.cc</samp>. +Basic block level optimizations (unrolling and peeling loops) +are implemented in <samp>loop-unroll.cc</samp>. +Replacement of the exit condition of loops with special machine-dependent +instructions is handled by <samp>loop-doloop.cc</samp>. +</p> +</li><li> Jump bypassing + +<p>This pass is an aggressive form of GCSE that transforms the control +flow graph of a function by propagating constants into conditional +branch instructions. The source file for this pass is <samp>gcse.cc</samp>. +</p> +</li><li> If conversion + +<p>This pass attempts to replace conditional branches and surrounding +assignments with arithmetic, boolean value producing comparison +instructions, and conditional move instructions. In the very last +invocation after reload/LRA, it will generate predicated instructions +when supported by the target. The code is located in <samp>ifcvt.cc</samp>. +</p> +</li><li> Web construction + +<p>This pass splits independent uses of each pseudo-register. This can +improve the effect of other transformations, such as CSE or register +allocation. The code for this pass is located in <samp>web.cc</samp>. 
+</p> +</li><li> Instruction combination + +<p>This pass attempts to combine groups of two or three instructions that +are related by data flow into single instructions. It combines the +RTL expressions for the instructions by substitution, simplifies the +result using algebra, and then attempts to match the result against +the machine description. The code is located in <samp>combine.cc</samp>. +</p> +</li><li> Mode switching optimization + +<p>This pass looks for instructions that require the processor to be in a +specific “mode” and minimizes the number of mode changes required to +satisfy all users. What these modes are, and what they apply to are +completely target-specific. The code for this pass is located in +<samp>mode-switching.cc</samp>. +</p> +</li><li> <a name="index-modulo-scheduling"></a> +<a name="index-sms_002c-swing_002c-software-pipelining"></a> +Modulo scheduling + +<p>This pass looks at innermost loops and reorders their instructions +by overlapping different iterations. Modulo scheduling is performed +immediately before instruction scheduling. The code for this pass is +located in <samp>modulo-sched.cc</samp>. +</p> +</li><li> Instruction scheduling + +<p>This pass looks for instructions whose output will not be available by +the time that it is used in subsequent instructions. Memory loads and +floating point instructions often have this behavior on RISC machines. +It re-orders instructions within a basic block to try to separate the +definition and use of items that otherwise would cause pipeline +stalls. This pass is performed twice, before and after register +allocation. The code for this pass is located in <samp>haifa-sched.cc</samp>, +<samp>sched-deps.cc</samp>, <samp>sched-ebb.cc</samp>, <samp>sched-rgn.cc</samp> and +<samp>sched-vis.c</samp>. 
+</p> +</li><li> Register allocation + +<p>These passes make sure that all occurrences of pseudo registers are +eliminated, either by allocating them to a hard register, replacing +them by an equivalent expression (e.g. a constant) or by placing +them on the stack. This is done in several subpasses: +</p> +<ul> +<li> The integrated register allocator (<acronym>IRA</acronym>). It is called +integrated because coalescing, register live range splitting, and hard +register preferencing are done on the fly during coloring. It also +has better integration with the reload/LRA pass. Pseudo-registers spilled +by the allocator or the reload/LRA still have a chance to get +hard registers if the reload/LRA evicts some pseudo-registers from +hard registers. The allocator helps to choose better pseudos for +spilling based on their live ranges and to coalesce stack slots +allocated for the spilled pseudo-registers. IRA is a regional +register allocator which becomes a Chaitin-Briggs allocator +if there is one region. By default, IRA chooses regions using +register pressure but the user can force it to use one region or +regions corresponding to all loops. + +<p>Source files of the allocator are <samp>ira.cc</samp>, <samp>ira-build.cc</samp>, +<samp>ira-costs.cc</samp>, <samp>ira-conflicts.cc</samp>, <samp>ira-color.cc</samp>, +<samp>ira-emit.cc</samp>, <samp>ira-lives.cc</samp>, plus header files <samp>ira.h</samp> +and <samp>ira-int.h</samp> used for the communication between the allocator +and the rest of the compiler and between the IRA files. +</p> +</li><li> <a name="index-reloading"></a> +Reloading. This pass renumbers pseudo registers with the hardware +register numbers they were allocated. Pseudo registers that did not +get hard registers are replaced with stack slots. Then it finds +instructions that are invalid because a value has failed to end up in +a register, or has ended up in a register of the wrong kind. 
It fixes +up these instructions by reloading the problematic values +temporarily into registers. Additional instructions are generated to +do the copying. + +<p>The reload pass also optionally eliminates the frame pointer and inserts +instructions to save and restore call-clobbered registers around calls. +</p> +<p>Source files are <samp>reload.cc</samp> and <samp>reload1.cc</samp>, plus the header +<samp>reload.h</samp> used for communication between them. +</p> +</li><li> <a name="index-Local-Register-Allocator-_0028LRA_0029"></a> +The Local Register Allocator (LRA). This pass is a modern replacement for the reload pass. Source files +are <samp>lra.cc</samp>, <samp>lra-assign.cc</samp>, <samp>lra-coalesce.cc</samp>, +<samp>lra-constraints.cc</samp>, <samp>lra-eliminations.cc</samp>, +<samp>lra-lives.cc</samp>, <samp>lra-remat.cc</samp>, <samp>lra-spills.cc</samp>, the +header <samp>lra-int.h</samp> used for communication between them, and the +header <samp>lra.h</samp> used for communication between LRA and the rest of +the compiler. + +<p>Unlike the reload pass, intermediate LRA decisions are reflected in +RTL as much as possible. This reduces the number of target-dependent +macros and hooks, leaving instruction constraints as the primary +source of control. +</p> +<p>LRA is run on targets for which TARGET_LRA_P returns true. +</p></li></ul> + +</li><li> Basic block reordering + +<p>This pass implements profile-guided code positioning. If profile +information is not available, various types of static analysis are +performed to make the predictions normally coming from the profile +feedback (i.e. execution frequency, branch probability, etc.). It is +implemented in the file <samp>bb-reorder.cc</samp>, and the various +prediction routines are in <samp>predict.cc</samp>. +</p> +</li><li> Variable tracking + +<p>This pass computes where the variables are stored at each +position in code and adds notes describing the variable locations +to the RTL code. 
Location lists are then generated from these +notes and written to the debug information, if the debugging information format supports +location lists. The code is located in <samp>var-tracking.cc</samp>. +</p> +</li><li> Delayed branch scheduling + +<p>This optional pass attempts to find instructions that can go into the +delay slots of other instructions, usually jumps and calls. The code +for this pass is located in <samp>reorg.cc</samp>. +</p> +</li><li> Branch shortening + +<p>On many RISC machines, branch instructions have a limited range. +Thus, longer sequences of instructions must be used for long branches. +In this pass, the compiler figures out how far each instruction +will be from each other instruction, and therefore whether the usual +instructions, or the longer sequences, must be used for each branch. +The code for this pass is located in <samp>final.cc</samp>. +</p> +</li><li> Register-to-stack conversion + +<p>Conversion from usage of some hard registers to usage of a register +stack may be done at this point. Currently, this is supported only +for the floating-point registers of the Intel 80387 coprocessor. The +code for this pass is located in <samp>reg-stack.cc</samp>. +</p> +</li><li> Final + +<p>This pass outputs the assembler code for the function. The source files +are <samp>final.cc</samp> plus <samp>insn-output.cc</samp>; the latter is generated +automatically from the machine description by the tool <samp>genoutput</samp>. +The header file <samp>conditions.h</samp> is used for communication between +these files. +</p> +</li><li> Debugging information output + +<p>This is run after final because it must output the stack slot offsets +for pseudo registers that did not get hard registers. Source files +are <samp>dwarfout.c</samp> for +DWARF symbol table format, files <samp>dwarf2out.cc</samp> and <samp>dwarf2asm.cc</samp> +for DWARF2 symbol table format, and <samp>vmsdbgout.cc</samp> for VMS debug +symbol table format. 
+</p> +</li></ul> + +<hr> +<div class="header"> +<p> +Next: <a href="Optimization-info.html#Optimization-info" accesskey="n" rel="next">Optimization info</a>, Previous: <a href="Tree-SSA-passes.html#Tree-SSA-passes" accesskey="p" rel="previous">Tree SSA passes</a>, Up: <a href="Passes.html#Passes" accesskey="u" rel="up">Passes</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p> +</div> + + + +</body> +</html>