author | alk3pInjection <webmaster@raspii.tech> | 2024-02-04 16:16:35 +0800 |
---|---|---|
committer | alk3pInjection <webmaster@raspii.tech> | 2024-02-04 16:16:35 +0800 |
commit | abdaadbcae30fe0c9a66c7516798279fdfd97750 (patch) | |
tree | 00a54a6e25601e43876d03c1a4a12a749d4a914c /share/doc/cpp/Tokenization.html |
https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads
Change-Id: I7303388733328cd98ab9aa3c30236db67f2e9e9c
Diffstat (limited to 'share/doc/cpp/Tokenization.html')
-rw-r--r-- | share/doc/cpp/Tokenization.html | 251 |
1 files changed, 251 insertions, 0 deletions
diff --git a/share/doc/cpp/Tokenization.html b/share/doc/cpp/Tokenization.html new file mode 100644 index 0000000..d997eeb --- /dev/null +++ b/share/doc/cpp/Tokenization.html @@ -0,0 +1,251 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> +<html> +<!-- Copyright (C) 1987-2023 Free Software Foundation, Inc. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation. A copy of +the license is included in the +section entitled "GNU Free Documentation License". + +This manual contains no Invariant Sections. The Front-Cover Texts are +(a) (see below), and the Back-Cover Texts are (b) (see below). + +(a) The FSF's Front-Cover Text is: + +A GNU Manual + +(b) The FSF's Back-Cover Text is: + +You have freedom to copy and modify this GNU Manual, like GNU + software. Copies published by the Free Software Foundation raise + funds for GNU development. 
--> +<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ --> +<head> +<title>The C Preprocessor: Tokenization</title> + +<meta name="description" content="The C Preprocessor: Tokenization"> +<meta name="keywords" content="The C Preprocessor: Tokenization"> +<meta name="resource-type" content="document"> +<meta name="distribution" content="global"> +<meta name="Generator" content="makeinfo"> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<link href="index.html#Top" rel="start" title="Top"> +<link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives"> +<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> +<link href="Overview.html#Overview" rel="up" title="Overview"> +<link href="The-preprocessing-language.html#The-preprocessing-language" rel="next" title="The preprocessing language"> +<link href="Initial-processing.html#Initial-processing" rel="previous" title="Initial processing"> +<style type="text/css"> +<!-- +a.summary-letter {text-decoration: none} +blockquote.smallquotation {font-size: smaller} +div.display {margin-left: 3.2em} +div.example {margin-left: 3.2em} +div.indentedblock {margin-left: 3.2em} +div.lisp {margin-left: 3.2em} +div.smalldisplay {margin-left: 3.2em} +div.smallexample {margin-left: 3.2em} +div.smallindentedblock {margin-left: 3.2em; font-size: smaller} +div.smalllisp {margin-left: 3.2em} +kbd {font-style:oblique} +pre.display {font-family: inherit} +pre.format {font-family: inherit} +pre.menu-comment {font-family: serif} +pre.menu-preformatted {font-family: serif} +pre.smalldisplay {font-family: inherit; font-size: smaller} +pre.smallexample {font-size: smaller} +pre.smallformat {font-family: inherit; font-size: smaller} +pre.smalllisp {font-size: smaller} +span.nocodebreak {white-space:nowrap} +span.nolinebreak {white-space:nowrap} +span.roman {font-family:serif; font-weight:normal} +span.sansserif {font-family:sans-serif; 
font-weight:normal} +ul.no-bullet {list-style: none} +--> +</style> + + +</head> + +<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> +<a name="Tokenization"></a> +<div class="header"> +<p> +Next: <a href="The-preprocessing-language.html#The-preprocessing-language" accesskey="n" rel="next">The preprocessing language</a>, Previous: <a href="Initial-processing.html#Initial-processing" accesskey="p" rel="previous">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p> +</div> +<hr> +<a name="Tokenization-1"></a> +<h3 class="section">1.3 Tokenization</h3> + +<a name="index-tokens"></a> +<a name="index-preprocessing-tokens"></a> +<p>After the textual transformations are finished, the input file is +converted into a sequence of <em>preprocessing tokens</em>. These mostly +correspond to the syntactic tokens used by the C compiler, but there are +a few differences. White space separates tokens; it is not itself a +token of any kind. Tokens do not have to be separated by white space, +but it is often necessary to avoid ambiguities. +</p> +<p>When faced with a sequence of characters that has more than one possible +tokenization, the preprocessor is greedy. It always makes each token, +starting from the left, as big as possible before moving on to the next +token. For instance, <code>a+++++b</code> is interpreted as +<code>a ++ ++ + b<!-- /@w --></code>, not as <code>a ++ + ++ b<!-- /@w --></code>, even though the +latter tokenization could be part of a valid C program and the former +could not. +</p> +<p>Once the input file is broken into tokens, the token boundaries never +change, except when the ‘<samp>##</samp>’ preprocessing operator is used to paste +tokens together. 
See <a href="Concatenation.html#Concatenation">Concatenation</a>. For example, +</p> +<div class="smallexample"> +<pre class="smallexample">#define foo() bar +foo()baz + → bar baz +<em>not</em> + → barbaz +</pre></div> + +<p>The compiler does not re-tokenize the preprocessor’s output. Each +preprocessing token becomes one compiler token. +</p> +<a name="index-identifiers"></a> +<p>Preprocessing tokens fall into five broad classes: identifiers, +preprocessing numbers, string literals, punctuators, and other. An +<em>identifier</em> is the same as an identifier in C: any sequence of +letters, digits, or underscores, which begins with a letter or +underscore. Keywords of C have no significance to the preprocessor; +they are ordinary identifiers. You can define a macro whose name is a +keyword, for instance. The only identifier which can be considered a +preprocessing keyword is <code>defined</code>. See <a href="Defined.html#Defined">Defined</a>. +</p> +<p>This is mostly true of other languages which use the C preprocessor. +However, a few of the keywords of C++ are significant even in the +preprocessor. See <a href="C_002b_002b-Named-Operators.html#C_002b_002b-Named-Operators">C++ Named Operators</a>. +</p> +<p>In the 1999 C standard, identifiers may contain letters which are not +part of the “basic source character set”, at the implementation’s +discretion (such as accented Latin letters, Greek letters, or Chinese +ideograms). This may be done with an extended character set, or the +‘<samp>\u</samp>’ and ‘<samp>\U</samp>’ escape sequences. +</p> +<p>As an extension, GCC treats ‘<samp>$</samp>’ as a letter. This is for +compatibility with some systems, such as VMS, where ‘<samp>$</samp>’ is commonly +used in system-defined function and object names. ‘<samp>$</samp>’ is not a +letter in strictly conforming mode, or if you specify the <samp>-$</samp> +option. See <a href="Invocation.html#Invocation">Invocation</a>. 
+</p> +<a name="index-numbers"></a> +<a name="index-preprocessing-numbers"></a> +<p>A <em>preprocessing number</em> has a rather bizarre definition. The +category includes all the normal integer and floating point constants +one expects of C, but also a number of other things one might not +initially recognize as a number. Formally, preprocessing numbers begin +with an optional period, a required decimal digit, and then continue +with any sequence of letters, digits, underscores, periods, and +exponents. Exponents are the two-character sequences ‘<samp>e+</samp>’, +‘<samp>e-</samp>’, ‘<samp>E+</samp>’, ‘<samp>E-</samp>’, ‘<samp>p+</samp>’, ‘<samp>p-</samp>’, ‘<samp>P+</samp>’, and +‘<samp>P-</samp>’. (The exponents that begin with ‘<samp>p</samp>’ or ‘<samp>P</samp>’ are +used for hexadecimal floating-point constants.) +</p> +<p>The purpose of this unusual definition is to isolate the preprocessor +from the full complexity of numeric constants. It does not have to +distinguish between lexically valid and invalid floating-point numbers, +which is complicated. The definition also permits you to split an +identifier at any position and get exactly two tokens, which can then be +pasted back together with the ‘<samp>##</samp>’ operator. +</p> +<p>It’s possible for preprocessing numbers to cause programs to be +misinterpreted. For example, <code>0xE+12</code> is a single preprocessing number +which does not translate to any valid numeric constant, and is therefore a +syntax error. It does not mean <code>0xE + 12<!-- /@w --></code>, which is what you +might have intended. 
+</p> +<a name="index-string-literals"></a> +<a name="index-string-constants"></a> +<a name="index-character-constants"></a> +<a name="index-header-file-names"></a> +<p><em>String literals</em> are string constants, character constants, and +header file names (the argument of ‘<samp>#include</samp>’).<a name="DOCF2" href="#FOOT2"><sup>2</sup></a> String constants and character +constants are straightforward: <tt>"…"</tt> or <tt>'…'</tt>. In +either case embedded quotes should be escaped with a backslash: +<tt>'\''</tt> is the character constant for ‘<samp>'</samp>’. There is no limit on +the length of a character constant, but the value of a character +constant that contains more than one character is +implementation-defined. See <a href="Implementation-Details.html#Implementation-Details">Implementation Details</a>. +</p> +<p>Header file names either look like string constants, <tt>"…"</tt>, or are +written with angle brackets instead, <tt><…></tt>. In either case, +backslash is an ordinary character. There is no way to escape the +closing quote or angle bracket. The preprocessor looks for the header +file in different places depending on which form you use. See <a href="Include-Operation.html#Include-Operation">Include Operation</a>. +</p> +<p>No string literal may extend past the end of a line. You may use continued +lines instead, or string constant concatenation. +</p> +<a name="index-punctuators"></a> +<a name="index-digraphs"></a> +<a name="index-alternative-tokens"></a> +<p><em>Punctuators</em> are all the usual bits of punctuation which are +meaningful to C and C++. All but three of the punctuation characters in +ASCII are C punctuators. The exceptions are ‘<samp>@</samp>’, ‘<samp>$</samp>’, and +‘<samp>`</samp>’. In addition, all the two- and three-character operators are +punctuators. There are also six <em>digraphs</em>, which the C++ standard +calls <em>alternative tokens</em>, which are merely alternate ways to spell +other punctuators. 
This is a second attempt to work around missing +punctuation in obsolete systems. It has no negative side effects, +unlike trigraphs, but does not cover as much ground. The digraphs and +their corresponding normal punctuators are: +</p> +<div class="smallexample"> +<pre class="smallexample">Digraph: <% %> <: :> %: %:%: +Punctuator: { } [ ] # ## +</pre></div> + +<a name="index-other-tokens"></a> +<p>Any other single byte is considered “other” and passed on to the +preprocessor’s output unchanged. The C compiler will almost certainly +reject source code containing “other” tokens. In ASCII, the only +“other” characters are ‘<samp>@</samp>’, ‘<samp>$</samp>’, ‘<samp>`</samp>’, and control +characters other than NUL (all bits zero). (Note that ‘<samp>$</samp>’ is +normally considered a letter.) All bytes with the high bit set +(numeric range 0x80–0xFF) that were not successfully interpreted as +part of an extended character in the input encoding are also “other” +in the present implementation. +</p> +<p>NUL is a special case because of the high probability that its +appearance is accidental, and because it may be invisible to the user +(many terminals do not display NUL at all). Within comments, NULs are +silently ignored, just as any other character would be. In running +text, NUL is considered white space. For example, these two directives +have the same meaning. +</p> +<div class="smallexample"> +<pre class="smallexample">#define X^@1 +#define X 1 +</pre></div> + +<p>(where ‘<samp>^@</samp>’ is ASCII NUL). Within string or character constants, +NULs are preserved. In the latter two cases the preprocessor emits a +warning message. 
+</p> +<div class="footnote"> +<hr> +<h4 class="footnotes-heading">Footnotes</h4> + +<h3><a name="FOOT2" href="#DOCF2">(2)</a></h3> +<p>The C +standard uses the term <em>string literal</em> to refer only to what we are +calling <em>string constants</em>.</p> +</div> +<hr> +<div class="header"> +<p> +Next: <a href="The-preprocessing-language.html#The-preprocessing-language" accesskey="n" rel="next">The preprocessing language</a>, Previous: <a href="Initial-processing.html#Initial-processing" accesskey="p" rel="previous">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p> +</div> + + + +</body> +</html> |
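The definition of a preprocessing number in the Tokenization page added by this diff can be sketched as a small scanner. This is an illustrative reading of the prose only, not GCC's lexer: the function name `pp_number_len` is hypothetical, and the sketch assumes plain ASCII input and ignores the `$` extension and extended characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of GCC): returns the length of the
   preprocessing number at the start of s, or 0 if s does not start one. */
size_t pp_number_len(const char *s)
{
    size_t i = 0;

    assert(s != NULL);

    /* Optional leading period, then a required decimal digit. */
    if (s[i] == '.')
        i++;
    if (!isdigit((unsigned char)s[i]))
        return 0;
    i++;

    /* Then any run of letters, digits, underscores, periods, and exponents. */
    for (;;) {
        unsigned char c = (unsigned char)s[i];

        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P')
            && (s[i + 1] == '+' || s[i + 1] == '-'))
            i += 2;   /* exponent: the sign is absorbed along with the letter */
        else if (isalnum(c) || c == '_' || c == '.')
            i++;
        else
            break;
    }
    return i;
}
```

Under this sketch, `pp_number_len("0xE+12")` covers all six characters, which matches the manual's point that `0xE+12` is one token. `pp_number_len("0xF+12")` stops after `0xF`, because `F+` is not an exponent sequence.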