author alk3pInjection <webmaster@raspii.tech> 2024-02-04 16:16:35 +0800
committer alk3pInjection <webmaster@raspii.tech> 2024-02-04 16:16:35 +0800
commit abdaadbcae30fe0c9a66c7516798279fdfd97750
tree 00a54a6e25601e43876d03c1a4a12a749d4a914c /share/doc/cpp/Tokenization.html
Import stripped Arm GNU Toolchain 13.2.Rel1 (HEAD, umineko)
https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads
Change-Id: I7303388733328cd98ab9aa3c30236db67f2e9e9c
Diffstat (limited to 'share/doc/cpp/Tokenization.html')
-rw-r--r-- share/doc/cpp/Tokenization.html 251
1 files changed, 251 insertions, 0 deletions
diff --git a/share/doc/cpp/Tokenization.html b/share/doc/cpp/Tokenization.html
new file mode 100644
index 0000000..d997eeb
--- /dev/null
+++ b/share/doc/cpp/Tokenization.html
@@ -0,0 +1,251 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- Copyright (C) 1987-2023 Free Software Foundation, Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation. A copy of
+the license is included in the
+section entitled "GNU Free Documentation License".
+
+This manual contains no Invariant Sections. The Front-Cover Texts are
+(a) (see below), and the Back-Cover Texts are (b) (see below).
+
+(a) The FSF's Front-Cover Text is:
+
+A GNU Manual
+
+(b) The FSF's Back-Cover Text is:
+
+You have freedom to copy and modify this GNU Manual, like GNU
+ software. Copies published by the Free Software Foundation raise
+ funds for GNU development. -->
+<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
+<head>
+<title>The C Preprocessor: Tokenization</title>
+
+<meta name="description" content="The C Preprocessor: Tokenization">
+<meta name="keywords" content="The C Preprocessor: Tokenization">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="Overview.html#Overview" rel="up" title="Overview">
+<link href="The-preprocessing-language.html#The-preprocessing-language" rel="next" title="The preprocessing language">
+<link href="Initial-processing.html#Initial-processing" rel="previous" title="Initial processing">
+<style type="text/css">
+<!--
+a.summary-letter {text-decoration: none}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.indentedblock {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style:oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nocodebreak {white-space:nowrap}
+span.nolinebreak {white-space:nowrap}
+span.roman {font-family:serif; font-weight:normal}
+span.sansserif {font-family:sans-serif; font-weight:normal}
+ul.no-bullet {list-style: none}
+-->
+</style>
+
+
+</head>
+
+<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
+<a name="Tokenization"></a>
+<div class="header">
+<p>
+Next: <a href="The-preprocessing-language.html#The-preprocessing-language" accesskey="n" rel="next">The preprocessing language</a>, Previous: <a href="Initial-processing.html#Initial-processing" accesskey="p" rel="previous">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<a name="Tokenization-1"></a>
+<h3 class="section">1.3 Tokenization</h3>
+
+<a name="index-tokens"></a>
+<a name="index-preprocessing-tokens"></a>
+<p>After the textual transformations are finished, the input file is
+converted into a sequence of <em>preprocessing tokens</em>. These mostly
+correspond to the syntactic tokens used by the C compiler, but there are
+a few differences. White space separates tokens; it is not itself a
+token of any kind. Tokens do not have to be separated by white space,
+but white space is often necessary to avoid ambiguities.
+</p>
+<p>When faced with a sequence of characters that has more than one possible
+tokenization, the preprocessor is greedy. It always makes each token,
+starting from the left, as big as possible before moving on to the next
+token. For instance, <code>a+++++b</code> is interpreted as
+<code>a&nbsp;++&nbsp;++&nbsp;+&nbsp;b<!-- /@w --></code>, not as <code>a&nbsp;++&nbsp;+&nbsp;++&nbsp;b<!-- /@w --></code>, even though the
+latter tokenization could be part of a valid C program and the former
+could not.
+</p>
+<p>Once the input file is broken into tokens, the token boundaries never
+change, except when the &lsquo;<samp>##</samp>&rsquo; preprocessing operator is used to paste
+tokens together. See <a href="Concatenation.html#Concatenation">Concatenation</a>. For example,
+</p>
+<div class="smallexample">
+<pre class="smallexample">#define foo() bar
+foo()baz
+ &rarr; bar baz
+<em>not</em>
+ &rarr; barbaz
+</pre></div>
+
+<p>The compiler does not re-tokenize the preprocessor&rsquo;s output. Each
+preprocessing token becomes one compiler token.
+</p>
+<a name="index-identifiers"></a>
+<p>Preprocessing tokens fall into five broad classes: identifiers,
+preprocessing numbers, string literals, punctuators, and other. An
+<em>identifier</em> is the same as an identifier in C: any sequence of
+letters, digits, or underscores, which begins with a letter or
+underscore. Keywords of C have no significance to the preprocessor;
+they are ordinary identifiers. You can define a macro whose name is a
+keyword, for instance. The only identifier which can be considered a
+preprocessing keyword is <code>defined</code>. See <a href="Defined.html#Defined">Defined</a>.
+</p>
+<p>The same is mostly true of other languages which use the C preprocessor.
+However, a few of the keywords of C++ are significant even in the
+preprocessor. See <a href="C_002b_002b-Named-Operators.html#C_002b_002b-Named-Operators">C++ Named Operators</a>.
+</p>
+<p>In the 1999 C standard, identifiers may contain letters which are not
+part of the &ldquo;basic source character set&rdquo;, at the implementation&rsquo;s
+discretion (such as accented Latin letters, Greek letters, or Chinese
+ideograms). This may be done with an extended character set, or the
+&lsquo;<samp>\u</samp>&rsquo; and &lsquo;<samp>\U</samp>&rsquo; escape sequences.
+</p>
+<p>As an extension, GCC treats &lsquo;<samp>$</samp>&rsquo; as a letter. This is for
+compatibility with some systems, such as VMS, where &lsquo;<samp>$</samp>&rsquo; is commonly
+used in system-defined function and object names. &lsquo;<samp>$</samp>&rsquo; is not a
+letter in strictly conforming mode, or if you specify the <samp>-$</samp>
+option. See <a href="Invocation.html#Invocation">Invocation</a>.
+</p>
+<a name="index-numbers"></a>
+<a name="index-preprocessing-numbers"></a>
+<p>A <em>preprocessing number</em> has a rather bizarre definition. The
+category includes all the normal integer and floating point constants
+one expects of C, but also a number of other things one might not
+initially recognize as a number. Formally, preprocessing numbers begin
+with an optional period, a required decimal digit, and then continue
+with any sequence of letters, digits, underscores, periods, and
+exponents. Exponents are the two-character sequences &lsquo;<samp>e+</samp>&rsquo;,
+&lsquo;<samp>e-</samp>&rsquo;, &lsquo;<samp>E+</samp>&rsquo;, &lsquo;<samp>E-</samp>&rsquo;, &lsquo;<samp>p+</samp>&rsquo;, &lsquo;<samp>p-</samp>&rsquo;, &lsquo;<samp>P+</samp>&rsquo;, and
+&lsquo;<samp>P-</samp>&rsquo;. (The exponents that begin with &lsquo;<samp>p</samp>&rsquo; or &lsquo;<samp>P</samp>&rsquo; are
+used for hexadecimal floating-point constants.)
+</p>
+<p>The purpose of this unusual definition is to isolate the preprocessor
+from the full complexity of numeric constants. It does not have to
+distinguish between lexically valid and invalid floating-point numbers,
+which is complicated. The definition also permits you to split an
+identifier at any position and get exactly two tokens, which can then be
+pasted back together with the &lsquo;<samp>##</samp>&rsquo; operator.
+</p>
+<p>It&rsquo;s possible for preprocessing numbers to cause programs to be
+misinterpreted. For example, <code>0xE+12</code> is a preprocessing number
+which does not translate to any valid numeric constant and is therefore a
+syntax error. It does not mean <code>0xE&nbsp;+&nbsp;12<!-- /@w --></code>, which is what you
+might have intended.
+</p>
+<a name="index-string-literals"></a>
+<a name="index-string-constants"></a>
+<a name="index-character-constants"></a>
+<a name="index-header-file-names"></a>
+<p><em>String literals</em> are string constants, character constants, and
+header file names (the argument of &lsquo;<samp>#include</samp>&rsquo;).<a name="DOCF2" href="#FOOT2"><sup>2</sup></a> String constants and character
+constants are straightforward: <tt>&quot;&hellip;&quot;</tt> or <tt>'&hellip;'</tt>. In
+either case embedded quotes should be escaped with a backslash:
+<tt>'\''</tt> is the character constant for &lsquo;<samp>'</samp>&rsquo;. There is no limit on
+the length of a character constant, but the value of a character
+constant that contains more than one character is
+implementation-defined. See <a href="Implementation-Details.html#Implementation-Details">Implementation Details</a>.
+</p>
+<p>Header file names either look like string constants, <tt>&quot;&hellip;&quot;</tt>, or are
+written with angle brackets instead, <tt>&lt;&hellip;&gt;</tt>. In either case,
+backslash is an ordinary character. There is no way to escape the
+closing quote or angle bracket. The preprocessor looks for the header
+file in different places depending on which form you use. See <a href="Include-Operation.html#Include-Operation">Include Operation</a>.
+</p>
+<p>No string literal may extend past the end of a line. You may use continued
+lines instead, or string constant concatenation.
+</p>
+<a name="index-punctuators"></a>
+<a name="index-digraphs"></a>
+<a name="index-alternative-tokens"></a>
+<p><em>Punctuators</em> are all the usual bits of punctuation which are
+meaningful to C and C++. All but three of the punctuation characters in
+ASCII are C punctuators. The exceptions are &lsquo;<samp>@</samp>&rsquo;, &lsquo;<samp>$</samp>&rsquo;, and
+&lsquo;<samp>`</samp>&rsquo;. In addition, all the two- and three-character operators are
+punctuators. There are also six <em>digraphs</em>, called
+<em>alternative tokens</em> in the C++ standard, which are merely alternate ways to spell
+other punctuators. This is a second attempt to work around missing
+punctuation in obsolete systems. It has no negative side effects,
+unlike trigraphs, but does not cover as much ground. The digraphs and
+their corresponding normal punctuators are:
+</p>
+<div class="smallexample">
+<pre class="smallexample">Digraph:    &lt;%  %&gt;  &lt;:  :&gt;  %:  %:%:
+Punctuator: {   }   [   ]   #   ##
+</pre></div>
+
+<a name="index-other-tokens"></a>
+<p>Any other single byte is considered &ldquo;other&rdquo; and passed on to the
+preprocessor&rsquo;s output unchanged. The C compiler will almost certainly
+reject source code containing &ldquo;other&rdquo; tokens. In ASCII, the only
+&ldquo;other&rdquo; characters are &lsquo;<samp>@</samp>&rsquo;, &lsquo;<samp>$</samp>&rsquo;, &lsquo;<samp>`</samp>&rsquo;, and control
+characters other than NUL (all bits zero). (Note that &lsquo;<samp>$</samp>&rsquo; is
+normally considered a letter.) All bytes with the high bit set
+(numeric range 0x80&ndash;0xFF) that were not successfully interpreted as
+part of an extended character in the input encoding are also &ldquo;other&rdquo;
+in the present implementation.
+</p>
+<p>NUL is a special case because of the high probability that its
+appearance is accidental, and because it may be invisible to the user
+(many terminals do not display NUL at all). Within comments, NULs are
+silently ignored, just as any other character would be. In running
+text, NUL is considered white space. For example, these two directives
+have the same meaning.
+</p>
+<div class="smallexample">
+<pre class="smallexample">#define X^@1
+#define X 1
+</pre></div>
+
+<p>(where &lsquo;<samp>^@</samp>&rsquo; is ASCII NUL). Within string or character constants,
+NULs are preserved. In the latter two cases the preprocessor emits a
+warning message.
+</p>
+<div class="footnote">
+<hr>
+<h4 class="footnotes-heading">Footnotes</h4>
+
+<h3><a name="FOOT2" href="#DOCF2">(2)</a></h3>
+<p>The C
+standard uses the term <em>string literal</em> to refer only to what we are
+calling <em>string constants</em>.</p>
+</div>
+<hr>
+<div class="header">
+<p>
+Next: <a href="The-preprocessing-language.html#The-preprocessing-language" accesskey="n" rel="next">The preprocessing language</a>, Previous: <a href="Initial-processing.html#Initial-processing" accesskey="p" rel="previous">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>