summaryrefslogtreecommitdiff
path: root/share/doc/cppinternals/Token-Spacing.html
diff options
context:
space:
mode:
Diffstat (limited to 'share/doc/cppinternals/Token-Spacing.html')
-rw-r--r--share/doc/cppinternals/Token-Spacing.html203
1 files changed, 203 insertions, 0 deletions
diff --git a/share/doc/cppinternals/Token-Spacing.html b/share/doc/cppinternals/Token-Spacing.html
new file mode 100644
index 0000000..8da69cb
--- /dev/null
+++ b/share/doc/cppinternals/Token-Spacing.html
@@ -0,0 +1,203 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
+<head>
+<title>The GNU C Preprocessor Internals: Token Spacing</title>
+
+<meta name="description" content="The GNU C Preprocessor Internals: Token Spacing">
+<meta name="keywords" content="The GNU C Preprocessor Internals: Token Spacing">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="index.html#Top" rel="up" title="Top">
+<link href="Line-Numbering.html#Line-Numbering" rel="next" title="Line Numbering">
+<link href="Macro-Expansion.html#Macro-Expansion" rel="previous" title="Macro Expansion">
+<style type="text/css">
+<!--
+a.summary-letter {text-decoration: none}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.indentedblock {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style:oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nocodebreak {white-space:nowrap}
+span.nolinebreak {white-space:nowrap}
+span.roman {font-family:serif; font-weight:normal}
+span.sansserif {font-family:sans-serif; font-weight:normal}
+ul.no-bullet {list-style: none}
+-->
+</style>
+
+
+</head>
+
+<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
+<a name="Token-Spacing"></a>
+<div class="header">
+<p>
+Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="previous">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<a name="Token-Spacing-1"></a>
+<h2 class="unnumbered">Token Spacing</h2>
+<a name="index-paste-avoidance"></a>
+<a name="index-spacing"></a>
+<a name="index-token-spacing"></a>
+
+<p>First, consider an issue that only concerns the stand-alone
+preprocessor: there needs to be a guarantee that re-reading its preprocessed
+output results in an identical token stream. Without taking special
+measures, this might not be the case because of macro substitution.
+For example:
+</p>
+<div class="smallexample">
+<pre class="smallexample">#define PLUS +
+#define EMPTY
+#define f(x) =x=
++PLUS -EMPTY- PLUS+ f(=)
+ &rarr; + + - - + + = = =
+<em>not</em>
+ &rarr; ++ -- ++ ===
+</pre></div>
+
+<p>One solution would be to simply insert a space between all adjacent
+tokens. However, we would like to keep space insertion to a minimum,
+both for aesthetic reasons and because it causes problems for people who
+still try to abuse the preprocessor for things like Fortran source and
+Makefiles.
+</p>
+<p>For now, just notice that when tokens are added (or removed, as shown by
+the <code>EMPTY</code> example) from the original lexed token stream, we need
+to check for accidental token pasting. We call this <em>paste
+avoidance</em>. Token addition and removal can only occur because of macro
+expansion, but accidental pasting can occur in many places: both before
+and after each macro replacement, each argument replacement, and
+additionally each token created by the &lsquo;<samp>#</samp>&rsquo; and &lsquo;<samp>##</samp>&rsquo; operators.
+</p>
+<p>Look at how the preprocessor gets whitespace output correct
+normally. The <code>cpp_token</code> structure contains a flags byte, and one
+of those flags is <code>PREV_WHITE</code>. This is flagged by the lexer, and
+indicates that the token was preceded by whitespace of some form other
+than a new line. The stand-alone preprocessor can use this flag to
+decide whether to insert a space between tokens in the output.
+</p>
+<p>Now consider the result of the following macro expansion:
+</p>
+<div class="smallexample">
+<pre class="smallexample">#define add(x, y, z) x + y +z;
+sum = add (1,2, 3);
+ &rarr; sum = 1 + 2 +3;
+</pre></div>
+
+<p>The interesting thing here is that the tokens &lsquo;<samp>1</samp>&rsquo; and &lsquo;<samp>2</samp>&rsquo; are
+output with a preceding space, and &lsquo;<samp>3</samp>&rsquo; is output without a
+preceding space, but when lexed none of these tokens had that property.
+Careful consideration reveals that &lsquo;<samp>1</samp>&rsquo; gets its preceding
+whitespace from the space preceding &lsquo;<samp>add</samp>&rsquo; in the macro invocation,
+<em>not</em> replacement list. &lsquo;<samp>2</samp>&rsquo; gets its whitespace from the
+space preceding the parameter &lsquo;<samp>y</samp>&rsquo; in the macro replacement list,
+and &lsquo;<samp>3</samp>&rsquo; has no preceding space because parameter &lsquo;<samp>z</samp>&rsquo; has none
+in the replacement list.
+</p>
+<p>Once lexed, tokens are effectively fixed and cannot be altered, since
+pointers to them might be held in many places, in particular by
+in-progress macro expansions. So instead of modifying the two tokens
+above, the preprocessor inserts a special token, which I call a
+<em>padding token</em>, into the token stream to indicate that spacing of
+the subsequent token is special. The preprocessor inserts padding
+tokens in front of every macro expansion and expanded macro argument.
+These point to a <em>source token</em> from which the subsequent real token
+should inherit its spacing. In the above example, the source tokens are
+&lsquo;<samp>add</samp>&rsquo; in the macro invocation, and &lsquo;<samp>y</samp>&rsquo; and &lsquo;<samp>z</samp>&rsquo; in the
+macro replacement list, respectively.
+</p>
+<p>It is quite easy to get multiple padding tokens in a row, for example if
+a macro&rsquo;s first replacement token expands straight into another macro.
+</p>
+<div class="smallexample">
+<pre class="smallexample">#define foo bar
+#define bar baz
+[foo]
+ &rarr; [baz]
+</pre></div>
+
+<p>Here, two padding tokens are generated with sources the &lsquo;<samp>foo</samp>&rsquo; token
+between the brackets, and the &lsquo;<samp>bar</samp>&rsquo; token from foo&rsquo;s replacement
+list, respectively. Clearly the first padding token is the one to
+use, so the output code should contain a rule that the first
+padding token in a sequence is the one that matters.
+</p>
+<p>But what if a macro expansion is left? Adjusting the above
+example slightly:
+</p>
+<div class="smallexample">
+<pre class="smallexample">#define foo bar
+#define bar EMPTY baz
+#define EMPTY
+[foo] EMPTY;
+ &rarr; [ baz] ;
+</pre></div>
+
+<p>As shown, now there should be a space before &lsquo;<samp>baz</samp>&rsquo; and the
+semicolon in the output.
+</p>
+<p>The rules we decided above fail for &lsquo;<samp>baz</samp>&rsquo;: we generate three
+padding tokens, one per macro invocation, before the token &lsquo;<samp>baz</samp>&rsquo;.
+We would then have it take its spacing from the first of these, which
+carries source token &lsquo;<samp>foo</samp>&rsquo; with no leading space.
+</p>
+<p>It is vital that cpplib get spacing correct in these examples since any
+of these macro expansions could be stringized, where spacing matters.
+</p>
+<p>So, this demonstrates that not just entering macro and argument
+expansions, but leaving them requires special handling too. I made
+cpplib insert a padding token with a <code>NULL</code> source token when
+leaving macro expansions, as well as after each replaced argument in a
+macro&rsquo;s replacement list. It also inserts appropriate padding tokens on
+either side of tokens created by the &lsquo;<samp>#</samp>&rsquo; and &lsquo;<samp>##</samp>&rsquo; operators.
+I expanded the rule so that, if we see a padding token with a
+<code>NULL</code> source token, <em>and</em> that source token has no leading
+space, then we behave as if we have seen no padding tokens at all. A
+quick check shows this rule will then get the above example correct as
+well.
+</p>
+<p>Now a relationship with paste avoidance is apparent: we have to be
+careful about paste avoidance in exactly the same locations we have
+padding tokens in order to get white space correct. This makes
+implementation of paste avoidance easy: wherever the stand-alone
+preprocessor is fixing up spacing because of padding tokens, and it
+turns out that no space is needed, it has to take the extra step to
+check that a space is not needed after all to avoid an accidental paste.
+The function <code>cpp_avoid_paste</code> advises whether a space is required
+between two consecutive tokens. To avoid excessive spacing, it tries
+hard to only require a space if one is likely to be necessary, but for
+reasons of efficiency it is slightly conservative and might recommend a
+space where one is not strictly needed.
+</p>
+<hr>
+<div class="header">
+<p>
+Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="previous">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>