Diffstat (limited to 'share/doc/cpp/Character-sets.html')
-rw-r--r-- | share/doc/cpp/Character-sets.html | 140 |
1 files changed, 140 insertions, 0 deletions
diff --git a/share/doc/cpp/Character-sets.html b/share/doc/cpp/Character-sets.html
new file mode 100644
index 0000000..495469c
--- /dev/null
+++ b/share/doc/cpp/Character-sets.html
@@ -0,0 +1,140 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- Copyright (C) 1987-2023 Free Software Foundation, Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation. A copy of
+the license is included in the
+section entitled "GNU Free Documentation License".
+
+This manual contains no Invariant Sections. The Front-Cover Texts are
+(a) (see below), and the Back-Cover Texts are (b) (see below).
+
+(a) The FSF's Front-Cover Text is:
+
+A GNU Manual
+
+(b) The FSF's Back-Cover Text is:
+
+You have freedom to copy and modify this GNU Manual, like GNU
+ software. Copies published by the Free Software Foundation raise
+ funds for GNU development. -->
+<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
+<head>
+<title>The C Preprocessor: Character sets</title>
+
+<meta name="description" content="The C Preprocessor: Character sets">
+<meta name="keywords" content="The C Preprocessor: Character sets">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="Overview.html#Overview" rel="up" title="Overview">
+<link href="Initial-processing.html#Initial-processing" rel="next" title="Initial processing">
+<link href="Overview.html#Overview" rel="previous" title="Overview">
+<style type="text/css">
+<!--
+a.summary-letter {text-decoration: none}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.indentedblock {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style:oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nocodebreak {white-space:nowrap}
+span.nolinebreak {white-space:nowrap}
+span.roman {font-family:serif; font-weight:normal}
+span.sansserif {font-family:sans-serif; font-weight:normal}
+ul.no-bullet {list-style: none}
+-->
+</style>
+
+
+</head>
+
+<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
+<a name="Character-sets"></a>
+<div class="header">
+<p>
+Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<a name="Character-sets-1"></a>
+<h3 class="section">1.1 Character sets</h3>
+
+<p>Source code character set processing in C and related languages is
+rather complicated. The C standard discusses two character sets, but
+there are really at least four.
+</p>
+<p>The files input to CPP might be in any character set at all. CPP’s
+very first action, before it even looks for line boundaries, is to
+convert the file into the character set it uses for internal
+processing. That set is what the C standard calls the <em>source</em>
+character set. It must be isomorphic with ISO 10646, also known as
+Unicode. CPP uses the UTF-8 encoding of Unicode.
+</p>
+<p>The character sets of the input files are specified using the
+<samp>-finput-charset=</samp> option.
+</p>
+<p>All preprocessing work (the subject of the rest of this manual) is
+carried out in the source character set. If you request textual
+output from the preprocessor with the <samp>-E</samp> option, it will be
+in UTF-8.
+</p>
+<p>After preprocessing is complete, string and character constants are
+converted again, into the <em>execution</em> character set. This
+character set is under control of the user; the default is UTF-8,
+matching the source character set. Wide string and character
+constants have their own character set, which is not called out
+specifically in the standard. Again, it is under control of the user.
+The default is UTF-16 or UTF-32, whichever fits in the target’s
+<code>wchar_t</code> type, in the target machine’s byte
+order.<a name="DOCF1" href="#FOOT1"><sup>1</sup></a> Octal and hexadecimal escape sequences do not undergo
+conversion; <tt>'\x12'</tt> has the value 0x12 regardless of the currently
+selected execution character set. All other escapes are replaced by
+the character in the source character set that they represent, then
+converted to the execution character set, just like unescaped
+characters.
+</p>
+<p>In identifiers, characters outside the ASCII range can be specified
+with the ‘<samp>\u</samp>’ and ‘<samp>\U</samp>’ escapes or used directly in the input
+encoding. If strict ISO C90 conformance is specified with an option
+such as <samp>-std=c90</samp>, or <samp>-fno-extended-identifiers</samp> is
+used, then those constructs are not permitted in identifiers.
+</p>
+<div class="footnote">
+<hr>
+<h4 class="footnotes-heading">Footnotes</h4>
+
+<h3><a name="FOOT1" href="#DOCF1">(1)</a></h3>
+<p>UTF-16 does not meet the requirements of the C
+standard for a wide character set, but the choice of 16-bit
+<code>wchar_t</code> is enshrined in some system ABIs so we cannot fix
+this.</p>
+</div>
+<hr>
+<div class="header">
+<p>
+Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>