1 files changed, 140 insertions, 0 deletions
diff --git a/share/doc/cpp/Character-sets.html b/share/doc/cpp/Character-sets.html
new file mode 100644
index 0000000..495469c
--- /dev/null
+++ b/share/doc/cpp/Character-sets.html
@@ -0,0 +1,140 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- Copyright (C) 1987-2023 Free Software Foundation, Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation.  A copy of
+the license is included in the
+section entitled "GNU Free Documentation License".
+
+This manual contains no Invariant Sections.  The Front-Cover Texts are
+(a) (see below), and the Back-Cover Texts are (b) (see below).
+
+(a) The FSF's Front-Cover Text is:
+
+A GNU Manual
+
+(b) The FSF's Back-Cover Text is:
+
+You have freedom to copy and modify this GNU Manual, like GNU
+     software.  Copies published by the Free Software Foundation raise
+     funds for GNU development. -->
+<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
+<head>
+<title>The C Preprocessor: Character sets</title>
+
+<meta name="description" content="The C Preprocessor: Character sets">
+<meta name="keywords" content="The C Preprocessor: Character sets">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="Overview.html#Overview" rel="up" title="Overview">
+<link href="Initial-processing.html#Initial-processing" rel="next" title="Initial processing">
+<link href="Overview.html#Overview" rel="previous" title="Overview">
+<style type="text/css">
+<!--
+a.summary-letter {text-decoration: none}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.indentedblock {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style:oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nocodebreak {white-space:nowrap}
+span.nolinebreak {white-space:nowrap}
+span.roman {font-family:serif; font-weight:normal}
+span.sansserif {font-family:sans-serif; font-weight:normal}
+ul.no-bullet {list-style: none}
+-->
+</style>
+
+
+</head>
+
+<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
+<a name="Character-sets"></a>
+<div class="header">
+<p>
+Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<a name="Character-sets-1"></a>
+<h3 class="section">1.1 Character sets</h3>
+
+<p>Source code character set processing in C and related languages is
+rather complicated.  The C standard discusses two character sets, but
+there are really at least four.
+</p>
+<p>The files input to CPP might be in any character set at all.  CPP&rsquo;s
+very first action, before it even looks for line boundaries, is to
+convert the file into the character set it uses for internal
+processing.  That set is what the C standard calls the <em>source</em>
+character set.  It must be isomorphic with ISO 10646, also known as
+Unicode.  CPP uses the UTF-8 encoding of Unicode.
+</p>
+<p>The character sets of the input files are specified using the
+<samp>-finput-charset=</samp> option.
+</p>
+<p>All preprocessing work (the subject of the rest of this manual) is
+carried out in the source character set.  If you request textual
+output from the preprocessor with the <samp>-E</samp> option, it will be
+in UTF-8.
+</p>
+<p>After preprocessing is complete, string and character constants are
+converted again, into the <em>execution</em> character set.  This
+character set is selected with <samp>-fexec-charset=</samp>; the default is
+UTF-8, matching the source character set.  Wide string and character
+constants have their own character set, which the standard does not call
+out specifically; it is selected with <samp>-fwide-exec-charset=</samp>.
+The default is UTF-16 or UTF-32, whichever fits in the target&rsquo;s
+<code>wchar_t</code> type, in the target machine&rsquo;s byte
+order.<a name="DOCF1" href="#FOOT1"><sup>1</sup></a>  Octal and hexadecimal escape sequences do not undergo
+conversion; <tt>'\x12'</tt> has the value 0x12 regardless of the currently
+selected execution character set.  All other escapes are replaced by
+the character in the source character set that they represent, then
+converted to the execution character set, just like unescaped
+characters.
+</p>
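+<p>As a minimal sketch of the rule for numeric escapes, the following C
+fragment checks that octal and hexadecimal escapes keep the value as
+written.  The function names are hypothetical and chosen only for this
+illustration; they are not part of CPP or GCC.
+</p>

```c
#include <assert.h>
#include <stddef.h>

/* Numeric escapes bypass character set conversion: both spell 0x12. */
static int hex_escape_value(void)   { return '\x12'; }
static int octal_escape_value(void) { return '\022'; }

/* "\x41" stores the byte 0x41 as written; sizeof also counts the NUL. */
static size_t hex_string_size(void) { return sizeof "\x41"; }

/* Hypothetical self-check gathering the claims above. */
static void check_numeric_escapes(void)
{
    assert(hex_escape_value() == 0x12);
    assert(octal_escape_value() == 0x12);
    assert(hex_string_size() == 2);
    /* By contrast, 'A' or '\n' are converted to the execution character
       set, so their values depend on that set and are not checked here. */
}
```

+<p>Calling <code>check_numeric_escapes</code> should succeed whatever
+execution character set is selected, since none of the checked values
+goes through conversion.
+</p>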
+<p>In identifiers, characters outside the ASCII range can be specified
+with the &lsquo;<samp>\u</samp>&rsquo; and &lsquo;<samp>\U</samp>&rsquo; escapes or used directly in the input
+encoding.  If strict ISO C90 conformance is specified with an option
+such as <samp>-std=c90</samp>, or <samp>-fno-extended-identifiers</samp> is
+used, then those constructs are not permitted in identifiers.
+</p>
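+<p>A small sketch of an extended identifier follows.  It assumes a
+compiler mode that accepts universal character names in identifiers (for
+example, a recent GCC in its default mode); the identifier and function
+names are hypothetical.
+</p>

```c
#include <assert.h>

/* The \u escape below is the universal character name for U+00E9,
   so the identifier has four characters: c, a, f, and U+00E9. */
static int caf\u00e9 = 3;

/* Reads the variable back through the same spelling of its name. */
static int read_counter(void)
{
    return caf\u00e9;
}

/* Hypothetical self-check for the sketch above. */
static void check_extended_identifier(void)
{
    assert(read_counter() == 3);
}
```

+<p>Under <samp>-std=c90</samp> or <samp>-fno-extended-identifiers</samp>
+the same fragment is expected to be rejected, as described above.
+</p>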
+<div class="footnote">
+<hr>
+<h4 class="footnotes-heading">Footnotes</h4>
+
+<h3><a name="FOOT1" href="#DOCF1">(1)</a></h3>
+<p>UTF-16 does not meet the requirements of the C standard for a wide
+character set, because characters outside the Basic Multilingual Plane need
+two 16-bit units, but the choice of 16-bit <code>wchar_t</code> is
+enshrined in some system ABIs so we cannot fix this.</p>
+</div>
+<hr>
+<div class="header">
+<p>
+Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>