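The encoding used by the Python runtime should be determined by the locale / code page. The encoding of a source file is determined by the declaration at the top of the file, and defaults to UTF-8. Python will enable UTF-8 mode (for the runtime environment) by default from Python 3.15. This matters most on Windows, because on macOS and Linux the locale usually defaults to UTF-8 already.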
peps.python.org/pep-3120/
peps.python.org/pep-0686/
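To see which encodings a given interpreter actually ends up with, something like the following can be run on each platform. It is only a sketch: the helper name `describe_runtime_encodings` is made up for this post, and the values it reports depend on the OS, the locale settings, and the Python version.

```python
import locale
import sys


def describe_runtime_encodings():
    """Collect the encodings the running interpreter reports (hypothetical helper)."""
    return {
        # True when UTF-8 mode is on (-X utf8 or PYTHONUTF8=1)
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Locale encoding, used by open() when no encoding= is passed
        "locale_encoding": locale.getpreferredencoding(False),
        # Encoding used for file names and other OS-level strings
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding of the console / pipe attached to stdout (may be None)
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Default for str.encode() / bytes.decode(); "utf-8" in Python 3
        "str_default_encoding": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_runtime_encodings().items():
        print(f"{name}: {value}")
```

Running the same script with `python -X utf8` (or with `PYTHONUTF8=1` set) should show how UTF-8 mode changes the locale-dependent entries, which is mostly visible on Windows.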
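Internally, the Python interpreter should be converting text from whatever encoding it arrives in into a fixed-width Unicode representation (Latin-1 with ASCII as a special case, UCS-2, or UCS-4), chosen per string: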
Python 3.3 and later (PEP 393):
peps.python.org/pep-0393/
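The internal representation stores every character of a given string at the same width, chosen from three kinds:
- 1 byte per character (PyUnicode_1BYTE_KIND):
Used when every code point is at most U+00FF (Latin-1). Pure-ASCII strings (code points up to 127) are a special case of this kind and use the smaller PyASCIIObject header.
- 2 bytes per character (PyUnicode_2BYTE_KIND):
Used when the string contains characters from the Basic Multilingual Plane (BMP) above U+00FF, but nothing beyond U+FFFF. These are stored as UCS-2.
- 4 bytes per character (PyUnicode_4BYTE_KIND):
Used when the string contains at least one character outside the BMP (code points up to 1,114,111, i.e. U+10FFFF). These are stored as UCS-4 (the same layout as UTF-32).
PyASCIIObject, PyCompactUnicodeObject, and PyUnicodeObject are the nested C header structs defined by PEP 393; they do not correspond one-to-one to the three widths.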
This flexible representation means that Python dynamically adjusts the internal storage width to minimize memory consumption, unlike older versions (Python 2 and early Python 3) that used a fixed-width UCS-2 or UCS-4 representation determined at compile time.
For example, the string "hello" is stored using 1 byte per character, while "hello" followed by an emoji (for example U+1F600) uses 4 bytes for every character, including the ASCII ones, because the emoji lies outside the BMP. This allows for efficient memory usage while maintaining the ability to represent the full range of Unicode characters.
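The per-character width can be observed from Python itself. The sketch below assumes CPython 3.3 or later (other implementations may store strings differently), and `bytes_per_char` is a name invented here for illustration. It compares the sizes of two strings built from the same character, so the fixed header cancels out and only the per-character cost remains.

```python
import sys


def bytes_per_char(ch):
    """Estimate the storage width CPython uses for a string made only of `ch`.

    Hypothetical helper: the object header is the same for both strings,
    so the size difference is 1000 characters' worth of storage.
    """
    big = ch * 2000
    small = ch * 1000
    return (sys.getsizeof(big) - sys.getsizeof(small)) // 1000


samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\u00e9",
    "BMP U+4E2D": "\u4e2d",
    "non-BMP U+1F600": "\U0001F600",
}

for label, ch in samples.items():
    print(f"{label}: {bytes_per_char(ch)} byte(s) per character")
```

The width is decided by the widest character in the string, so appending a single emoji to a long ASCII string makes the whole string use 4 bytes per character.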
==========
Python 3.3 implements PEP 393. The new representation picks one of Latin-1 (with ASCII as a special case), UCS-2, or UCS-4, whichever is the narrowest that fits the whole string, and can additionally cache a UTF-8 copy, generally trying to get a compact representation.
==========
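Python's way of handling strings (the programmer can work purely in UTF-8, while internally each string is stored in a fixed-width form of 1, 2, or 4 bytes per character: Latin-1/ASCII, UCS-2, or UCS-4) hides the details from the programmer, and the fixed-width storage saves space, speeds up indexing, and makes computing the string length easy. C++ would do well to learn from it.
C++ is moving in steps that are too small, and it still exposes memory details such as wide versus narrow strings to the programmer. That is not a modern way of handling this.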
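A small illustration of what "hiding the details" means in practice: `len()` and indexing work on code points, regardless of how many bytes or UTF-16 code units the same text needs when encoded. The sample string is arbitrary and chosen only to mix the three widths.

```python
# "h", U+00E9, "l", "l", "o", U+4E2D, U+1F600: seven code points in total
s = "h\u00e9llo\u4e2d\U0001F600"

code_points = len(s)                           # counts code points
utf8_bytes = len(s.encode("utf-8"))            # variable width: 1 to 4 bytes each
utf16_units = len(s.encode("utf-16-le")) // 2  # the emoji needs a surrogate pair
last = s[-1]                                   # indexing returns a whole character

print(code_points, utf8_bytes, utf16_units, last == "\U0001F600")
# → 7 13 8 True
```

With a UTF-8 `std::string` or a UTF-16 `std::wstring` in C++, the length and index operations would give the byte or code-unit figures instead, which is the detail the post is complaining about.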
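【 easior wrote: 】
: Python's strings are presumably built on top of C's multibyte streams under the hood?
: C's approach depends on the locale, and it easily produces garbled text
: Python 3 assumes UTF-8 for every encoding by default; how does the virtual machine cope with the runtime environment?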
: ...................
--
Edited by: seablue FROM 111.200.40.*