Is there any reason you can't just use u16string?
Java's String is UTF-16, after all.
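For reference, a minimal Python sketch of what counting in UTF-16 code units looks like. That is the unit Java's String and std::u16string both use for length and indexing. The helper name utf16_units is made up for this illustration.

```python
def utf16_units(s: str) -> int:
    """Return how many 16-bit code units s needs when encoded as UTF-16."""
    # utf-16-le avoids the 2-byte BOM that plain "utf-16" would prepend.
    return len(s.encode("utf-16-le")) // 2


bmp_text = "\u4e2d\u6587"      # two characters inside the BMP
astral_text = "\U0001F600"     # one character outside the BMP

# Code point count vs. UTF-16 code unit count
print(len(bmp_text), utf16_units(bmp_text))        # 2 2
print(len(astral_text), utf16_units(astral_text))  # 1 2
```

So a 16-bit string type is fixed-width only as long as the text stays inside the BMP. Beyond it, one character takes a surrogate pair.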
【 seablue wrote: 】
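: The encoding used by the Python runtime should be determined by the locale / code page. The encoding of a source file is determined by the declaration at the top of the file and defaults to UTF-8. Python will enable UTF-8 mode (for the runtime environment) by default from Python 3.15. This matters most on Windows, because on macOS and Linux the locale usually defaults to UTF-8 already.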
: peps.python.org/pep-3120/
: peps.python.org/pep-0686/
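: Internally, the Python interpreter should be converting the various encodings into some fixed-width Unicode representation (1, 2 or 4 bytes per character, roughly ASCII/Latin-1, UTF-16 and UTF-32) and handling each case separately: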
: Python 3.3 and later (PEP 393):
: peps.python.org/pep-0393/
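: The internal representation stores every character of a string at one of three fixed widths, chosen from the largest code point in the string:
: - 1 byte per character:
: Used for strings whose code points are all at most U+00FF (Latin-1). Strings containing only ASCII characters (code points up to 127) use the most compact struct, PyASCIIObject.
: - 2 bytes per character:
: Used for strings containing characters from the Basic Multilingual Plane (BMP), but not beyond. These are stored as UCS-2 (similar to UTF-16, but without surrogate pairs).
: - 4 bytes per character:
: Used for strings containing characters outside the BMP (requiring code points up to 1,114,111). These are stored as UCS-4 (similar to UTF-32).
: The struct names in PEP 393 (PyASCIIObject, PyCompactUnicodeObject, PyUnicodeObject) describe the object header layout (ASCII-only compact, general compact, and legacy non-compact strings), not these three widths.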
: This flexible representation means that Python dynamically adjusts the internal storage width to minimize memory consumption, unlike older versions (Python 2 and early Python 3) that used a fixed-width UCS-2 or UCS-4 representation determined at compile time.
: For example, a string containing only "hello" would be stored using 1-byte characters, while "hello" followed by an emoji would use a 4-byte representation for every character because of the emoji. This allows for efficient memory usage while maintaining the ability to represent the full range of Unicode characters.
: ==========
: Python 3.3 implements PEP 393. The new representation will pick one or several of ASCII, Latin-1, UTF-8, UTF-16, UTF-32, generally trying to get a compact representation.
: ==========
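: This way of handling strings in Python, where the programmer can work purely in UTF-8 while the interpreter internally stores each string in a fixed-width form (1, 2 or 4 bytes per character, roughly ASCII, UTF-16 and UTF-32), both hides the details from the programmer and uses fixed-width storage to save space, speed up indexing and make computing string length easy. It is something C++ would do well to borrow.
: C++ is moving too slowly here, and it still exposes memory-level details such as wide versus narrow strings to the programmer, which is not a modern way to handle this.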
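The per-string width selection described in the quoted post can be checked from Python itself. The following is a rough sketch, assuming CPython 3.3 or later (sys.getsizeof reports implementation-specific sizes, so other interpreters may behave differently). The helper name unit_width is hypothetical.

```python
import sys


def unit_width(s: str) -> int:
    """Estimate how many bytes CPython uses per character in strings like s.

    Appends one vs. two ASCII characters to s and compares the object sizes,
    so the fixed header cancels out and only one character slot remains.
    """
    return sys.getsizeof(s + "xx") - sys.getsizeof(s + "x")


samples = {
    "ascii": "hello",
    "latin-1": "h\u00e9llo",         # e with acute accent, U+00E9
    "bmp": "hello\u4e2d",            # one CJK character, U+4E2D
    "astral": "hello\U0001F600",     # one emoji outside the BMP
}

for name, text in samples.items():
    # On CPython 3.3+ the widths are 1, 1, 2 and 4 respectively.
    print(name, unit_width(text))
```

Note that the appended "x" is plain ASCII, yet it costs 4 bytes once the string contains a single character outside the BMP: the width is chosen per string, not per character.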
--
Edited by: kirbyzhou FROM 114.247.175.*
FROM 114.247.175.*