Re: Did a project in C++ and it was a fate worse than death

Subject: Re: Did a project in C++ and it was a fate worse than death
kirbyzhou|2025-07-01 10:02:28|
What's wrong with just using u16string?
Java's String is UTF-16 too.
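
One thing to keep in mind with a UTF-16 string type: its length and indexing count 16-bit code units, not characters. Here is a rough sketch of the difference, written in Python only because it is quick to try; `utf16_code_units` is a made-up helper name, not part of any library.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units a UTF-16 string (u16string, Java String) would hold."""
    # "utf-16-le" is used so that no byte-order mark is prepended to the output.
    return len(text.encode("utf-16-le")) // 2

bmp_text = "hello \u4e2d\u6587"      # every character is inside the BMP
astral_text = "hello \U0001F600"     # ends with one character outside the BMP

# Code points (what Python's len() counts) versus UTF-16 code units.
bmp_counts = (len(bmp_text), utf16_code_units(bmp_text))
astral_counts = (len(astral_text), utf16_code_units(astral_text))

print(bmp_counts)
print(astral_counts)
```

For BMP-only text the two counts agree, so u16string behaves like a fixed-width string. Once a character outside the BMP shows up it takes a surrogate pair, and the code-unit count is one higher than the character count.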

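[In seablue's post:]
: The encoding used by the Python runtime is presumably determined by the locale/code page. The encoding of a source file is determined by the declaration at the top of the file and defaults to UTF-8. Python will enable UTF-8 mode (for the runtime environment) by default from Python 3.15. This matters most on Windows, because on macOS and Linux the locale usually defaults to UTF-8 already.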
: peps.python.org/pep-3120/
: peps.python.org/pep-0686/
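: Internally, the Python interpreter presumably converts the various encodings into one of several fixed-width Unicode representations (Latin-1, UCS-2, or UCS-4) and handles each case separately: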
: Python 3.3 and later (PEP 393):
: peps.python.org/pep-0393/
: The internal representation stores each string at one of three fixed widths, chosen by the largest code point it contains. (The struct types PyASCIIObject, PyCompactUnicodeObject, and PyUnicodeObject describe the object layout, not the width.)
: - 1 byte per character (PyUnicode_1BYTE_KIND):
:     Used when every code point is at most U+00FF (Latin-1). Strings that are pure ASCII (code points up to 127) use the smaller PyASCIIObject header.
: - 2 bytes per character (PyUnicode_2BYTE_KIND):
:     Used for strings whose largest code point lies in the Basic Multilingual Plane (BMP), up to U+FFFF. This is UCS-2, i.e. UTF-16 without surrogate pairs.
: - 4 bytes per character (PyUnicode_4BYTE_KIND):
:     Used for strings containing characters outside the BMP (code points up to 1,114,111). This is UCS-4, equivalent to UTF-32.
: This flexible representation means that Python dynamically adjusts the internal storage width to minimize memory consumption, unlike older versions (Python 2 and early Python 3) that used a fixed-width UCS-2 or UCS-4 representation determined at compile time.
: For example, a string containing only "hello" would be stored using 1-byte characters, while a string containing "hello" plus an emoji would use a 4-byte representation for every character because the emoji lies outside the BMP. This allows for efficient memory usage while maintaining the ability to represent the full range of Unicode characters.
: ==========
: Python 3.3 implements PEP 393. The new representation stores each string as ASCII, Latin-1, UCS-2 (UTF-16 without surrogates), or UCS-4 (UTF-32), generally picking the most compact one that fits; a UTF-8 copy may also be cached alongside.
: ==========
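: Python's approach to strings is that the programmer can work purely in UTF-8, while the interpreter stores each string internally in a fixed-width form (1, 2, or 4 bytes per code point). This hides the details from the programmer, and the fixed-width storage saves space, speeds up indexing, and makes computing string length easy. C++ would do well to borrow it.
: C++ moves in steps that are too small, and it still exposes memory details such as wide versus narrow strings to the programmer. That is not a modern way to handle things.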
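
The storage widths described in the quoted post can be checked roughly from Python itself. This is only a sketch and assumes CPython 3.3 or later (other implementations may store strings differently); `bytes_per_char` is a made-up helper that compares `sys.getsizeof` for two strings of different lengths so the fixed object header cancels out.

```python
import sys

def bytes_per_char(sample_char: str, count: int = 100) -> int:
    """Estimate CPython's per-character storage for strings made of sample_char."""
    short = sample_char * count
    longer = sample_char * (2 * count)
    # The header size cancels in the difference, leaving only character storage.
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // count

widths = {
    "ascii": bytes_per_char("a"),            # U+0061
    "latin1": bytes_per_char("\u00e9"),      # U+00E9, still fits in one byte
    "bmp": bytes_per_char("\u4e2d"),         # U+4E2D, inside the BMP
    "astral": bytes_per_char("\U0001F600"),  # U+1F600, outside the BMP
}
print(widths)

# Length and indexing work on code points regardless of the storage width.
text = "hello\U0001F600"
print(len(text), text[5] == "\U0001F600")
```

On CPython the four widths should come out as 1, 1, 2, and 4 bytes. Because the width is uniform within a string, `len()` and indexing never have to scan for surrogate pairs, which is the point the quoted post makes about fixed-width storage.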
--
Edited by: kirbyzhou FROM 114.247.175.*
FROM 114.247.175.*