Description
I just modified PyUnicode_AsUTF8() of the C API to raise an exception if a string contains an embedded null character, to reduce the risk of security vulnerabilities. Callers of PyUnicode_AsUTF8() expect a string terminated by a null byte. If the UTF-8 encoded string contains an embedded null byte, the caller is likely to truncate the string without knowing that there are more bytes after "the first" null byte.
See: https://owasp.org/www-community/attacks/Embedding_Null_Code
It's not only a security issue: it can also simply be seen as a bug, an unwanted behavior.
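The truncation problem can be modelled in pure Python. The sketch below is only an illustration: `c_caller_view()` and `as_utf8_strict()` are hypothetical helpers, not CPython functions. The first mimics a C caller that stops at the first null byte; the second models the stricter behavior of refusing embedded nulls by raising `ValueError`. In UTF-8, only U+0000 encodes to a `0x00` byte, so checking the `str` for `"\0"` is equivalent to checking the encoded bytes.

```python
def c_caller_view(s: str) -> bytes:
    """Hypothetical helper: what a C caller relying on strlen() would see."""
    encoded = s.encode("utf-8")
    # Only U+0000 encodes to a 0x00 byte in UTF-8, so the first NUL byte
    # marks exactly the first embedded null character.
    return encoded.split(b"\0", 1)[0]


def as_utf8_strict(s: str) -> bytes:
    """Hypothetical helper modelling the stricter behavior: refuse embedded NULs."""
    if "\0" in s:
        raise ValueError("embedded null character")
    return s.encode("utf-8")


text = "World\0truncated string"
print(c_caller_view(text))  # b'World'
try:
    as_utf8_strict(text)
except ValueError as exc:
    print("rejected:", exc)  # rejected: embedded null character
```

The lenient view silently drops `truncated string`, while the strict variant turns the same input into a visible error.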
Previous issues:
- [C API] Change PyUnicode_AsUTF8() to return NULL on embedded null characters #111089
- _winapi.LCMapStringEx fails when encountering a string containing null characters #106844
- os.path.normpath truncates input on null bytes in 3.11, but not 3.10 #106242 -- CVE-2023-41105
- Uncaught exception in http.server request handling (<=3.10) #103223
- embedded null byte when connecting to sqlite database using a bytes object #84335
- os.path.exists should not throw "Embedded NUL character" exception #73228
- "embedded NUL character" exceptions #66411
- sqlite3 doesn't complain if the request contains a null character #65346
- Reject embedded null characters in wchar* strings #57826
Discussions:
- 2014: https://mail.python.org/archives/list/python-dev@python.org/thread/MZDL7FZZMRSW5MTIHLSA6ANNMCV7EEZN/
Example with Python 3.12:
```python
import ctypes

libc = ctypes.cdll.LoadLibrary('libc.so.6')
printf = libc.printf

PyUnicode_AsUTF8 = ctypes.pythonapi.PyUnicode_AsUTF8
PyUnicode_AsUTF8.argtypes = (ctypes.py_object,)
PyUnicode_AsUTF8.restype = ctypes.c_char_p

my_string = "World\0truncated string"
printf(b"Hello %s\n", PyUnicode_AsUTF8(my_string))
```
Output:
```
Hello World
```
The "truncated string" part is silently ignored!
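A caller that really needs the bytes after a null has to ask for the length explicitly instead of relying on a terminating null byte. A minimal sketch, assuming CPython (so that `ctypes.pythonapi` exposes the C API), using `PyUnicode_AsUTF8AndSize()`, which returns the size alongside the buffer. `utf8_with_size()` is a hypothetical helper for illustration only.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi exposes the C API and checks the
# Python error indicator after each call (raising the exception if one is set).
PyUnicode_AsUTF8AndSize = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
PyUnicode_AsUTF8AndSize.argtypes = (ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t))
# c_void_p rather than c_char_p: c_char_p would convert the result with
# strlen() and reintroduce the truncation.
PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p


def utf8_with_size(s: str) -> bytes:
    """Hypothetical helper: copy the full UTF-8 buffer, embedded nulls included."""
    size = ctypes.c_ssize_t()
    ptr = PyUnicode_AsUTF8AndSize(s, ctypes.byref(size))
    # The buffer is owned by `s` and stays valid while `s` is alive.
    return ctypes.string_at(ptr, size.value)


print(utf8_with_size("World\0truncated string"))  # b'World\x00truncated string'
```

Because the length comes from the size output parameter rather than from searching for a null byte, nothing after `World` is lost.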
Multiple functions were modified in the past to prevent this problem. Examples:
- _dbm.open(): check filename
- _gdbm.open(): check filename
- PyBytes_AsStringAndSize(str, NULL)
- grp.getgrnam(): check name
- pwd.getpwnam(): check name
- _locale.strxfrm(): check argument
- path_converter() of the os module: basically any filename and path
- PyUnicode_AsWideCharString()
- os.putenv()
- _posixsubprocess.fork_exec(): executable_list
- _struct.Struct: check format
- _tkinter SetVar() and varname_converter()
- _winapi.CreateProcess() getenvironment()
- PyUnicode_EncodeLocale()
- PyUnicode_EncodeFSDefault()
- unicode_decode_locale()
- PyUnicode_FSConverter()
- PyUnicode_DecodeLocale()
- PyUnicode_DecodeLocaleAndSize()
- PyUnicode_FSDecoder()
- PyUnicode_AsUTF8() -- recently modified
- _Py_stat(): check path
- getargs.c: 's', 'y' and 'z' formats
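Several of the functions above can be probed from Python. The sketch below only covers `os.stat()` (path_converter), `open()` and `os.putenv()`, and `rejects_embedded_null()` is a hypothetical helper. The exact message varies ("embedded null byte" vs "embedded null character"), so it only checks that a `ValueError` is raised.

```python
import os


def rejects_embedded_null(func, *args) -> bool:
    """Hypothetical helper: True if func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


checks = {
    "os.stat": (os.stat, "file\0name"),
    "open": (open, "file\0name"),
    "os.putenv": (os.putenv, "NAME\0EXTRA", "value"),
}
for label, (func, *args) in checks.items():
    # Each call is rejected before it reaches the OS, so no file is opened
    # and the environment is left unchanged.
    print(label, rejects_embedded_null(func, *args))
```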
There are exceptions: some functions deliberately accept embedded null bytes/characters:
- socket: AF_UNIX socket name
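The AF_UNIX exception exists because Linux "abstract namespace" socket addresses start with a null byte and are length-delimited, so they may contain further nulls. A minimal sketch, assuming Linux; `bind_abstract_unix_socket()` and the `nul-demo-…` name are hypothetical, chosen only for illustration.

```python
import os
import socket
import sys


def bind_abstract_unix_socket(name: bytes) -> bytes:
    """Hypothetical helper: bind in Linux's abstract namespace, return the address."""
    # A leading NUL selects the abstract namespace; the kernel uses the
    # explicit address length, so embedded NULs in `name` are kept as-is.
    address = b"\0" + name
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.bind(address)
        # Abstract addresses are reported back as bytes, leading NUL included.
        return sock.getsockname()


if sys.platform.startswith("linux"):
    demo_name = f"nul\0demo-{os.getpid()}".encode()
    print(bind_abstract_unix_socket(demo_name))
```

Rejecting embedded nulls here would make abstract socket names impossible to use, which is why this API is an exception to the rule.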