
Embedded null characters can lead to bugs or even security vulnerabilities #111656

@vstinner

I just modified PyUnicode_AsUTF8() in the C API to raise an exception if a string contains an embedded null character, to reduce the risk of security vulnerabilities. Callers of PyUnicode_AsUTF8() expect a string terminated by a null byte. If the UTF-8 encoded string contains an embedded null byte, the caller is likely to truncate the string at the first null byte without knowing that more bytes follow it.

See: https://owasp.org/www-community/attacks/Embedding_Null_Code

This is not only a security issue; it can also be seen as a plain bug: unwanted behavior.

Previous issues:

Discussions:


Example with Python 3.12:

    import ctypes

    libc = ctypes.cdll.LoadLibrary('libc.so.6')
    printf = libc.printf
    PyUnicode_AsUTF8 = ctypes.pythonapi.PyUnicode_AsUTF8
    PyUnicode_AsUTF8.argtypes = (ctypes.py_object,)
    PyUnicode_AsUTF8.restype = ctypes.c_char_p
    my_string = "World\0truncated string"
    printf(b"Hello %s\n", PyUnicode_AsUTF8(my_string))

Output:

Hello World 

The truncated string part is silently ignored!
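To make the truncation visible, here is a minimal sketch that reads the same UTF-8 buffer in two ways: once with the length reported by PyUnicode_AsUTF8AndSize(), and once as a null-terminated C string. The helper name `utf8_views` is hypothetical and only used for illustration; the sketch assumes CPython, where `ctypes.pythonapi` is available.

```python
import ctypes

# PyUnicode_AsUTF8AndSize() returns the UTF-8 buffer and also reports its length.
as_utf8_and_size = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
as_utf8_and_size.argtypes = (ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t))
as_utf8_and_size.restype = ctypes.c_void_p  # keep the raw pointer, no automatic conversion


def utf8_views(text):
    """Hypothetical helper: return (length_aware_bytes, null_terminated_bytes)."""
    size = ctypes.c_ssize_t()
    ptr = as_utf8_and_size(text, ctypes.byref(size))
    full = ctypes.string_at(ptr, size.value)  # read exactly `size` bytes
    c_view = ctypes.string_at(ptr)            # stop at the first null byte, like strlen()
    return full, c_view


full, c_view = utf8_views("World\0truncated string")
print(full)    # b'World\x00truncated string'
print(c_view)  # b'World'
```

A caller that tracks the length sees the whole string, while a caller that relies on the null terminator sees only the part before the first null byte.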


Multiple functions were modified in the past to prevent this problem. Examples:

  • _dbm.open(): check filename
  • _gdbm.open(): check filename
  • PyBytes_AsStringAndSize(str, NULL)
  • grp.getgrnam(): check name
  • pwd.getpwnam(): check name
  • _locale.strxfrm(): check argument
  • path_converter() of the os module: basically any filename and path
  • PyUnicode_AsWideCharString()
  • os.putenv()
  • _posixsubprocess.fork_exec(): executable_list
  • _struct.Struct: check format
  • _tkinter SetVar() and varname_converter()
  • _winapi.CreateProcess() getenvironment()
  • PyUnicode_EncodeLocale()
  • PyUnicode_EncodeFSDefault()
  • unicode_decode_locale()
  • PyUnicode_FSConverter()
  • PyUnicode_DecodeLocale()
  • PyUnicode_DecodeLocaleAndSize()
  • PyUnicode_FSDecoder()
  • PyUnicode_AsUTF8() -- recently modified
  • _Py_stat(): check path
  • getargs.c: 's', 'y' and 'z' formats
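The effect of these checks can be observed from Python code. The following is a small sketch, assuming a recent CPython 3 release, that passes a path containing an embedded null character to `os.stat()` and `open()`, both of which go through the filename checks listed above. The helper name `rejects_embedded_null` is hypothetical.

```python
import os


def rejects_embedded_null(func, *args):
    """Hypothetical helper: True if func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        # Raised before the path reaches the underlying C library call.
        return True
    except OSError:
        # The call reached the operating system, so no check rejected it.
        return False
    return False


print(rejects_embedded_null(os.stat, "some\0path"))
print(rejects_embedded_null(open, "some\0path"))
```

Both calls are expected to fail with ValueError rather than silently operating on the truncated path `some`.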

There are exceptions which accept embedded null bytes/characters:

  • socket: AF_UNIX socket name
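As an illustration of why a socket name may legitimately contain a null byte, the sketch below binds an AF_UNIX socket in the Linux abstract namespace, where the address starts with a null byte. That this is the case the exception refers to is an assumption, and the helper name `bind_abstract_socket` is hypothetical. The function returns None on platforms or sandboxes where abstract sockets are unavailable.

```python
import os
import socket
import sys


def bind_abstract_socket(name):
    """Hypothetical helper: bind to b"\\0" + name and return the bound address.

    Returns None if the platform does not support abstract AF_UNIX sockets.
    """
    if not sys.platform.startswith("linux"):
        return None
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # A leading null byte selects the abstract namespace on Linux.
        sock.bind(b"\0" + name)
        return sock.getsockname()
    except OSError:
        return None
    finally:
        sock.close()


# Include the process ID to keep the name unlikely to collide.
demo_name = b"embedded-null-demo-%d" % os.getpid()
address = bind_abstract_socket(demo_name)
print(address)
```

Here the null byte is part of the address itself, so rejecting it would break a valid use case.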
