euc_kr char '0x3164' decode('ksx1001') cause UnicodeDecodeError #101863

@TakWolf

Bug report

The character U+3164 (code point 0x3164) can be encoded with encode('ksx1001'), but the resulting bytes cannot be decoded with decode('ksx1001').

    def main():
        code_point = 0x3164
        c = chr(code_point)
        raw = c.encode('ksx1001')
        c2 = raw.decode('ksx1001')  # <--- this causes the error
        print(f'{c}{c2}')


    if __name__ == '__main__':
        main()

    Traceback (most recent call last):
      File "/Users/takwolf/Develop/FontDev/fusion-pixel-font/build.py", line 11, in <module>
        main()
      File "/Users/takwolf/Develop/FontDev/fusion-pixel-font/build.py", line 6, in main
        c2 = raw.decode('ksx1001')
             ^^^^^^^^^^^^^^^^^^^^^
    UnicodeDecodeError: 'euc_kr' codec can't decode bytes in position 0-1: incomplete multibyte sequence

The character is HANGUL FILLER (U+3164), which belongs to the Hangul Compatibility Jamo block.

https://unicode-table.com/en/3164/
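
The identity of the character can also be checked locally. The following is a minimal sketch using the standard `unicodedata` module; the helper name `describe` is hypothetical and not part of the original report.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return a 'U+XXXX NAME (category)' summary for one code point."""
    ch = chr(code_point)
    # Fall back to a placeholder for code points that have no character name.
    name = unicodedata.name(ch, '<unnamed>')
    category = unicodedata.category(ch)
    return f'U+{code_point:04X} {name} ({category})'


if __name__ == '__main__':
    print(describe(0x3164))
```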


The following code computes the character's position (zone numbers) in KS X 1001:

    def main():
        code_point = 0x3164
        c = chr(code_point)
        raw = c.encode('ksx1001')
        block_offset = 0xA0
        zone_1 = raw[0] - block_offset
        zone_2 = raw[1] - block_offset
        print(f'{zone_1 = } {zone_2 = }')


    if __name__ == '__main__':
        main()

    zone_1 = 4 zone_2 = 52
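
The arithmetic above works because each of the two encoded bytes is a KS X 1001 index (1 to 94) plus 0xA0. As a rough sketch, the mapping can be written in both directions; the names `row_cell`, `to_bytes` and `BLOCK_OFFSET` are hypothetical helpers introduced here for illustration, not part of the codec API.

```python
BLOCK_OFFSET = 0xA0  # each encoded byte is a KS X 1001 index plus 0xA0


def row_cell(ch: str, codec: str = 'ksx1001') -> tuple[int, int]:
    """Return the (zone_1, zone_2) position of a two-byte character."""
    raw = ch.encode(codec)
    if len(raw) != 2:
        raise ValueError(f'{ch!r} is not a two-byte sequence in {codec}: {raw!r}')
    return raw[0] - BLOCK_OFFSET, raw[1] - BLOCK_OFFSET


def to_bytes(row: int, cell: int) -> bytes:
    """Build the two encoded bytes for a given (zone_1, zone_2) position."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError('row and cell must be in 1..94')
    return bytes([row + BLOCK_OFFSET, cell + BLOCK_OFFSET])


if __name__ == '__main__':
    print(row_cell('\u3164'))      # position of HANGUL FILLER
    print(to_bytes(4, 52).hex())   # bytes for zone_1 = 4, zone_2 = 52
```

Only encoding is used here, so this sketch does not trigger the decoding error described in this report.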

https://en.wikipedia.org/wiki/KS_X_1001#Hangul_Filler

All other characters in ksx1001 encode and decode correctly; only this one fails.
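
One way to check this claim is to round-trip every code point that the codec can encode. The following is a sketch only; `find_roundtrip_failures` is a hypothetical helper name, and limiting the scan to the Basic Multilingual Plane is an assumption made here.

```python
def find_roundtrip_failures(codec, code_points):
    """Return code points that encode with `codec` but do not decode back."""
    failures = []
    for cp in code_points:
        ch = chr(cp)
        try:
            raw = ch.encode(codec)
        except UnicodeEncodeError:
            continue  # not representable in this codec, so nothing to round-trip
        try:
            if raw.decode(codec) != ch:
                failures.append(cp)  # decoded to a different character
        except UnicodeDecodeError:
            failures.append(cp)  # encoded bytes could not be decoded at all
    return failures


if __name__ == '__main__':
    # Assumption: scanning U+0000..U+FFFF is enough for this codec.
    bad = find_roundtrip_failures('ksx1001', range(0x10000))
    print([f'U+{cp:04X}' for cp in bad])
```

On an affected interpreter such as the Python 3.11.1 build from this report, the list would be expected to contain U+3164; results on other versions may differ.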

Your environment

  • CPython versions tested on: Python 3.11.1
  • Operating system and architecture: macOS 13.0

Labels

  • interpreter-core (Objects, Python, Grammar, and Parser dirs)
  • topic-unicode
  • type-bug (An unexpected behavior, bug, or error)
