
encukou (Member) commented Apr 5, 2024

To test the errors argument, we read a UTF-16 file as UTF-8 with "backslashreplace" error handling. However, the utf-16 codec adds an endian-specific byte-order mark, so on big-endian machines the expectation doesn't match the test file (which was saved on a little-endian machine).

Use endswith to ignore the BOM.
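
A minimal sketch of the failure mode and the suffix-based comparison, not the actual test code. The `file_bytes` value stands in for the test file saved on a little-endian machine, and `read_as_utf8` is a hypothetical helper used only for illustration:

```python
import codecs
import sys

TEXT = "Hello, UTF-16 world!\n"

# Stand-in for a file saved on a little-endian machine:
# little-endian BOM followed by a little-endian payload.
file_bytes = codecs.BOM_UTF16_LE + TEXT.encode("utf-16-le")


def read_as_utf8(data: bytes) -> str:
    """Decode as UTF-8, escaping undecodable bytes instead of raising."""
    return data.decode("utf-8", errors="backslashreplace")


actual = read_as_utf8(file_bytes)

# Fragile expectation: "utf-16" uses the host byte order for both the BOM
# and the payload, so this only matches on little-endian hosts.
fragile_expected = read_as_utf8(TEXT.encode("utf-16"))
fragile_matches = fragile_expected == actual

# Endian-independent expectation: fixed byte order, no BOM, and a suffix
# comparison so the escaped BOM at the start of `actual` is ignored.
robust_expected = read_as_utf8(TEXT.encode("utf-16-le"))
suffix_matches = actual.endswith(robust_expected)

# The two BOM bytes are not valid UTF-8, so they show up as escapes.
bom_is_escaped = actual.startswith("\\xff\\xfe")

print(sys.byteorder, fragile_matches, suffix_matches, bom_is_escaped)
```

In this sketch the suffix comparison holds regardless of host byte order, while the full-string comparison against the `utf-16` encoding holds only on little-endian hosts.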

bedevere-app bot added the tests (Tests in the Lib/test dir) and awaiting core review labels Apr 5, 2024

encukou changed the title from "gh-116609: Ignore UTF-16 BOM in importlib.resources._functional tests" to "gh-116608: Ignore UTF-16 BOM in importlib.resources._functional tests" Apr 5, 2024

encukou (Member, Author) commented

!buildbot s390x

bedevere-bot commented

🤖 New build scheduled with the buildbot fleet by @encukou for commit 26ae210 🤖

The command will test the builders whose names match the following regular expression: s390x

The builders matched are:

  • s390x Fedora Rawhide Clang Installed PR
  • s390x Fedora Rawhide Clang PR
  • s390x Fedora LTO PR
  • s390x Fedora Refleaks PR
  • s390x RHEL7 LTO + PGO PR
  • s390x Fedora LTO + PGO PR
  • s390x Fedora Clang PR
  • s390x Fedora PR
  • s390x Fedora Rawhide LTO PR
  • s390x Fedora Rawhide PR
  • s390x RHEL8 LTO PR
  • s390x Fedora Rawhide Refleaks PR
  • s390x Fedora Clang Installed PR
  • s390x RHEL8 PR
  • s390x Fedora Rawhide LTO + PGO PR
  • s390x RHEL8 Refleaks PR
  • s390x RHEL8 LTO + PGO PR
  • s390x RHEL7 PR
  • s390x RHEL7 LTO PR
  • s390x RHEL7 Refleaks PR
  • s390x SLES PR
  • s390x Debian PR

encukou merged commit 4d4a6f1 into python:main Apr 5, 2024

encukou deleted the importlib-tests-be branch April 5, 2024 15:00

zooba (Member) commented Apr 8, 2024

@encukou Out of interest, was the endswith necessary? I thought using utf-16-le would strip the BOM automatically, and the issue you were hitting is that utf-16-be (implied by utf-16 on BE machines) was rejecting it. Explicitly specifying -le should have worked, I'd thought.
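
For reference, a short sketch of how Python's standard UTF-16 codecs handle the BOM, using only `str.encode`, `bytes.decode`, and constants from `codecs`. The sample string is arbitrary. Note that the test discussed here decodes the file as UTF-8, so presumably none of these decoders ever sees the BOM; it surfaces as escaped bytes instead.

```python
import codecs

TEXT = "Hi"

# Encoding: the generic codec writes a BOM in the host byte order;
# the explicit-endian variants write no BOM at all.
enc_generic = TEXT.encode("utf-16")
enc_le = TEXT.encode("utf-16-le")
enc_be = TEXT.encode("utf-16-be")

# A little-endian byte stream that starts with a BOM.
le_with_bom = codecs.BOM_UTF16_LE + enc_le

# Decoding: the generic codec consumes the BOM and uses it to pick the
# byte order; the explicit-endian codec keeps it as the character U+FEFF.
dec_generic = le_with_bom.decode("utf-16")
dec_le = le_with_bom.decode("utf-16-le")

print(enc_generic, enc_le, enc_be)
print(repr(dec_generic), repr(dec_le))
```

So `utf-16-le` avoids adding a BOM when encoding, but it does not strip one when decoding.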

diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024: pythongh-116609: Ignore UTF-16 BOM in importlib.resources._functional tests (pythonGH-117569)

jaraco (Member) commented

This change needs to be applied to importlib_resources. It looks like a related issue was reported in python/importlib_resources#312.


Labels: skip news, tests (Tests in the Lib/test dir)


4 participants: @encukou, @bedevere-bot, @zooba, @jaraco