Skip to content

Conversation

@sobolevn
Copy link
Member

@sobolevnsobolevn commented Nov 3, 2023

Looks like PyFile_SetOpenCodeHook is already tested here:

staticinttest_open_code_hook(void)
{
intresult=0;
/* Provide a hook */
result=PyFile_SetOpenCodeHook(_open_code_hook, &result);
if (result){
printf("Failed to set hook\n");
return1;
}
/* A second hook should fail */
result=PyFile_SetOpenCodeHook(_open_code_hook, &result);
if (!result){
printf("Should have failed to set second hook\n");
return2;
}
Py_IgnoreEnvironmentFlag=0;
_testembed_Py_InitializeFromConfig();
result=0;
PyObject*r=PyFile_OpenCode("$$test-filename");
if (!r){
PyErr_Print();
result=3;
} else{
void*cmp=PyLong_AsVoidPtr(r);
Py_DECREF(r);
if (cmp!=&result){
printf("Did not get expected result from hook\n");
result=4;
}
}
if (!result){
PyObject*io=PyImport_ImportModule("_io");
PyObject*r=io
? PyObject_CallMethod(io, "open_code", "s", "$$test-filename")
: NULL;
if (!r){
PyErr_Print();
result=5;
} else{
void*cmp=PyLong_AsVoidPtr(r);
Py_DECREF(r);
if (cmp!=&result){
printf("Did not get expected result from hook\n");
result=6;
}
}
Py_XDECREF(io);
}
Py_Finalize();
returnresult;
}

@bedevere-appbedevere-appbot mentioned this pull request Nov 3, 2023
10 tasks
@sobolevn
Copy link
MemberAuthor

sobolevn commented Nov 3, 2023

Tests fail on Windows (I have a very limited experience with this platform):

 ====================================================================== ERROR: test_file_get_line (test.test_capi.test_file.TestPyFileCAPI.test_file_get_line) ---------------------------------------------------------------------- Traceback (most recent call last): File "D:\a\cpython\cpython\Lib\test\test_capi\test_file.py", line 40, in test_file_get_line f.writelines([first_line]) File "D:\a\cpython\cpython\Lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ UnicodeEncodeError: 'charmap' codec can't encode characters in position 10-15: character maps to <undefined> 

Is it correct?

Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This family has little functions, but they should be tested with many cases.

@serhiy-storchaka
Copy link
Member

Tests fail on Windows

Because the default encoding on Windows is not UTF-8. Always specify encoding for text files.

@sobolevn
Copy link
MemberAuthor

@serhiy-storchaka thanks a lot for your detailed review! You are one of the best reviewers I know :)

@sobolevn
Copy link
MemberAuthor

Address sanitizer build fails with:

 ====================================================================== FAIL: test_string_args_as_invalid_utf (test.test_capi.test_file.TestPyFile_FromFd.test_string_args_as_invalid_utf) (arg_pos=5) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/runner/work/cpython/cpython/Lib/test/test_capi/test_file.py", line 77, in test_string_args_as_invalid_utf self.assertRaises( AssertionError: (<class 'ValueError'>, <class 'LookupError'>) not raised by file_from_fd ---------------------------------------------------------------------- 

Maybe I should use a different string? Suggestions?

Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vstinner
Copy link
Member

Address sanitizer build fails with:
FAIL: test_string_args_as_invalid_utf (test.test_capi.test_file.TestPyFile_FromFd.test_string_args_as_invalid_utf) (arg_pos=5)
AssertionError: (<class 'ValueError'>, <class 'LookupError'>) not raised by file_from_fd

It's unrelated to Address sanitizer. It's just that this CI builds Python is release mode. And in release mode, the error handler is only used if the string cannot be decoded (decoding error). In debug mode, the error handler is always checked.

You can skip this test if support.Py_DEBUG is false.

@vstinner
Copy link
Member

To reproduce the Address Sanitizer issue, I used:

./configure --with-address-sanitizer CC=clang ASAN_OPTIONS='detect_leaks=0:allocator_may_return_null=1:handle_segv=0' make ASAN_OPTIONS='detect_leaks=0:allocator_may_return_null=1:handle_segv=0' ./python -m test test_capi.test_file -v 

Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I have not finished the review yet. It is difficult with so many tests. So I can find other issues later.

The main problem is that they incorrectly create non-decodable files. You should use binary files to write them.

It would be nice also to reduce the number of lines where it is possible.

deftest_name_invalid_utf(self):
withopen(os_helper.TESTFN, "w", encoding="utf-8") asf:
file_obj=_testcapi.file_from_fd(
f.fileno(), "abc\xe9", "w",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not invalid UTF-8. When you pass the Python string, it is encoded to UTF-8, therefore the C string is always valid UTF-8. You have to pass a bytes object, e.g. b'\xff'. See for example tests for PyDict_GetItemString() or PyObject_GetAttrString().

Comment on lines 125 to 127
first_line="\xc3\x28\n"
withopen(os_helper.TESTFN, "w", encoding="utf-8") asf:
f.writelines([first_line])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it does not create invalid UTF-8.

Comment on lines +147 to +148
withopen(os_helper.TESTFN, "w", encoding="utf-8") asf:
f.writelines([first_line, second_line])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many tests can use StringIO. E.g.

f=io.StringIO('first_line\nsecond_line\n')

Copy link
MemberAuthor

@sobolevnsobolevnSep 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have explicit tests for both file object and io.StringIO:

deftest_file_get_multiple_lines(self): first_line="text with юникод 统一码\n"second_line="second line\n"withopen(os_helper.TESTFN, "w", encoding="utf-8") asf: f.writelines([first_line, second_line]) withopen(os_helper.TESTFN, encoding="utf-8") asf: self.assertEqual(self.get_line(f, 0), first_line) self.assertEqual(self.get_line(f, 0), second_line) deftest_file_get_line_from_file_like(self): first_line="text with юникод 统一码\n"second_line="second line\n"contents=io.StringIO(f"{first_line}{second_line}") self.assertEqual(self.get_line(contents, 0), first_line) self.assertEqual(self.get_line(contents, 0), second_line)

@vstinner
Copy link
Member

@sobolevn: What's the status of this PR? Do you plan to attempt to address @serhiy-storchaka's latest review?

@sobolevn
Copy link
MemberAuthor

yes, sure! adding this to my queue.

@sobolevn
Copy link
MemberAuthor

@serhiy-storchaka@vstinner I partially addressed your review. The only part that I didn't implement is invalid utf8 tests. I want to ask for advice on how it should be done.

For example, right now I cannot pass bytes to _testcapi.file_from_fd in test_string_args_as_invalid_utf. Because it parses args as:

if (!PyArg_ParseTuple(args, "izzizzzi", &fd, &name, &mode, &buffering, &encoding, &errors, &newline, &closefd)){returnNULL}

What is the best way to pass bytes here? Create one more function like:

staticPyObject*file_from_fd_with_bytes(PyObject*Py_UNUSED(self), PyObject*args)

and allow passing bytes there?

@serhiy-storchaka
Copy link
Member

What are your issues with passing a bytes object?

raiseValueError("str raised")

withself.assertRaisesRegex(ValueError, "str raised"):
self.write_and_return(StrRaises(), flags=_testcapi.Py_PRINT_RAW)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear what is the difference between these tests if it raises in any case. You should either define __str__ and __repr__ that do not raise in corresponding classes and test both classes with and without Py_PRINT_RAW, or just make both __str__ and __repr__ in the same class raising different exceptions and test that writing with and without Py_PRINT_RAW gives different errors. The former option will duplicate other tests, so I suggest the later way.

Oh, and you do not need to use write_and_return here.

self.assertRaises(AttributeError, self.write, NULL, object(), 0)
self.assertRaises(TypeError, self.write, NULL, NULL, 0)
wr=self.write
self.assertRaises(TypeError, wr, object(), io.BytesIO(), 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a string instead of object(). It will be clearer what you write and why this fails.

@vstinner
Copy link
Member

vstinner commented Jan 30, 2025

Oh no, I did it again :-( I forgot about this PR and I wrote a new one (that I just merged): #129449. Sorry about that. It seems like this PR has more tests.

@sobolevn
Copy link
MemberAuthor

@vstinner thanks a lot for your PR, I forgot about that one several times already :)

You can port some of the tests from here to your version if it helps.
Thanks for the reviews, @serhiy-storchaka! 👍

@vstinner
Copy link
Member

I will try to add tests from this PR.

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

@sobolevn@serhiy-storchaka@vstinner@skirpichev