Skip to content

Conversation

@erlend-aasland
Copy link
Contributor

@erlend-aaslanderlend-aasland commented May 31, 2021

The sqlite3 module now fully implements the GC protocol; there's no
need for this workaround anymore.

https://bugs.python.org/issue42213

@erlend-aasland
Copy link
ContributorAuthor

Pablo/Victor: Whenever you have time 🙏🏻

@erlend-aasland
Copy link
ContributorAuthor

I'm not sure if this should be backported to 3.10. It's just a cleanup, so I guess it's fine to backport it, but on the other hand there is no harm in not backporting it. I have no strong preference here.

@erlend-aaslanderlend-aasland marked this pull request as draft May 31, 2021 09:54
@erlend-aaslanderlend-aasland changed the title bpo-42213: Remove redundant cyclic GC hack in sqlite3[WIP] bpo-42213: Remove redundant cyclic GC hack in sqlite3May 31, 2021
@erlend-aasland
Copy link
ContributorAuthor

erlend-aasland commented May 31, 2021

The Windows failures looks very similar to the GC issues in #24203.
The macOS failure is a spurious ssl failure.

https://github.com/python/cpython/pull/26462/checks?check_run_id=2709446243

@erlend-aasland
Copy link
ContributorAuthor

erlend-aasland commented May 31, 2021

Adding 209196eaaa44d02551dad00516ff32fb447c1d8f fixes the test issues on Windows.

@vstinner, what do you make of this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the falls to GC collect here? These should not be necessary

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know; it's the same issue as in #24203. I'll try my best to reproduce this on my Mac.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, discovered #26475 while debugging this, but it does unfortunately not fix this issue.

The sqlite3 module now fully implements the GC protocol, so there's no need for this workaround anymore.
@erlend-aasland
Copy link
ContributorAuthor

FYI, rebased onto main.

@vstinner
Copy link
Member

Did you run ./python -m test test_sqlite -R 3:3 to check for reference leaks?

@vstinner
Copy link
Member

Windows (x64) CI job failed:

 2021-06-01T11:02:07.4501179Z ====================================================================== 2021-06-01T11:02:07.4503272Z ERROR: test_open_uri (sqlite3.test.dbapi.ConnectionTests) 2021-06-01T11:02:07.4581189Z ---------------------------------------------------------------------- 2021-06-01T11:02:07.4598093Z Traceback (most recent call last): 2021-06-01T11:02:07.4600441Z File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 238, in unlink 2021-06-01T11:02:07.4633846Z _unlink(filename) 2021-06-01T11:02:07.4636229Z File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 278, in _unlink 2021-06-01T11:02:07.4637841Z _waitfor(os.unlink, filename) 2021-06-01T11:02:07.4639071Z File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 246, in _waitfor 2021-06-01T11:02:07.4640337Z func(pathname) 2021-06-01T11:02:07.4642275Z PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '@test_1508_tmp�' 2021-06-01T11:02:07.4644053Z 2021-06-01T11:02:07.4645203Z ====================================================================== 2021-06-01T11:02:07.4646483Z ERROR: test_open_with_path_like_object (sqlite3.test.dbapi.ConnectionTests) 2021-06-01T11:02:07.4648280Z Checks that we can successfully connect to a database using an object that 2021-06-01T11:02:07.4650124Z ---------------------------------------------------------------------- 2021-06-01T11:02:07.4652596Z Traceback (most recent call last): 2021-06-01T11:02:07.4655019Z File "D:\a\cpython\cpython\lib\sqlite3\test\dbapi.py", line 182, in test_open_with_path_like_object 2021-06-01T11:02:07.4670040Z cx.execute('create table test(id integer)') 2021-06-01T11:02:07.4672512Z sqlite3.OperationalError: table test already exists 2021-06-01T11:02:07.4688676Z 2021-06-01T11:02:07.4708289Z ====================================================================== 2021-06-01T11:02:07.4724689Z ERROR: test_open_with_path_like_object (sqlite3.test.dbapi.ConnectionTests) 2021-06-01T11:02:07.4728000Z Checks that we can successfully connect to a database using an object that 2021-06-01T11:02:07.4811284Z ---------------------------------------------------------------------- 2021-06-01T11:02:07.4831832Z Traceback (most recent call last): 2021-06-01T11:02:07.4834116Z File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 238, in unlink 2021-06-01T11:02:07.4864854Z _unlink(filename) 2021-06-01T11:02:07.4867085Z File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 278, in _unlink 2021-06-01T11:02:07.4868728Z _waitfor(os.unlink, filename) 2021-06-01T11:02:07.4870312Z File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 246, in _waitfor 2021-06-01T11:02:07.4871577Z func(pathname) 2021-06-01T11:02:07.4873516Z PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '@test_1508_tmp�' 2021-06-01T11:02:07.4875042Z 2021-06-01T11:02:07.4876184Z ====================================================================== 2021-06-01T11:02:07.4877721Z ERROR: test_trace_callback_content (sqlite3.test.hooks.TraceCallbackTests) 2021-06-01T11:02:07.4879746Z ---------------------------------------------------------------------- 2021-06-01T11:02:07.4881969Z Traceback (most recent call last): 2021-06-01T11:02:07.4884465Z File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 238, in unlink 2021-06-01T11:02:07.4898652Z _unlink(filename) 2021-06-01T11:02:07.4900840Z File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 278, in _unlink 2021-06-01T11:02:07.4931605Z _waitfor(os.unlink, filename) 2021-06-01T11:02:07.4947677Z File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 246, in _waitfor 2021-06-01T11:02:07.4949998Z func(pathname) 2021-06-01T11:02:07.5029259Z PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '@test_1508_tmp�' 2021-06-01T11:02:07.5046117Z 2021-06-01T11:02:07.5047969Z ---------------------------------------------------------------------- 

@vstinner
Copy link
Member

Ah, there is also a warning on Windows (same CI job):

2021-06-01T11:11:08.6974894Z Warning -- files was modified by test_sqlite 2021-06-01T11:11:08.6975521Z Before: [] 2021-06-01T11:11:08.6977644Z After: ['@test_1748_tmp�'] 

@erlend-aasland
Copy link
ContributorAuthor

Did you run ./python -m test test_sqlite -R 3:3 to check for reference leaks?

Yes, I did :)

@erlend-aasland
Copy link
ContributorAuthor

Windows (x64) CI job failed:

Yes, this is the reason for the draft/WIP status. On Windows, the GC uses a lot more time to break the ref. cycle between the statement cache and the connection. From what I can see, it seems that it just loops longer; there's a load of traverse calls before the cycle is broken. On Mac and Linux, the cycle is broken much faster. This results in the objects living slightly longer on Windows, so sqlite3 is clinging on to TESTFN, preventing unlink from removing it (as seen in the first traceback). Calling unlink repeatedly until it succeeds works, because eventually GC manages to clean up, and sqlite3 releases the file. It seems that the GC just behaves differently on Windows.

@vstinner
Copy link
Member

In general, I dislike relying on implicit resource management. I would prefer to emit a ResourceWarning if a resource is not released explicitly (by calling a close() method for example). Maybe some tests should call the close() method explicitly?

@erlend-aasland
Copy link
ContributorAuthor

erlend-aasland commented Jun 2, 2021

Maybe some tests should call the close() method explicitly?

That's the workaround I've added in #24203: see d2078bc.

It's exactly the same problem:

The LRU statement cache (pysqlite_Connection.statement_cache) has a "back-reference" to sqlite3.Connection (pysqlite_Cache.factory), because statement creation happens in sqlite3.Connection.__call__. I find that design strange and unfortunate.

Previously (or should I say currently), sqlite3.Connection.__init__ decrefs self after creating the LRU cache, and then tells sqlite3.Cache to not decref the statement factory (pysqlite_Cache.factory) when the cache is deallocated. In practice the cache is using a borrowed ref. to the connection.

@vstinner
Copy link
Member

Keeping Python objects in memory is fine. What is not fine is to hold an OS release, like a file. Do you mean that calling close() does not close the file on disk?

@erlend-aasland
Copy link
ContributorAuthor

Keeping Python objects in memory is fine. What is not fine is to hold an OS release, like a file. Do you mean that calling close() does not close the file on disk?

It does close it, but not until GC is done.

@vstinner
Copy link
Member

with sqlite.connect(path) as cx: doesn't close the connection at the end of the with block? Is it a deliberate behavior?

@erlend-aasland
Copy link
ContributorAuthor

erlend-aasland commented Jun 2, 2021

with sqlite.connect(path) as cx: doesn't close the connection at the end of the with block? Is it a deliberate behavior?

__exit__ does not close the connection, it only tries to commit or rollback what happened inside it:

  • https://docs.python.org/3/library/sqlite3.html#using-the-connection-as-a-context-manager
  • /*[clinic input]
    _sqlite3.Connection.__exit__ as pysqlite_connection_exit
    type as exc_type: object
    value as exc_value: object
    traceback as exc_tb: object
    /
    Called when the connection is used as a context manager.
    If there was any exception, a rollback takes place; otherwise we commit.
    [clinic start generated code]*/
    staticPyObject*
    pysqlite_connection_exit_impl(pysqlite_Connection*self, PyObject*exc_type,
    PyObject*exc_value, PyObject*exc_tb)
    /*[clinic end generated code: output=0705200e9321202a input=bd66f1532c9c54a7]*/
    {
    constchar*method_name;
    PyObject*result;
    if (exc_type==Py_None&&exc_value==Py_None&&exc_tb==Py_None){
    method_name="commit";
    } else{
    method_name="rollback";
    }
    result=PyObject_CallMethod((PyObject*)self, method_name, NULL);
    if (!result){
    returnNULL;
    }
    Py_DECREF(result);
    Py_RETURN_FALSE;
    }

The connection object is decref'ed after __exit__ (AFAICS), which closes the conneciton. But, arriving at connection dealloc takes a lot of time because of the garbage collector using a lot of time to break the ref. cycle. Maybe I'm terrible at explaining this, but what's happening (as I see it) is:

  1. __enter__
  2. do stuff
  3. __exit__ => triggers connection clean up
  4. traverse loop (ref. cycle)
  5. unlink => fails, because GC is still traversing
  6. traverse done => connection dealloc/clean => sqlite3 finally releases TESTFN, but it's too late

My computer time today is very fragmented, so my replies will be short and hopefully not too messy :)

@vstinner
Copy link
Member

exit does not close the connection, it only tries to commit or rollback what happened inside it

Well, again, IMO sqlite3 must behave as the io module: emit a ResourceWarning if a connection is not closed explicitly.

All tests must explicitly close the connection.

Closing a file must not depend if a sqlite object is part of a reference cycle or not. There are too many ways to create reference cycles in Python.

@erlend-aasland
Copy link
ContributorAuthor

Well, again, IMO sqlite3 must behave as the io module: emit a ResourceWarning if a connection is not closed explicitly.

All tests must explicitly close the connection.

Closing a file must not depend if a sqlite object is part of a reference cycle or not. There are too many ways to create reference cycles in Python.

In that case, I believe explicitly closing the connection and explicitly triggering gc.collect is the best thing to do here.

@pablogsal
Copy link
Member

pablogsal commented Jun 2, 2021

NOBODY EXPECTS THE SPANISH INQUISITION (I'm Spanish so I can say that :) )

Spanish_Inquisition

I think there is some confusion between @vstinner and @erlend-aasland. Let me try to clarify what are the expectations so is easier to reconduct the discussion:

  • It seems that the object calls close on destruction.
  • Being in a gc cycle prevents destruction.
  • The context manager doesn't close the connection.

One of the things that @vstinner is saying (and I very much agree with) is that we should never depend on calling gc.collect() to clean resources in the test suite. Is very unreliable and is almost always hiding bugs. For instance, in this case as the connexion is closed on destruction, is preferable to schedule a cleanup method (calling close() on the connection) but not relying on it being closed on destruction, as that is implicit resource management, which is bad.

Furthermore, @vstinner thinks that this implicit resource management (closing on destruction) is so bad that a warning should be emmited if we ever reach that code with the connection open, and all users should explicitly call close() or alternative we should change the context manager to close the connection, which is probably backwards compact.

In short:

  • Don't use gc.collect() in the test suite to do cleanups.
  • We should think how to ensure the connection is closed using explicit resource management.
  • We should understand:
    • Why there is a cycle in the first place
    • Why the test is hanging (this should not happen).

@vstinner
Copy link
Member

I understand that it is possible that the database file remains open after sqlite3_close_v2() is called, if some other sqlite objects still exist. Oh, this is counter intuitive. How can a programmer know if a database file is closed or not? Try to delete it on Windows? :-)

Would it help to emit a warning or even an hard exception if this case happen?

@erlend-aasland
Copy link
ContributorAuthor

Not the Spanish inquisition!!! 😂 Thanks, @pablogsal! :) I believe we're getting closer to understanding the various issues.

Victor:

Oh, this is counter intuitive. How can a programmer know if a database file is closed or not? Try to delete it on Windows? :-)

Yes, it is counter-intuitive, and no, there's no API for detecting if a database file is closed (AFAIK). fstat? :)

@vstinner
Copy link
Member

I understand that the close() method must destroy/release/close all sqlite objects before calling sqlite3_close_v2().

Said differently, would it be possible to implement a public or private close() method on each Python sqlite object which somehow prevents to close a database (sqlite3_close_v2), to release the resources, without having to destroy the Python object?

See my file object: even if the Python object is not destroyed, the inner resource is released. It requires all file methods to raise an exception if the file is closed.

@erlend-aasland
Copy link
ContributorAuthor

Anyway, I must take a break now; back at the computer in 5/6 hours :)

@vstinner
Copy link
Member

Anyway, I must take a break now; back at the computer in 5/6 hours :)

The Python project is developed asynchronously. You don't have to reply in less than 1 hour. It's perfectly fine to take 1 week or even 1 month to reply to a review. Your message sends notifications which are counter-productive. I go to the PR to see the comment... to say that you will reply later, ah. On IRC, I only get a notification that you wrote a comment, but I don't get the content.

Erlend E. Aasland added 2 commits June 2, 2021 20:32
- add wrapper for sqlite3_close_v2() - explicitly free pending statements before close()
@erlend-aaslanderlend-aasland marked this pull request as ready for review June 2, 2021 19:02
@erlend-aasland
Copy link
ContributorAuthor

Looks like I got it right. PTAL, @vstinner & @pablogsal. Thanks for your patience and guidance.

@erlend-aaslanderlend-aasland changed the title [WIP] bpo-42213: Remove redundant cyclic GC hack in sqlite3bpo-42213: Remove redundant cyclic GC hack in sqlite3Jun 2, 2021
* up cursors, as they may have strong refs to statements. */
Py_CLEAR(self->statement_cache);
Py_CLEAR(self->statements);
Py_CLEAR(self->cursors);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you use the connection after you call close()? Are we checking for NULL for these veriables everywhere else? (Cannot check myself as I'm on my phone)

Copy link
ContributorAuthor

@erlend-aaslanderlend-aaslandJun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll get a sqlite3.ProgrammingError. There's a lot of sanity checks in the code for this. The sqlite3 test suite runs fine, but of course, we do not have 100% code coverage yet. (Digression: see issue 43553 for improving sqlite3 code coverage.)

After close, if you try to use a cursor, fetch from a pending statement, or manipulate the connection, you'll get a sqlite3.ProgrammingError:

>>> cx = sqlite3.connect(":memory:") >>> cu = cx.cursor() >>> res = cu.execute("select 1") >>> cx.close() >>> res.fetchall() Traceback (most recent call last): File "<stdin>", line 1, in <module> sqlite3.ProgrammingError: Cannot operate on a closed database. >>> cu.execute("select 1") Traceback (most recent call last): File "<stdin>", line 1, in <module> sqlite3.ProgrammingError: Cannot operate on a closed database. >>> cx.create_function("test", 1, lambda x: x) Traceback (most recent call last): File "<stdin>", line 1, in <module> sqlite3.ProgrammingError: Cannot operate on a closed database. >>> 

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more tests wrt. this: 1b23df1
Also, there's some regression tests that exercise operations on closed connections (and closed cursors).

Let me know if you want me to add more tests.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a strange observation:
I did a ref leak test on Windows yesterday, and it segfaulted on test_bpo31770. After some testing, I reverted the change highlighted in this conversation; I removed the three Py_CLEAR, added pysqlite_do_all_statements(self, ACTION_FINALIZE, 1) back, but kept connection_close(). That fixed the ref leak test on Windows, and the test suite still worked; it did not "leak" open test database files. So, for the new PR, I'll keep this modification. We'll see how the CI fares :)

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI seems to be fine with this: erlend-aasland#9

@vstinner
Copy link
Member

_sqlite3.Connection.__enter__() doesn't call pysqlite_check_connection() (check if the DB is open). IMO it should.

The following methods check indirectly if the connection is closed.

*_sqlite3.Connection.execute(): call self.cursor().execute()

  • _sqlite3.Connection.executemany(): call self.cursor().executemany()
  • _sqlite3.Connection.executescript(): call self.cursor().executescript()
  • _sqlite3.Connection.__exit__(): call commit() or rollback()

Other methods call pysqlite_check_connection().

Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum. Can you try to split your PR in two parts? First PR to rewrite the C code, second PR to rewrite the tests.

@erlend-aasland
Copy link
ContributorAuthor

_sqlite3.Connection.__enter__() doesn't call pysqlite_check_connection() (check if the DB is open). IMO it should.

Fixed in 7d8014a.

Hum. Can you try to split your PR in two parts? First PR to rewrite the C code, second PR to rewrite the tests.

Yes, I could do that, but that would make the CI fail on the first PR.

...and sort imports
@vstinner
Copy link
Member

Yes, I could do that, but that would make the CI fail on the first PR.

Hum, I am not strictly thinking about C vs Python, but more something like that:

First PR:

  • Add Py_CLEAR in pysqlite_connection_close_impl
  • Add connection_close()
  • pysqlite_connection_enter_impl() change
  • unit tests for closed connections/cursors

Second PR:

  • Remove decref_factory
  • Remove gc.collect() calls
  • Add managed_connect() in test

@vstinner
Copy link
Member

This PR is no longer only about "Remove redundant cyclic GC hack in sqlite3" and "The sqlite3 module now fully implements the GC protocol; there's no need for this workaround anymore." It's now way more than that.

@erlend-aasland
Copy link
ContributorAuthor

erlend-aasland commented Jun 2, 2021

True. Can I re-use the same issue number for both PRs?

@erlend-aaslanderlend-aasland deleted the sqlite-remove-gc-hack branch June 2, 2021 22:33
@vstinner
Copy link
Member

Can I use the same issue number?

Yes. It's a good practice to put related changes in the same bpo.

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

@erlend-aasland@vstinner@pablogsal@the-knights-who-say-ni@bedevere-bot