Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 33.9k
gh-108590: Fix sqlite3 iterdump() for table columns containing invalid Unicode sequences#108683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-108590: Fix sqlite3 iterdump() for table columns containing invalid Unicode sequences #108683
Uh oh!
There was an error while loading. Please reload this page.
Conversation
erlend-aasland commented Aug 30, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
…e sequences This also reverts 400a1ce.
erlend-aasland commented Aug 30, 2023
cc. @CorvinM |
erlend-aasland commented Aug 30, 2023
Sorry 'bout the unneeded churn, Serhiy; I should have caught this earlier. |
serhiy-storchaka left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still not good.
It modifies the input connection object, is not thread-safe, and even if iterdump() will no longer fail, you need a way to feed its result back to SQLite without errors. There will also be new encoding errors in attempt to save the dump in file or print to stdout.
serhiy-storchaka commented Aug 30, 2023
For now, it may be easier to revert the previous changes and start to work on proper fix from clear page. |
erlend-aasland commented Aug 30, 2023
Good point; Corvin added the context manager for this; let's throw that back in.
The underlying calls should take care of that; you cannot share a |
erlend-aasland commented Aug 30, 2023
I agree; I'm reverting the previous change in |
erlend-aasland commented Aug 30, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There is no encoding errors when |
CorvinM commented Aug 30, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Does the recreation from dump work here? I was encountering encoding errors when attempting it with surrogateescape but not with the old chr() for some reason. deftest_dump_recreation(self): self.cu.executescript(""" CREATE TABLE foo (id INTEGER, text TEXT, blob BLOB); INSERT INTO foo VALUES (0, CAST(X'619f' AS TEXT), X'619f'); INSERT INTO foo VALUES (1, 'Hello SQLite!', X'98194eff46ab29f79064'); """) original_dump=list(self.cx.iterdump()) withmemory_database() ascx2: query="".join(original_dump) cx2.executescript(query) recreation_dump=list(cx2.iterdump()) self.assertEqual(original_dump, recreation_dump) |
erlend-aasland commented Aug 30, 2023
Well, using the So the fix is valid, technically. But from a user perspective; requiring the user to pass |
erlend-aasland commented Aug 30, 2023
Yes, I can recreate it with this: $ sqlite3 test.dbSQLite version 3.39.5 2022-10-14 20:58:05Enter ".help" for usage hints.sqlite> CREATE TABLE t(t TEXT);sqlite> INSERT INTO t VALUES(CAST(X'619f' AS TEXT));sqlite> ^D $ cat > test.pyimport sqlite3cx = sqlite3.connect("test.db")for line in cx.iterdump(): print(line)cx.close() $ ./python.exe test.py > dump1.sql $ sqlite3 test.db .dump > dump2.sql $ hexdump -C dump1.sql00000000 42 45 47 49 4e 20 54 52 41 4e 53 41 43 54 49 4f |BEGIN TRANSACTIO|00000010 4e 3b 0a 43 52 45 41 54 45 20 54 41 42 4c 45 20 |N;.CREATE TABLE |00000020 74 28 74 20 54 45 58 54 29 3b 0a 49 4e 53 45 52 |t(t TEXT);.INSER|00000030 54 20 49 4e 54 4f 20 22 74 22 20 56 41 4c 55 45 |T INTO "t" VALUE|00000040 53 28 27 61 9f 27 29 3b 0a 43 4f 4d 4d 49 54 3b |S('a.');.COMMIT;|00000050 0a |.|00000051 $ hexdump -C dump2.sql00000000 50 52 41 47 4d 41 20 66 6f 72 65 69 67 6e 5f 6b |PRAGMA foreign_k|00000010 65 79 73 3d 4f 46 46 3b 0a 42 45 47 49 4e 20 54 |eys=OFF;.BEGIN T|00000020 52 41 4e 53 41 43 54 49 4f 4e 3b 0a 43 52 45 41 |RANSACTION;.CREA|00000030 54 45 20 54 41 42 4c 45 20 74 28 74 20 54 45 58 |TE TABLE t(t TEX|00000040 54 29 3b 0a 49 4e 53 45 52 54 20 49 4e 54 4f 20 |T);.INSERT INTO |00000050 74 20 56 41 4c 55 45 53 28 27 61 9f 27 29 3b 0a |t VALUES('a.');.|00000060 43 4f 4d 4d 49 54 3b 0a |COMMIT;.|00000068Apart from the preamble and table name quoting, they're equal. The VALUES string is |
erlend-aasland commented Aug 30, 2023
|
CorvinM commented Aug 30, 2023
Unless I did something wrong I think this PR will fail the test I posted but works for print() What do you think about breaking consistency with sqlite cli for 3.11, 3.12 and just dumping |
erlend-aasland commented Aug 30, 2023
FTR, with #108695 and the recreation sequence outlined in #108683 (comment), I'm not able to recreate the database. The resulting VALUES string becomes: |
erlend-aasland commented Aug 30, 2023
Both correct.
I'm not sure if it is worth it for this corner case. At the moment, I think this may be best solved in documentation. Let's think about it for some days; there is no hurry in getting this fixed. |
erlend-aasland commented Aug 30, 2023
Thanks for helping investigate this, @CorvinM. |
This partially reverts 400a1ce, except for the test, which is amended.