- Notifications
You must be signed in to change notification settings - Fork 78
More thread-safe GC#529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More thread-safe GC #529
Uh oh!
There was an error while loading. Please reload this page.
Conversation
ericphanson commented Jul 24, 2024
BTW, could be worth trying this benchmark: main...ericphanson:PythonCall.jl:eph/timing |
cjdoris commented Jul 31, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
@ericphanson@lassepe@MilesCranmer I think this is the version I'm going to merge. I'd be grateful if you could give it a try to check it solves any issues, and also review the code. Some timings on my machine:
I just need to copy over the extra tests from #520. |
MilesCranmer commented Jul 31, 2024
PySR test suite is passing 👍 |
MilesCranmer commented Jul 31, 2024
Cool, this also fixes avik-pal/Wandb.jl#27! |
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
MilesCranmer left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the docs also need to be updated with the warning about multithreading (and suggestion that GC be turned off).
Also, how does this interact with the signal handling, does the warning in the docs still apply?
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
src/GC/GC.jl Outdated
| end | ||
| function unsafe_free_queue() | ||
| lock(QUEUE_LOCK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the lock(...) do ... API since it will still unlock in the case of an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I considered it but (a) it adds 30ns and (b) the code in between should not throw.
MilesCranmerAug 1, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's the additional function call from the closure. In that case I would try
lock(QUEUE_LOCK) try#= do stuff =#finally unlock(QUEUE_LOCK) endAnd hopefully that's a bit faster.
If anything it's just a good habit. I think there's always a possibility for some kind of error in Julia code (until the day we can actually impose these types of constraints on code), so locks should always be safely handled. e.g., perhaps the user could <Ctrl-C> while the lock is in place, or there's an OOM error, or something else we haven't thought of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW Base.@lock (which is what I used in my PR) expands to exactly the code you wrote there @MilesCranmer, and is exported on more recent julia versions
src/GC/GC.jl Outdated
| unsafe_free_queue() | ||
| end | ||
| else | ||
| lock(QUEUE_LOCK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer lock(...) do ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(As above)
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
ericphanson commented Aug 2, 2024
I think we need SpinLock since we are in a finalizer! We aren’t allowed to yield like ReentrantLock does |
MilesCranmer left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
ericphanson left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One out of date comment, otherwise lgtm (though I should say I’m not expert and was just learning as I went in #520)
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
MilesCranmer commented Aug 2, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Regarding the iterate(items) ===nothingThis is how Base checks for an array being empty. I think what can happen is: thread A can (1) call the In other words, use of Therefore I think all uses of Basically anytime |
cjdoris commented Aug 2, 2024
I commented elsewhere that you're looking at the wrong method of isempty(a::AbstractArray) = (length(a) ==0)which I believe is safe. It might give wrong answers occasionally in the presence of a race condition, but it's safe. |
MilesCranmer commented Aug 2, 2024
Oh yeah, you are totally right, sorry. |
MilesCranmer commented Aug 2, 2024
Ok, no other comments from me. I think it's ready to go 👍 |
cjdoris commented Aug 2, 2024
Thanks for your help reviewing! |
MilesCranmer commented Aug 2, 2024
No problem 👍 (By the way, could we get a release so we can fix the Wandb.jl issue?) |
cjdoris commented Aug 2, 2024
Yeah I'll make a release once #535 is merged (going to sleep on it and bikeshed the new function names). |
MilesCranmer commented Aug 3, 2024
Sounds good. In case #535 seems like it will take some more discussion to settle on an API I wonder if we could have a patch release just so we can get the bug fixes through? |
MilesCranmer commented Aug 6, 2024
cjdoris commented Aug 7, 2024
It's released |
MilesCranmer commented Aug 7, 2024
Thanks! |
A draft of an alternative to #520 to ensure Python GC only runs on thread 1.
This version avoids using tasks. Instead, when a Python object is finalized:
I also added
PythonCall.GC.gc()to explicitly clear the queue (if possible).Furthermore I added an immortal object
GCHook()which callsPythonCall.GC.gc()whenever it is finalized, then adds its own finalizer back on. This means the queue can be cleared periodically when Julia GC runs, because it keeps trying to finalizeGCHook().I used
PyGILState_Check() == 1to check that the GIL is currently held. This should make it safe to interact with Python from any thread in the future, provided you use the C APIPyGILState_*andPyThreadState_*functions correctly.Very much a draft. Need to use
Channels etc (like #520) to make this totally thread-safe.