Fix livelock during shutdown and downsizing.#1

ambrop72 · 2020-05-22T18:04:12Z

When destruction starts or the pool size is reduced, only the last worker thread could in fact stop, while the other workers that should stop would enter a busy wait state, since the condition they are waiting for via a condition variable would be satisfied. The busy waiting threads could starve the last thread from CPU, delaying downsizing or shutdown. This is particularly noticable when running via Valgrind; shutdown delays in the order of tens of seconds have been observed.

Fix it by making each thread stop as soon as it can. Doing this required adjusting the management of the workers vector:

When a worker is stopping, it may not yet be possible to downsize the vector (if it is not the last worker). In that case an invalid thread entry is left in the vector, and the vector will be downsized when possible as other threads stop.
When upsizing, some threads above the current limit may still be running (if there was a recent downsizing and they haven't stopped yet). Such threads just keep running.

When destruction starts or the pool size is reduced, only the last worker thread could in fact stop, while the other workers that should stop would enter a busy wait state, since the condition they are waiting for via a condition variable would be satisfied. The busy waiting threads could starve the last thread from CPU, delaying downsizing or shutdown. This is particularly noticable when running via Valgrind; shutdown delays in the order of tens of seconds have been observed. Fix it by making each thread stop as soon as it can. Doing this required adjusting the management of the workers vector: - When a worker is stopping, it may not yet be possible to downsize the vector (if it is not the last worker). In that case an invalid thread entry is left in the vector, and the vector will be downsized when possible as other threads stop. - When upsizing, some threads above the current limit may still be running (if there was a recent downsizing and they haven't stopped yet). Such threads just keep running.

wilx · 2020-05-29T18:03:58Z

Do you have a test case that shows how it fails and how your patch fixes it? (Preferably independent of log4cplus.) I think I see it but it would be nice to have a test case.

ambrop72 · 2020-05-29T19:44:45Z

#include "ThreadPool.h" int main(){progschj::ThreadPool pool}

g++ example.cpp -o example -lpthread

$ time valgrind ./example ==31188== Memcheck, a memory error detector ==31188== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==31188== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info ==31188== Command: ./example ==31188== (gets stuck eating CPU, may terminate if given enough time) (CTRL+C to see a back stack trace) ==31188== Process terminating with default action of signal 2 (SIGINT) ==31188== at 0x4027D7: std::deque<std::function<void ()>, std::allocator<std::function<void ()> > >::empty() const (in /home/ambro/ThreadPool/example) ==31188== by 0x402187: std::queue<std::function<void ()>, std::deque<std::function<void ()>, std::allocator<std::function<void ()> > > >::empty() const (in /home/ambro/ThreadPool/example) ==31188== by 0x401A9F: progschj::ThreadPool::emplace_back_worker(unsigned long)::{lambda()#1}::operator()() const (in /home/ambro/ThreadPool/example) ==31188== by 0x40486B: void std::__invoke_impl<void, progschj::ThreadPool::emplace_back_worker(unsigned long)::{lambda()#1}>(std::__invoke_other, progschj::ThreadPool::emplace_back_worker(unsigned long)::{lambda()#1}&&) (in /home/ambro/ThreadPool/example) ==31188== by 0x404820: std::__invoke_result<progschj::ThreadPool::emplace_back_worker(unsigned long)::{lambda()#1}>::type std::__invoke<progschj::ThreadPool::emplace_back_worker(unsigned long)::{lambda()#1}>(std::__invoke_result&&, (progschj::ThreadPool::emplace_back_worker(unsigned long)::{lambda()#1}&&)...) (in /home/ambro/ThreadPool/example) ==31188== by 0x4047CD: void std::thread::_Invoker<std::tuple<progschj::ThreadPool::emplace_back_worker(unsigned long)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (in /home/ambro/ThreadPool/example) ==31188== by 0x4047A3: std::thread::_Invoker<std::tuple<progschj::ThreadPool::emplace_back_worker(unsigned long)::{lambda()#1}> >::operator()() (in /home/ambro/ThreadPool/example) ==31188== by 0x404787: std::thread::_State_impl<std::thread::_Invoker<std::tuple<progschj::ThreadPool::emplace_back_worker(unsigned long)::{lambda()#1}> > >::_M_run() (in /home/ambro/ThreadPool/example) ==31188== by 0x493BB3F: ??? (in /nix/store/w7alid4mirzwx3ck4hj18q7rnr4yslfh-gcc-9.2.0-lib/lib/libstdc++.so.6.0.27) ==31188== by 0x484BEDC: start_thread (in /nix/store/9hy6c2hv8lcwc6clnc1p2jf09cs5q9dp-glibc-2.30/lib/libpthread-2.30.so) ==31188== by 0x4C98A4E: clone (in /nix/store/9hy6c2hv8lcwc6clnc1p2jf09cs5q9dp-glibc-2.30/lib/libc-2.30.so)

With the fix, terminates normally in about a second.

Patch by: ambrop72

C++17 -ify

wilx self-assigned this May 22, 2020

wilx requested review from wilx and removed request for wilx May 29, 2020 17:57

wilx added the bug label May 29, 2020

wilx merged commit fc20cee into log4cplus:masterMay 29, 2020

wilx added a commit to log4cplus/log4cplus that referenced this pull request May 29, 2020
Update threadpool. Thisfixeslog4cplus/ThreadPool#1.
62cc851
Patch by: ambrop72

wilx added a commit to log4cplus/log4cplus that referenced this pull request May 29, 2020
Update threadpool. Thisfixeslog4cplus/ThreadPool#1.
76ff7e6
Patch by: ambrop72

fwsGonzo pushed a commit to fwsGonzo/ThreadPool that referenced this pull request Jan 10, 2023
Merge pull requestlog4cplus#1from HO-COOH/HO-COOH-patch-1
e5972b7
C++17 -ify

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix livelock during shutdown and downsizing.#1

Fix livelock during shutdown and downsizing. #1

Uh oh!

ambrop72 commented May 22, 2020

Uh oh!

wilx commented May 29, 2020

Uh oh!

ambrop72 commented May 29, 2020•
edited
Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix livelock during shutdown and downsizing.#1

Fix livelock during shutdown and downsizing. #1

Uh oh!

Conversation

ambrop72 commented May 22, 2020

Uh oh!

wilx commented May 29, 2020

Uh oh!

ambrop72 commented May 29, 2020• edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ambrop72 commented May 29, 2020•
edited
Loading