server bench: fix bench not waiting for model load #7284

JohannesGaessler · 2024-05-14T13:34:52Z

While working on #6828 I noticed that when using a large static n-gram cache the benchmark would report 0 iterations for the first 8 minutes and then 30 iterations for the last 2 minutes. What seems to be happening is that bench.py doesn't correctly wait for the server to be ready, so the clock starts ticking while the n-gram cache is still being loaded. From what I can tell, loading the model from disk can cause the same issue if it's e.g. on an HDD.

This PR makes bench.py wait for response 200 (SERVER_STATE_READY) from the health endpoint before it treats the server as ready. I'm not sure if there is a better way to implement this than what I did; I'm definitely open to suggestions.
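For readers following along, here is a minimal sketch of the kind of readiness wait described above. It is not the code from this PR. It assumes the health endpoint answers with a non-200 status (or refuses the connection) until loading has finished. The names `health_status` and `wait_for_server_ready`, the `probe` parameter, and the default timings are all hypothetical and chosen only for illustration.

```python
import time
import urllib.error
import urllib.request


def health_status(url, timeout=2.0):
    """Return the HTTP status code of a GET to url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (for example while the model is loading) raise HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: nothing is listening yet.
        return None


def wait_for_server_ready(url, max_wait=600.0, interval=0.5, probe=health_status):
    """Poll until probe(url) returns 200; return False if max_wait elapses first."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if probe(url) == 200:
            return True
        time.sleep(interval)
    return False
```

A benchmark script would start its clock only after something like `wait_for_server_ready("http://localhost:8080/health")` returns `True`; the host and port here are assumptions. The `probe` argument exists so the polling loop can be exercised without a running server.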

ggerganov · 2024-05-16T14:42:21Z

It looks like this change causes the server Benchmark that we run on the self-hosted runner to fail like this:

https://github.com/ggerganov/llama.cpp/actions/runs/9094073377/job/24998422481

I tried to revert it and now the benchmark passes:

https://github.com/ggerganov/llama.cpp/actions/runs/9112533114

I'm not sure why it is causing the error - any ideas how to fix?

phymbert · 2024-05-16T18:09:05Z

Yes, the problem is here:

https://github.com/ggerganov/llama.cpp/blob/9afdffe70ebf3166d429b4434783bb0b7f97bdeb/examples/server/bench/bench.py#L113

It considers Prometheus not started, which is not working as expected. It is probably easier to revert and, in another PR, separate the Prometheus check from the llama.cpp server checks?

server bench: fix bench not waiting for model load
f692dbd

JohannesGaessler requested a review from phymbert May 14, 2024 13:34

phymbert approved these changes May 14, 2024

mofosyne added the examples, Review Complexity : Low, and python labels May 14, 2024

JohannesGaessler merged commit 583fd6b into ggml-org:master May 15, 2024

phymbert added a commit that referenced this pull request May 16, 2024
Revert "server bench: fix bench not waiting for model load (#7284)"
e7f7bef
This reverts commit 583fd6b.

phymbert mentioned this pull request May 16, 2024
Revert "server bench: fix bench not waiting for model load" #7334
Merged

phymbert added a commit that referenced this pull request May 16, 2024
Revert "server bench: fix bench not waiting for model load (#7284)" (#…
24ecb58
…7334) This reverts commit 583fd6b.
