Skip to content

Conversation

@merito
Copy link
Contributor

@meritomerito commented Apr 24, 2021

Fixes#651

Description of the Change

Use batches when deleting expired tokens using cleartokens management command. Introduce two setting variables to configure batch size and interval between deletions. The defaults were tested on quite weak machines and ~1.5M deleted tokens. It allows to run cleartokens without any downtime.

Checklist

  • PR only contains one change (considered splitting up PR)
  • unit-test added
  • documentation updated
  • CHANGELOG.md updated (only for user relevant changes)
  • author name in AUTHORS

@meritomerito marked this pull request as ready for review April 24, 2021 23:03
@codecov
Copy link

codecovbot commented Apr 25, 2021

Codecov Report

Merging #969 (725c3c9) into master (e4c98c7) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@ Coverage Diff @@## master #969 +/- ## ========================================== + Coverage 96.64% 96.67% +0.03%  ========================================== Files 31 31 Lines 1756 1775 +19 ========================================== + Hits 1697 1716 +19  Misses 59 59 
Impacted FilesCoverage Δ
oauth2_provider/settings.py100.00% <ø> (ø)
oauth2_provider/models.py98.76% <100.00%> (+0.07%)⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e4c98c7...725c3c9. Read the comment docs.

auvipy
auvipy previously requested changes Oct 19, 2021
Copy link
Contributor

@auvipyauvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit and integration tests for the changed logics are needed. also need explanation why atomic transacion is not needed here.

@MattBlack85
Copy link
Contributor

I am a bit worried about the performance that this may have on large postrgesql databases because of the .count() and id__in, can we measure and report a run against a big table to see how it performs?

@merito
Copy link
ContributorAuthor

merito commented Oct 22, 2021

I am a bit worried about the performance that this may have on large postrgesql databases because of the .count() and id__in, can we measure and report a run against a big table to see how it performs?

Note: If you only need to determine the number of records in the set (and don’t need the actual objects), it’s much more efficient to handle a count at the database level using SQL’s SELECT COUNT(*). Django provides a count() method for precisely this reason.

source: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#when-querysets-are-evaluated

Performance is the reason why this PR emerges. I've tried to remove ~1.5M of stale tokens and I've failed because very high RAM usage. When I'd introduced batching it passed without any peaks in resources of machine on postgres installed on very basic AWS instance. Because AccessToken model is simple the default 10000 items batch size is the best balance between speed and low resource usage.

Copy link
Contributor

@n2ygkn2ygk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed follow up on this PR. It looks good. Please rebase and resolve conflicts. I'm targeting the next minor release for this. Thanks! (I'm afraid to check our production DB to see how many expired tokens are in it;-)

@n2ygkn2ygk added this to the 1.7.0 milestone Dec 22, 2021
Do not check for merge conflicts in AUTHORS file, because ======= in 2nd line triggers the error.
@meritomeritoforce-pushed the feature-improve-cleartokens-performance branch from e4428bc to 2340bdfCompareDecember 22, 2021 17:14
@merito
Copy link
ContributorAuthor

I had to exclude AUTHORS from check-merge-conflict, because it causes

Check for merge conflicts................................................Failed - hook id: check-merge-conflict - exit code: 1 Merge conflict string "======= " found in AUTHORS:2 

You can keep this change or change separator in AUTHORS file.

@meritomerito requested a review from auvipyDecember 22, 2021 17:33
@n2ygk
Copy link
Contributor

I think I'll approach this by fixing AUTHORS separately as the problem doesn't "belong" to this PR.

Also, since I'm targeting 1.7.0 for this as it's more than a patch, let's hold it off a bit until 1.6.1. gets published.

@n2ygk
Copy link
Contributor

pre-commit/pre-commit-hooks#100 describes the problems with RST underline.

@n2ygkn2ygk dismissed auvipy’s stale reviewJanuary 1, 2022 16:48

change made but not marked resolved.

@n2ygkn2ygk merged commit c42423c into django-oauth:masterJan 1, 2022
@n2ygk
Copy link
Contributor

n2ygk commented Jan 1, 2022

@auvipy no test existed prior to this PR so let's accept it as is and put writing a test for the command on the backlog.

# Should only be required in testing.
"ALWAYS_RELOAD_OAUTHLIB_CORE": False,
"CLEAR_EXPIRED_TOKENS_BATCH_SIZE": 10000,
"CLEAR_EXPIRED_TOKENS_BATCH_INTERVAL": 0.1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@merito Sorry this is after the fact but wouldn't a default value of 0 be best, especially since the sleep is always executed even if the batch is tiny.
https://github.com/merito/django-oauth-toolkit/blob/725c3c9d8927379c9808abd1badb4fcd9ff1cbaa/oauth2_provider/models.py#L636

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cleartokens management command could have some performance improvements

5 participants

@merito@MattBlack85@n2ygk@auvipy@dawidwolski-identt