feat: optimizing the prune function in apriori_algorithm.py #12992
Describe your change:
Added an optimized version of the `prune` function using `Counter` to improve performance when checking candidate itemsets for frequent items.
As a test base, I used an `itemset` list of gradually increasing size to demonstrate the inefficiency of the original algorithm, which has a complexity of O(n * c * i), where n is the size of `itemset`, c is the number of candidates, and i is the number of items in each candidate.
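The per-item check that leads to O(n * c * i) can be sketched as follows. This is an illustration, not a copy of the repository code: the name `prune_original` is hypothetical, and the `(itemset, candidates, length)` signature and the rule "every item of a candidate must occur in `itemset` at least `length - 1` times" are assumptions about how the current `prune` works.

```python
def prune_original(itemset: list, candidates: list, length: int) -> list:
    """Keep candidates whose items all occur in `itemset` at least `length - 1` times.

    Hypothetical sketch of the original approach: each `in` test and each
    `list.count` call scans the whole list, so the total cost is O(n * c * i).
    """
    pruned = []
    for candidate in candidates:  # c candidates
        for item in candidate:  # i items per candidate
            # Two linear scans of `itemset` (length n) for every item.
            if item not in itemset or itemset.count(item) < length - 1:
                break
        else:
            # The loop never hit `break`, so every item passed the check.
            pruned.append(candidate)
    return pruned


print(prune_original(["X", "Y", "Z"], [["X", "Y"], ["X", "Z"], ["Y", "Z"]], 2))
# [['X', 'Y'], ['X', 'Z'], ['Y', 'Z']]
```

The `for ... else` form mirrors an early `break` on the first failing item, so a candidate is rejected as soon as one of its items fails.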
The new solution reduces the complexity to O(n + c * i). Previously, every time the algorithm checked an item of a candidate, it scanned `itemset` to test membership (O(n)) and scanned it again to count that item's occurrences (O(n)), repeating these linear operations for every item of every candidate.
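Written out term by term (n, c and i as defined above, hash lookups taken as O(1) on average), the two costs compare as follows. Both are upper bounds, since a candidate can be rejected before all of its items are checked.

```math
\begin{aligned}
T_{\text{original}} &= c \cdot i \cdot \big(\underbrace{O(n)}_{\text{membership}} + \underbrace{O(n)}_{\text{count}}\big) = O(n \cdot c \cdot i) \\
T_{\text{optimized}} &= \underbrace{O(n)}_{\text{build counter}} + c \cdot i \cdot \underbrace{O(1)}_{\text{lookup}} = O(n + c \cdot i)
\end{aligned}
```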
To optimize this, I used an auxiliary dictionary (via `Counter`) in which each key is an item and its value is the number of occurrences of that item in `itemset`. Building it takes a single O(n) pass, after which both the membership check and the count lookup take constant time, O(1) on average.
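A minimal sketch of the `Counter`-based check, assuming the same `(itemset, candidates, length)` signature and `length - 1` threshold as the original, and hashable items (for example strings or tuples; lists would first need converting to tuples, since `Counter` keys must be hashable). The name `prune_optimized` is hypothetical, and this is not necessarily the exact code in the PR.

```python
from collections import Counter


def prune_optimized(itemset: list, candidates: list, length: int) -> list:
    """Keep candidates whose items all occur in `itemset` at least `length - 1` times.

    Building the Counter is one O(n) pass; each later lookup is O(1) on
    average, so the whole function runs in O(n + c * i).
    """
    counts = Counter(itemset)  # item -> number of occurrences in itemset
    return [
        candidate
        for candidate in candidates
        # The explicit membership test preserves the original behaviour when
        # length - 1 <= 0 (an item missing from itemset is still rejected).
        if all(item in counts and counts[item] >= length - 1 for item in candidate)
    ]


print(prune_optimized(["X", "Y", "Z"], [["X", "Y"], ["X", "Z"], ["Y", "Z"]], 2))
# [['X', 'Y'], ['X', 'Z'], ['Y', 'Z']]
```

Reading `counts[item]` for a missing key returns 0 without inserting it, so the lookups never grow the counter beyond one entry per distinct item.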
As a result, the performance improvement is significant, at the cost of some additional memory for the counter (one entry per distinct item in `itemset`), which is a worthwhile trade-off. The improvement can be observed by comparing the execution times of both algorithms in the attached graph.
Here is the graph comparing both functions:
pruneOptimized_prune_algoritm_results.pdf
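To reproduce a comparison like this one, a small timing harness could look like the following. It is a sketch: `prune_original`, `prune_optimized`, `make_input`, `time_call` and `benchmark` are hypothetical names, the synthetic data is made up for illustration, and both functions restate the assumed membership-and-count rule with hashable string items. It does not reproduce the numbers in the attachment.

```python
import time
from collections import Counter


def prune_original(itemset, candidates, length):
    # Linear `in` plus linear `list.count` for every item of every candidate.
    return [
        cand
        for cand in candidates
        if all(item in itemset and itemset.count(item) >= length - 1 for item in cand)
    ]


def prune_optimized(itemset, candidates, length):
    # One O(n) counting pass, then O(1) average lookups per item.
    counts = Counter(itemset)
    return [
        cand
        for cand in candidates
        if all(item in counts and counts[item] >= length - 1 for item in cand)
    ]


def make_input(n, num_candidates=200, items_per_candidate=3):
    # Synthetic data: n items cycling through 50 labels. Candidates also use
    # labels 50-59, which never occur, so any candidate containing one is pruned.
    itemset = [f"item{k % 50}" for k in range(n)]
    candidates = [
        [f"item{(j + d) % 60}" for d in range(items_per_candidate)]
        for j in range(num_candidates)
    ]
    return itemset, candidates


def time_call(func, *args, repeats=5):
    # Best-of-`repeats` wall-clock time in seconds, to reduce noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(sizes, length=2):
    # Returns (n, original_seconds, optimized_seconds) for each itemset size.
    rows = []
    for n in sizes:
        itemset, candidates = make_input(n)
        # Sanity check: both versions must keep exactly the same candidates.
        assert prune_original(itemset, candidates, length) == prune_optimized(
            itemset, candidates, length
        )
        rows.append(
            (
                n,
                time_call(prune_original, itemset, candidates, length),
                time_call(prune_optimized, itemset, candidates, length),
            )
        )
    return rows


if __name__ == "__main__":
    for n, t_orig, t_opt in benchmark([1_000, 2_000, 4_000, 8_000]):
        print(f"n={n:>6}  original={t_orig:.4f}s  optimized={t_opt:.4f}s")
```

Absolute times depend on the machine; what matters is how each column grows as `n` doubles, roughly linearly per candidate check for the original and almost flat for the `Counter` version.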
Unit tests were also conducted on my local machine to ensure the consistency of results between the two methods, but they are not included in this PR.
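Since those tests are not part of the PR, here is one way such a consistency check could look: a randomized comparison of the two versions. All names, the alphabet and the input sizes are hypothetical, and both functions restate the membership-and-count rule assumed in this description, using hashable string items only.

```python
import random
from collections import Counter


def prune_original(itemset, candidates, length):
    # Reference behaviour: linear `in` and `list.count` for every item.
    return [c for c in candidates
            if all(x in itemset and itemset.count(x) >= length - 1 for x in c)]


def prune_optimized(itemset, candidates, length):
    # Counter-based variant: one counting pass, then O(1) average lookups.
    counts = Counter(itemset)
    return [c for c in candidates
            if all(x in counts and counts[x] >= length - 1 for x in c)]


def check_consistency(trials=500, seed=0):
    """Compare both versions on random inputs; return the number of trials run."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    alphabet = "ABCDEFGH"
    for _ in range(trials):
        itemset = [rng.choice(alphabet) for _ in range(rng.randint(0, 30))]
        candidates = [
            [rng.choice(alphabet) for _ in range(rng.randint(0, 4))]
            for _ in range(rng.randint(0, 10))
        ]
        length = rng.randint(0, 6)  # includes thresholds <= 0 as edge cases
        expected = prune_original(itemset, candidates, length)
        actual = prune_optimized(itemset, candidates, length)
        assert actual == expected, (itemset, candidates, length)
    return trials


if __name__ == "__main__":
    print(f"{check_consistency()} random trials agree")
```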
Checklist: