feat: optimizing the prune function at the apriori_algorithm.py archive #12992

joaoneto9 · 2025-09-24T18:48:08Z

Describe your change:

Added an optimized version of the prune function that uses Counter to speed up checking
candidate itemsets against the frequent items.

As a test base, I used itemset lists of gradually increasing size to demonstrate the
inefficiency of the original algorithm, whose complexity is O(n * c * i), where n is the
size of itemset, c is the number of candidates, and i is the number of items in each
candidate.
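For reference, the quadratic pattern described above can be sketched as follows. This is a minimal illustration, assuming itemset is a flat list of items and each candidate is an iterable of items; the name prune_naive is hypothetical and this is not the exact code from apriori_algorithm.py.

```python
def prune_naive(itemset: list, candidates: list, length: int) -> list:
    """Keep candidates whose items all occur at least length - 1 times in itemset.

    Both "item not in itemset" and "itemset.count(item)" scan the whole list,
    so each check costs O(n) and the total cost is O(n * c * i).
    """
    pruned = []
    for candidate in candidates:  # c candidates
        for item in candidate:  # i items per candidate
            # Two linear scans of itemset for every single item
            if item not in itemset or itemset.count(item) < length - 1:
                break
        else:
            # The inner loop finished without break: every item passed the check
            pruned.append(candidate)
    return pruned


print(prune_naive(["X", "Y", "Z"], [["X", "Y"], ["X", "Z"], ["Y", "Z"]], 2))
```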

The new solution reduces the complexity to O(n + c * i). Previously, every time the
algorithm checked an item of a candidate, it scanned itemset once to test membership (O(n))
and again to count occurrences (O(n)), repeating these costly operations for every item of
every candidate.

To optimize this, I used an auxiliary dictionary (a Counter) in which each key is an item
and its value is the number of occurrences of that item in itemset. Building it takes a
single O(n) pass, after which both the membership check and the count lookup take
average-case constant time, O(1).
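A minimal sketch of the Counter-based check described above, assuming every item is hashable; prune_counter is a hypothetical name and this is an illustration of the idea rather than the merged code.

```python
from collections import Counter


def prune_counter(itemset: list, candidates: list, length: int) -> list:
    """Keep candidates whose items all occur at least length - 1 times in itemset.

    Assumes every item is hashable. Building the Counter is a single O(n) pass;
    each lookup afterwards is average-case O(1), giving O(n + c * i) overall.
    """
    counts = Counter(itemset)  # item -> number of occurrences in itemset
    pruned = []
    for candidate in candidates:
        # Explicit membership test mirrors "item not in itemset" in the naive check
        if all(item in counts and counts[item] >= length - 1 for item in candidate):
            pruned.append(candidate)
    return pruned


print(prune_counter(["X", "Y", "Z"], [["X", "Y"], ["X", "Z"], ["Y", "Z"]], 2))
```

The explicit "item in counts" test matters when length is 1: a Counter returns 0 for a missing key, and 0 >= 0 would otherwise let absent items through.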

As a result, the performance improvement is significant, at the cost of a small amount of
additional memory, which is a worthwhile trade-off. The improvement can be observed by
comparing the execution times of both algorithms (see the attached graph).

Here is the graph comparing both functions:
pruneOptimized_prune_algoritm_results.pdf

I also ran unit tests on my local machine to confirm that both methods produce the same results, but they are not included in this PR.
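The author's local tests are not part of the PR. Purely as an illustration, a randomized consistency check between the two approaches could look like this sketch; prune_naive, prune_counter, and check_consistency are hypothetical condensed versions written for this example, assuming flat lists of hashable items.

```python
import random
from collections import Counter


def prune_naive(itemset, candidates, length):
    # Linear scans for membership and counting: O(n) per item
    return [
        c for c in candidates
        if all(item in itemset and itemset.count(item) >= length - 1 for item in c)
    ]


def prune_counter(itemset, candidates, length):
    # One O(n) pass to build the Counter, then average-case O(1) lookups
    counts = Counter(itemset)
    return [
        c for c in candidates
        if all(item in counts and counts[item] >= length - 1 for item in c)
    ]


def check_consistency(trials: int = 200, seed: int = 0) -> bool:
    """Compare both versions on random inputs; return True if they always agree."""
    rng = random.Random(seed)  # seeded so any mismatch is reproducible
    alphabet = "ABCDEFG"
    for _ in range(trials):
        itemset = [rng.choice(alphabet) for _ in range(rng.randint(0, 30))]
        candidates = [
            [rng.choice(alphabet) for _ in range(rng.randint(1, 4))]
            for _ in range(rng.randint(0, 10))
        ]
        length = rng.randint(1, 5)
        if prune_naive(itemset, candidates, length) != prune_counter(
            itemset, candidates, length
        ):
            return False
    return True


print(check_consistency())
```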

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
Documentation change?

Checklist:


joaoneto9 · 2025-09-24T20:04:31Z

I hadn't realized that itemset could be a list of lists. Lists are mutable and therefore not hashable, so they cannot be used as Counter keys; I switched to tuples, which are immutable and hashable. After this change, I noticed a slight overhead, since each item now has to be converted into a tuple before it can be looked up in the Counter. Nonetheless, there is a significant efficiency gain in the worst-case scenario, and I expect performance to improve in average cases as well, although I have not yet tested those scenarios or generated graphs for them. Below is the graph reflecting the new modification.

pruneOptimized_prune_algoritm_results.pdf
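A small illustration of the hashing issue described in the comment above, using made-up grocery itemsets: a Counter cannot be built directly from a list of lists, but it can be built from tuples, and lookups then have to use the tuple form as well.

```python
from collections import Counter

itemset = [["bread", "milk"], ["bread", "eggs"], ["bread", "milk"]]

# Lists are mutable and therefore unhashable, so they cannot be Counter keys
try:
    Counter(itemset)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Converting each inner list to a tuple makes it hashable
itemset_counter = Counter(tuple(x) for x in itemset)
print(itemset_counter[("bread", "milk")])  # → 2

# Lookups must use the same tuple form; this conversion is the per-check overhead
candidate_item = ["bread", "eggs"]
print(tuple(candidate_item) in itemset_counter)  # → True
```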

Copilot

Pull Request Overview

This PR optimizes the prune function in the Apriori algorithm implementation to improve performance when checking candidate itemsets. The optimization uses Counter to precompute item frequencies instead of repeatedly counting occurrences during candidate validation.

Key changes:

Replaces linear search and counting with hash-based lookup using Counter
Reduces time complexity from O(n * c * i) to O(n + c * i)
Updates function documentation to reflect the optimization


Copilot · 2025-10-01T15:24:02Z

machine_learning/apriori_algorithm.py

 >>> prune(itemset, candidates, 3)
 []
 """
+itemset_counter = Counter(tuple(x) for x in itemset)


The tuple conversion is performed twice for the same data - once when creating the Counter and again when checking each item. Consider converting items to tuples consistently or using a different approach to avoid this duplication.
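One way to act on this suggestion, sketched under the assumption that the caller can build itemset as tuples from the start and that candidates are generated from itemset itself (for example with itertools.combinations). In that case the items inside each candidate are already tuples, so prune needs no tuple() call at all. prune_hashable is a hypothetical name, not the merged code.

```python
from collections import Counter
from itertools import combinations


def prune_hashable(itemset: list[tuple], candidates: list, length: int) -> list:
    """Variant that assumes every item is already a tuple (hashable).

    No per-check tuple() call is needed because the Counter keys and the
    items inside each candidate are the same tuple objects.
    """
    counts = Counter(itemset)
    return [
        candidate
        for candidate in candidates
        if all(item in counts and counts[item] >= length - 1 for item in candidate)
    ]


# Convert once, up front, when the itemsets are first built
transactions = [["milk", "bread"], ["milk", "eggs"], ["bread", "eggs"]]
itemset = [tuple(t) for t in transactions]
candidates = list(combinations(itemset, 2))
result = prune_hashable(itemset, candidates, 2)
print(result)
```

The design choice here is to move the conversion out of the hot loop: it happens once per itemset when the data is built, instead of once per item on every candidate check.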

Copilot · 2025-10-01T15:24:02Z

machine_learning/apriori_algorithm.py

+tupla = tuple(item)
+if tupla not in itemset_counter or itemset_counter[tupla] < length - 1:


The tuple conversion is performed twice for the same data - once when creating the Counter and again when checking each item. Consider converting items to tuples consistently or using a different approach to avoid this duplication.


joaoneto9 and others added 2 commits September 24, 2025 15:19

feat: optimizing the prune function at the apriori_algorithm.py archive
def174d

[pre-commit.ci] auto fixes from pre-commit.com hooks
c2d0613
for more information, see https://pre-commit.ci

algorithms-keeper bot added the "tests are failing" label (Do not merge until tests pass) on Sep 24, 2025

joaoneto9 added 2 commits September 24, 2025 15:51

fix: fixing the unsorted importing statment
839c43a

Merge branch 'master' of https://github.com/joaoneto9/Python
81a9d8d

algorithms-keeper bot added the "awaiting reviews" label (This PR is ready to be reviewed) on Sep 24, 2025

pre-commit-ci bot and others added 3 commits September 24, 2025 18:54

[pre-commit.ci] auto fixes from pre-commit.com hooks
38e849b
for more information, see https://pre-commit.ci

fix: fixing the key structure to a tuple that can be an hashable structure
789f76d

Merge branch 'master' of https://github.com/joaoneto9/Python
42fe4b6

algorithms-keeper bot removed the "tests are failing" label (Do not merge until tests pass) on Sep 24, 2025

Merge branch 'master' into master
c88b71f

AnupKumarPanwar requested a review from Copilot October 1, 2025 15:23

AnupKumarPanwar approved these changes Oct 1, 2025

algorithms-keeper bot removed the "awaiting reviews" label (This PR is ready to be reviewed) on Oct 1, 2025

Copilot AI reviewed Oct 1, 2025

Merge branch 'master' into master
30aa721

algorithms-keeper bot added the "awaiting reviews" label (This PR is ready to be reviewed) on Oct 2, 2025

MaximSmolskiy and others added 2 commits October 20, 2025 00:11

Update apriori_algorithm.py
f005cc0

[pre-commit.ci] auto fixes from pre-commit.com hooks
2726165
for more information, see https://pre-commit.ci

MaximSmolskiy approved these changes Oct 19, 2025

MaximSmolskiy added 2 commits October 20, 2025 00:12

Merge branch 'master' into master
1957f9b

Update apriori_algorithm.py
3a561b3

algorithms-keeper bot added and then removed the "tests are failing" label (Do not merge until tests pass) on Oct 19, 2025

MaximSmolskiy merged commit 154cd3e into TheAlgorithms:master on Oct 19, 2025
5 checks passed

3 participants