@d10c commented Sep 1, 2025

This PR adds overlay support to the Python extractor, including overlay compilation, basic tests, and a consistency check.

Supersedes earlier PR #20206.

According to the latest DCA results:

  • Analysis time +15%
  • Database build time -72%
  • TRAP import time -46%
  • End-to-end time -15%
  • Accuracy 99.6% (lowest on py/unused-global-variable 83%)
  • Database size +20%

Clarifications:

  • I squashed all new changes onto the earlier PR.
  • @py_cobject and @externalDataElement are not Discardable because they can't be linked to a source file. The consistency check ignores them.
  • @externalDefect/Metric, @duplication_or_similarity, and @svnentry are not Discardable because they are deprecated.
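
To illustrate the point about source-file links, here is a minimal Python model of the discard rule. It is a sketch of the general idea only, not the QL in Overlay.qll: the `Entity` class, its fields, the `"function"` kind label, and both function names are hypothetical, and the rule assumed here is that a base entity is discarded exactly when its source file appears in the overlay's set of changed files.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a database entity."""
    kind: str             # illustrative label, e.g. "function" or "@py_cobject"
    file: Optional[str]   # None when the entity cannot be linked to a source file
    from_overlay: bool    # True if the entity was extracted into the overlay


def is_discarded(entity: Entity, changed_files: Set[str]) -> bool:
    """Assumed rule: drop base entities whose source file changed in the overlay."""
    if entity.from_overlay:
        return False      # overlay entities are always kept
    if entity.file is None:
        return False      # no file link, so there is nothing to decide on
    return entity.file in changed_files


def merged_view(entities: Iterable[Entity], changed_files: Set[str]) -> List[Entity]:
    """Entities visible after combining base and overlay."""
    return [e for e in entities if not is_discarded(e, changed_files)]
```

Under this model an entity with `file=None` can never be discarded, which is why types such as @py_cobject fall outside Discardable and why a consistency check would have to skip them.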

@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch 2 times, most recently from 0b94992 to feb4c3a (September 12, 2025 21:20)
@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch 3 times, most recently from fbb16b4 to e2f6e4a (September 22, 2025 21:15)
github-advanced-security[bot]

This comment was marked as resolved.

@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch from 456c659 to c0707fd (October 2, 2025 15:50)
@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch 2 times, most recently from 3901c56 to 8844c2d (October 2, 2025 16:16)
@d10c mentioned this pull request Oct 2, 2025
4 tasks
@d10c requested a review from tausbn (October 2, 2025 16:18)
@d10c marked this pull request as ready for review (October 2, 2025 16:18)
@d10c requested a review from a team as a code owner (October 2, 2025 16:18)
Copilot AI review requested due to automatic review settings (October 2, 2025 16:18)

Copilot AI left a comment

Pull Request Overview

This PR adds overlay support to the Python extractor, enabling incremental compilation and overlay-based extraction for improved performance. The changes introduce overlay metadata handling, entity discard predicates, and consistency checks for overlay databases.

  • Adds overlay compilation and extraction support to the Python ecosystem
  • Implements entity discard predicates for incremental analysis
  • Introduces consistency checks to ensure proper overlay database construction

Reviewed Changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| python/ql/lib/semmlecode.python.dbscheme | Adds @top type and overlay metadata support to database schema |
| python/ql/lib/semmle/python/Overlay.qll | Implements comprehensive entity discard predicates for overlay functionality |
| python/ql/lib/semmle/python/internal/OverlayDiscardConsistencyQuery.qll | Provides consistency query logic for overlay database validation |
| python/extractor/semmle/worker.py | Adds overlay extraction mode with change-based file filtering |
| python/extractor/semmle/projectlayout.py | Improves Windows path handling in project layout configuration |
| python/extractor/semmle/path_rename.py | Updates environment variable to use CODEQL_PATH_TRANSFORMER |
| python/ql/test/extractor-tests/overlay/ | Adds comprehensive overlay extraction test cases |
| Various .expected files | Test output files for overlay functionality validation |

d10c added 12 commits October 6, 2025 11:36
The new name is required by overlay support.
And don't add slash to start of path patterns on Windows.
- Fall back to full extraction on an overlay-changes JSON read error.
- We filter both root modules and (transitive) imports against the overlay-changes JSON.
for dbscheme elements with direct or indirect location links in dbscheme.
- Unify discardable entities under one Discardable superclass.
- Two discard predicates depending on TRAP ID type.
- Future-proof the XML and Yaml discard predicates for when their extractors become incremental.
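
The commit note about falling back to full extraction and filtering against the overlay-changes JSON can be sketched roughly as follows. This is not the code in worker.py: the function names are hypothetical, and the file layout (a JSON object with a `"changes"` list of paths) is an assumption made for illustration.

```python
import json
from typing import Optional, Set


def load_overlay_changes(path: str) -> Optional[Set[str]]:
    """Return the set of changed paths, or None to signal full extraction.

    Assumes (hypothetically) a file of the form {"changes": ["a.py", ...]}.
    """
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        return set(data["changes"])
    except (OSError, ValueError, KeyError, TypeError):
        # Unreadable, malformed, or unexpectedly shaped file:
        # fall back to full extraction rather than fail.
        return None


def should_extract(file_path: str, changes: Optional[Set[str]]) -> bool:
    """Apply the same filter to root modules and to transitive imports."""
    return changes is None or file_path in changes
```

Returning `None` rather than an empty set keeps "could not read the change list" distinct from "nothing changed", so a read error widens extraction instead of silently skipping every file.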
@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch from 8844c2d to e74f9a4 (October 6, 2025 09:51)
d10c added 2 commits October 6, 2025 12:30
The base source is in basic-overlay-eval/orig_src, the overlay source is in basic-full-eval. We run two tests: a full evaluation test in basic-full-eval, and an overlay evaluation test in basic-overlay-eval. The test source and expected results are the SAME, due to the .qlref, meaning we expect the same results for full and overlay evaluation.
@d10c force-pushed the d10c/python-overlay-compilation-plus-extractor branch from e74f9a4 to ece1210 (October 6, 2025 10:31)

@tausbn left a comment

One question (non-blocking), otherwise this looks good to me. 👍

@d10c merged commit e120e5c into github:main on Oct 16, 2025
20 checks passed