Implement naming package, new IdentifierIntroduction.qll, unicode funcs. #950
base:main
MichaelRFairhurst commented Aug 23, 2025 • edited
MichaelRFairhurst commented Aug 23, 2025
Note that the unicode data came from advanced-security/codeql-qtil#13. I should definitely finish unicode support in qtil, publish, and then use that here. Likely, that should be done before merge, but it is not strictly necessary.
MichaelRFairhurst commented Aug 24, 2025
Relevant qtil pull request: advanced-security/codeql-qtil#13
Pull request overview
This PR implements a comprehensive naming validation package for MISRA C++ RULE-5-10-1, which enforces proper identifier formation in C++ code. The implementation introduces a sophisticated identifier tracking system that validates identifiers against multiple constraints including Unicode normalization, reserved names, namespace restrictions, and macro naming conventions.
Key changes:
- Introduces the `IdentifierIntroduction` abstraction that systematically captures all identifier declarations across various C++ constructs (variables, functions, types, macros, namespaces, templates, etc.)
- Implements Unicode support with UAX#44 compliance checking and NFC normalization validation using extensible predicates with external YAML data
- Adds MISRA C++ RULE-5-10-1 query to detect poorly formed identifiers including underscore violations, lowercase in macros, reserved names, and reserved namespace usage
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| cpp/common/src/codingstandards/cpp/Identifiers.qll | Introduces comprehensive IdentifierIntroduction class hierarchy that systematically tracks all identifier declarations across various C++ constructs |
| cpp/common/src/codingstandards/cpp/Unicode.qll | Implements Unicode property checking (NFC_QC, XID_Start, XID_Continue) and unicode escape sequence handling for identifier validation |
| cpp/common/src/codingstandards/cpp/Macro.qll | Fixes variadic macro parameter extraction to properly exclude ellipsis and empty parameter names |
| cpp/misra/src/rules/RULE-5-10-1/PoorlyFormedIdentifier.ql | Implements the main query that validates identifiers against MISRA C++ RULE-5-10-1 constraints |
| cpp/common/src/codingstandards/cpp/exclusions/cpp/Naming2.qll | Autogenerated metadata for Naming2 package query registration |
| cpp/common/src/codingstandards/cpp/exclusions/cpp/RuleMetadata.qll | Registers Naming2 package in the rule metadata system |
| rule_packages/cpp/Naming2.json | Defines query metadata for RULE-5-10-1 including severity, precision, and tags |
| cpp/misra/test/rules/RULE-5-10-1/test.cpp | Comprehensive test file with 189 lines covering Unicode, normalization, underscores, macros, namespaces, and reserved names |
| cpp/misra/test/rules/RULE-5-10-1/PoorlyFormedIdentifier.expected | Expected query results showing 48 violations across various identifier validation rules |
| cpp/misra/test/rules/RULE-5-10-1/PoorlyFormedIdentifier.qlref | Query reference file for test execution |
| cpp/common/test/library/codingstandards/cpp/identifiers/* | Library test suite with 666 lines testing identifier extraction across all C++ constructs |
| cpp/common/test/includes/standard-library/utility.h | Adds pair and tuple support for structured binding tests |
| cpp/common/src/qlpack.yml | Registers unicode.yml data extension |
| change_notes/2025-08-22-function-like-macro-param-name-bug-fixes.md | Documents bug fixes in function-like macro parameter handling |
| } | ||
| /** | ||
| * An identifier introduced as a template function name or as a parameter of a function-like macro. |
Copilot AI commented Dec 11, 2025
The class documentation incorrectly describes this as "An identifier introduced as a template function name or as a parameter of a function-like macro." However, the implementation shows this class handles Macro identifiers (macro names and their parameters), not template functions. The documentation should be corrected to accurately describe that this class handles identifiers introduced by macros (both the macro name itself and any parameters of function-like macros).
| * An identifier introduced as a template function name or as a parameter of a function-like macro. | |
| * An identifier introduced by a macro, including both the macro name itself and any parameters of function-like macros. |
| exists(Function func | func = intro.getElement().(FunctionDeclarationEntry).getFunction() | | ||
| isUserDefinedLiteralSuffixNonCompliant(func) and | ||
| message = "User-defined literal suffix '" + ident + "' is malformed." | ||
| ) |
Copilot AI commented Dec 11, 2025
This condition appears unreachable. The query checks if the element is a FunctionDeclarationEntry with a Function that has a malformed user-defined literal suffix, and then tries to use 'ident' in the message. However, for user-defined literal suffixes, the identifier extracted on line 53 via 'intro.unescapeUnicode()' will be the suffix without the 'operator ""' prefix (e.g., '_foo'), not the full function name. This means this branch would never match the conditions in 'isUserDefinedLiteralSuffixNonCompliant' which checks for patterns in the full function name like 'operator""%'. This clause should either be removed as unreachable or the logic should be corrected to properly handle this case.
| exists(Function func | func = intro.getElement().(FunctionDeclarationEntry).getFunction() | | |
| isUserDefinedLiteralSuffixNonCompliant(func) and | |
| message = "User-defined literal suffix '" + ident + "' is malformed." | |
| ) |
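To make the distinction in the comment above concrete, here is a minimal C++ sketch of the difference between a literal operator's full function name and the suffix a user actually introduces. The operator `_km` and the helper `literalSuffix` are hypothetical names chosen for illustration; neither appears in this PR, and the helper only mimics the "strip the `operator""` prefix" behaviour the comment describes.

```cpp
#include <string_view>

// Hypothetical user-defined literal: the function is named `operator""_km`,
// but the identifier introduced by the user is only the suffix `_km`.
constexpr unsigned long long operator""_km(unsigned long long value) {
    return value * 1000ULL;
}

// Hypothetical helper: drops a leading `operator""` from a function name and
// returns what remains; any other name is returned unchanged.
constexpr std::string_view literalSuffix(std::string_view functionName) {
    constexpr std::string_view prefix = "operator\"\"";
    if (functionName.substr(0, prefix.size()) == prefix) {
        functionName.remove_prefix(prefix.size());
    }
    return functionName;
}
```

If the identifier under test is already the bare suffix, a check written against names beginning with `operator""` would see `_km` rather than the full name, which is the mismatch the comment is pointing at.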
| /** | ||
| * Provides properties of a Unicode code point, where the property is of 'enumeration', 'catalog', | ||
| * or 'string-valued' type, however, the only supported property is `NFC_QC`. | ||
| * | ||
| * For example, `Block` is an enumeration property, `Line_Break` is a catalog property, and | ||
| * `Uppercase_Mapping` is a string-valued property. | ||
| * | ||
| * For boolean properties, see `unicodeHasBooleanProperty`, and for numeric properties, see | ||
| * `unicodeHasNumericProperty`. | ||
| */ | ||
| extensible predicate unicodeHasProperty(int codePoint, string propertyName, string propertyValue); |
Copilot AI commented Dec 11, 2025
The documentation states that this predicate provides properties of type 'enumeration', 'catalog', or 'string-valued', but then says "however, the only supported property is NFC_QC". This is confusing because it first suggests broad support and then limits it. Consider rephrasing to be more direct, such as: "Provides the NFC_QC property value for a Unicode code point. This is the only Unicode property currently supported."
| * This has to be treated specially. The member predicate `getName()` on a `FriendDecl` returns the | ||
| * string "foo's friend", which is not an identifier in the program. | ||
| * | ||
| * The elements returned by the `getFriend()` member predicate often do not have a correspending |
Copilot AI commented Dec 11, 2025
Typo: 'correspending' should be 'corresponding'.
| /** | ||
| * @id cpp/misra/poorly-formed-identifier | ||
| * @name RULE-5-10-1: User-defined identifiers shall have an appropriate form | ||
| * @description Identifiers shall not conflict with keywords, reserved name, or otherwise be poorly |
Copilot AI commented Dec 11, 2025
Grammar issue: 'reserved name' should be plural 'reserved names' to match the pattern of listing multiple items.
| * @description Identifiers shall not conflict with keywords, reserved name, or otherwise be poorly | |
| * @description Identifiers shall not conflict with keywords, reserved names, or otherwise be poorly |
| } | ||
| bindingset[s] | ||
| predicate hasDoubleUnderscore(string s) { s.matches("%\\_\\_%") } |
Copilot AI commented Dec 11, 2025
The pattern used in this predicate uses backslash escaping that may be unclear. The pattern "%\\_\\_%" matches two consecutive underscores anywhere in the string. Consider using a more readable pattern like ".*__.*" with `regexpMatch`, or adding a comment explaining that this matches identifiers containing double underscores anywhere in the string.
| predicate hasDoubleUnderscore(string s) { s.matches("%\\_\\_%") } | |
| /** Matches identifiers containing two consecutive underscores anywhere in the string. */ | |
| predicate hasDoubleUnderscore(string s) { s.regexpMatch(".*__.*") } |
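For context, QL's `matches` takes a LIKE-style pattern in which `%` matches any sequence of characters and `_` matches any single character, which is why the original form escapes both underscores. Either spelling expresses a plain substring search. A minimal C++ sketch of that intent follows; the function borrows the predicate's name for readability but is illustrative only and not part of this PR.

```cpp
#include <string_view>

// Illustrative only: true when the identifier contains two consecutive
// underscores anywhere, which is what both QL patterns are meant to express.
constexpr bool hasDoubleUnderscore(std::string_view identifier) {
    return identifier.find("__") != std::string_view::npos;
}
```

Single underscores separated by other characters, as in `a_b_c`, do not count; only an adjacent pair does.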
| or | ||
| intro.isFromMacro() and | ||
| not ident.regexpMatch("^[a-zA-Z0-9_]+$") and | ||
| message = "Identifier '" + ident + "' contains invalid characters. " |
Copilot AI commented Dec 11, 2025
The error message ends with an unnecessary trailing space: "contains invalid characters. " (note the space before the closing quote). This should be removed for consistency with other error messages in this query.
| message = "Identifier '" + ident + "' contains invalid characters. " | |
| message = "Identifier '" + ident + "' contains invalid characters." |
| #define macro_ALL_CAPS 49 // NON_COMPLIANT - starts with lowercase | ||
| #define MACRO$DOLLAR 54 // NON_COMPLIANT - contains dollar sign | ||
| #define FUNCTION_LIKE_MACRO(x) \ | ||
| ((x) + 1) // NON_COMPLIANT - lower case argument name |
Copilot AI commented Dec 11, 2025
Inconsistent comment: the comment says "NON_COMPLIANT - lower case argument name" but the actual issue is that the macro parameter 'x' violates the rule requiring macros to use only uppercase characters. The comment should more accurately describe the violation.
| ((x) + 1) // NON_COMPLIANT - lower case argument name | |
| ((x) + 1) // NON_COMPLIANT - macro parameter 'x' is not uppercase |
| #define FUNCTION_LIKE_MACRO(x) \ | ||
| ((x) + 1) // NON_COMPLIANT - lower case argument name | ||
| #define FUNCTION_LIKE_MACRO2(X) \ | ||
| ((X) + 1) // NON_COMPLIANT - lower case argument name |
Copilot AI commented Dec 11, 2025
Inconsistent comment: the comment says "NON_COMPLIANT - lower case argument name" but line 102 shows the parameter is 'X' which is uppercase. Based on the expected results, this line is actually compliant (no error is expected for it). The comment should be corrected or removed.
| ((X) + 1) // NON_COMPLIANT - lower case argument name | |
| ((X) + 1) // COMPLIANT |
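The two macro-parameter comments above reduce to one check on how the identifier is spelled. Here is a minimal C++ sketch, assuming that macro names and macro parameters may use only uppercase ASCII letters, digits, and underscores. That assumption is inferred from the test comments rather than quoted from the rule text, and `isUppercaseMacroIdentifier` is a hypothetical name, not a predicate from this PR.

```cpp
#include <string_view>

// Assumption (not taken from the rule text): a macro name or macro parameter
// is acceptable only if every character is 'A'-'Z', '0'-'9', or '_'.
constexpr bool isUppercaseMacroIdentifier(std::string_view name) {
    if (name.empty()) {
        return false;
    }
    for (char c : name) {
        const bool allowed =
            (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_';
        if (!allowed) {
            return false;
        }
    }
    return true;
}
```

Under this assumption `x`, `macro_ALL_CAPS`, and `MACRO$DOLLAR` are rejected while `X` and `FUNCTION_LIKE_MACRO2` pass, which agrees with the expected results described in the review comments.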
| d instanceof ClassTemplateSpecialization | ||
| } | ||
| private newtype TIndentifierIntroduction = |
Copilot AI commented Dec 11, 2025
Typo in the type name: 'TIndentifierIntroduction' should be 'TIdentifierIntroduction' (extra 'n' after 'I').
Description
Implement naming package.
Change request type
.ql, .qll, .qls or unit tests)
Rules with added or modified queries
RULE 5-10-1
Release change checklist
A change note (development_handbook.md#change-notes) is required for any pull request which modifies:
If you are only adding new rule queries, a change note is not required.
Author: Is a change note required?
🚨🚨🚨
Reviewer: Confirm that format of shared queries (not the .qll file, the .ql file that imports it) is valid by running them within VS Code.
Reviewer: Confirm that either a change note is not required or the change note is required and has been added.
Query development review checklist
For PRs that add new queries or modify existing queries, the following checklist should be completed by both the author and reviewer:
Author
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
Reviewer
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.