Implement Memory1 (RULE-8-7-1)#967

Open
jeongsoolee09 wants to merge 77 commits into main from jeongsoolee09/MISRA-C++-2023-Memory

Conversation

Collaborator

@jeongsoolee09 jeongsoolee09 commented Nov 18, 2025

Description

Implement Memory1 (RULE-8-7-1).

Change request type

  • Release or process automation (GitHub workflows, internal scripts)
  • Internal documentation
  • External documentation
  • Query files (.ql, .qll, .qls or unit tests)
  • External scripts (analysis report or other code shipped as part of a release)

Rules with added or modified queries

  • No rules added
  • Queries have been added for the following rules:
    • RULE-8-7-1
  • Queries have been modified for the following rules:
    • rule number here

Release change checklist

A change note (development_handbook.md#change-notes) is required for any pull request which modifies:

  • The structure or layout of the release artifacts.
  • The evaluation performance (memory, execution time) of an existing query.
  • The results of an existing query in any circumstance.

If you are only adding new rule queries, a change note is not required.

Author: Is a change note required?

  • Yes
  • No

🚨🚨🚨
Reviewer: Confirm that format of shared queries (not the .qll file, the
.ql file that imports it) is valid by running them within VS Code.

  • Confirmed

Reviewer: Confirm that either a change note is not required or the change note is required and has been added.

  • Confirmed

Query development review checklist

For PRs that add new queries or modify existing queries, the following checklist should be completed by both the author and reviewer:

Author

  • Have all the relevant rule package description files been checked in?
  • Have you verified that the metadata properties of each new query is set appropriately?
  • Do all the unit tests contain both "COMPLIANT" and "NON_COMPLIANT" cases?
  • Are the alert messages properly formatted and consistent with the style guide?
  • Have you run the queries on OpenPilot and verified that the performance and results are acceptable?
    As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
  • Does the query have an appropriate level of in-query comments/documentation?
  • Have you considered/identified possible edge cases?
  • Does the query not reinvent features in the standard library?
  • Can the query be simplified further (not golfed!)

Reviewer

  • Have all the relevant rule package description files been checked in?
  • Have you verified that the metadata properties of each new query is set appropriately?
  • Do all the unit tests contain both "COMPLIANT" and "NON_COMPLIANT" cases?
  • Are the alert messages properly formatted and consistent with the style guide?
  • Have you run the queries on OpenPilot and verified that the performance and results are acceptable?
    As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
  • Does the query have an appropriate level of in-query comments/documentation?
  • Have you considered/identified possible edge cases?
  • Does the query not reinvent features in the standard library?
  • Can the query be simplified further (not golfed!)

@jeongsoolee09 jeongsoolee09 self-assigned this Nov 18, 2025
…terminator, remove file pointer cases

1. Add missing headers: For obvious reasons.
2. Remove cases without a null terminator: Neither clang nor g++ permits a
   string to be declared with a size shorter than its initializing
   expression. Since this is a C++ rule, we rule those cases out.
3. Remove file pointer manipulation functions (e.g. fgets): Not required by the rule.
Collaborator

@MichaelRFairhurst MichaelRFairhurst left a comment

So close to ready!

CallocFunctionCall() { this.isCallocCall() }

override int getMinNumBytes() {
result = lowerBound(this.getArgument(0)) * lowerBound(this.getArgument(1))
Collaborator

Perhaps we should file a bug to come back to this.

In theory, it would be great to have two versions of the query: one where we know with certainty that the resulting pointer is out of bounds if flow analysis is correct -- we assume the maximum allocation size and the smallest pointer offsets. Then another where we suspect a possible invalid pointer, where we assume the minimum allocation size and the largest pointer offsets. These could share most behavior and would have different precisions.

In the meantime, let's ship!
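
The two-precision idea above can be sketched outside of QL. This is only an illustration under stated assumptions: `Range`, `Verdict`, and `classify` are hypothetical names that do not appear in this PR, lengths and offsets are counted in elements rather than bytes, and the negative-offset checks are an added assumption rather than something the comment spells out.

```cpp
#include <cassert>

// Hypothetical closed interval [lo, hi] of possible values, in elements.
struct Range {
  long lo;
  long hi;
};

enum class Verdict { InBounds, PossiblyInvalid, DefinitelyInvalid };

// Offsets 0..length are treated as valid (one past the end is allowed).
Verdict classify(Range length, Range offset) {
  // High-precision variant: still out of bounds with the largest possible
  // allocation and the smallest possible offset (or the offset is always negative).
  if (offset.lo > length.hi || offset.hi < 0) return Verdict::DefinitelyInvalid;
  // Lower-precision variant: out of bounds with the smallest possible
  // allocation and the largest possible offset (or the offset may be negative).
  if (offset.hi > length.lo || offset.lo < 0) return Verdict::PossiblyInvalid;
  return Verdict::InBounds;
}

// Small usage example for an allocation of between 4 and 8 elements.
void checkExamples() {
  assert(classify({4, 8}, {9, 10}) == Verdict::DefinitelyInvalid);
  assert(classify({4, 8}, {5, 6}) == Verdict::PossiblyInvalid);
  assert(classify({4, 8}, {0, 4}) == Verdict::InBounds);
}
```

Both variants share the same comparison and differ only in which end of each range they read, which is why the two queries could plausibly share most of their behavior.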


newtype TArrayAllocation =
TStackAllocation(ArrayDeclaration arrayDecl) or
TDynamicAllocation(NarrowedHeapAllocationFunctionCall narrowedAlloc)
Collaborator

Let's file a bug to come back to the third kind of "allocation," which is just taking the address of a non-array variable or lvalue.

int x = 0;
int *p = &x; // p is essentially a buffer of size 1

Partly I say let's come back because we would need to be careful to distinguish:

int x = 0;
int arr[5] = {0};

int *p1 = &x; // generally, taking an address to anything should be a buffer of size 1
int *p2 = &arr[0]; // except this

// Note that any lvalue expression can create a "buffer" of size 1, not just variables:
int &f() { return x; }
int *p3 = &f(); // also a "buffer" of size 1
int *p4 = &*p3; // also a "buffer" of size 1
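
One way to picture the "buffer of size 1" distinction is a pair of overloads that report how many elements the addressed object contains. This is a sketch only: `elementCount` is a hypothetical helper, not part of the query, and it looks at the static type of the operand rather than at data flow.

```cpp
#include <cassert>
#include <cstddef>

// Any non-array object is treated as a "buffer" holding exactly one element.
template <typename T>
constexpr std::size_t elementCount(const T &) {
  return 1;
}

// Arrays are the exception: the buffer spans all N elements.
// This overload is more specialized, so it is chosen for array arguments.
template <typename T, std::size_t N>
constexpr std::size_t elementCount(const T (&)[N]) {
  return N;
}

// Usage example mirroring the variables in the comment above.
void checkElementCounts() {
  int x = 0;
  int arr[5] = {0};
  assert(elementCount(x) == 1);       // &x behaves like a buffer of size 1
  assert(elementCount(arr) == 5);     // the array itself has 5 elements
  assert(elementCount(arr[0]) == 1);  // the type alone cannot see that arr[0] lives in an array
}
```

The last assertion shows why `&arr[0]` needs special care: by type it looks like a single `int`, so the analysis has to recognize the enclosing array some other way.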

Collaborator Author

Implemented to some degree in 44ef266.

*/
int getOffset() {
if this.asPointerArithmetic() instanceof PointerSubExpr
then result = -this.getOffsetExpr().getValue().toInt()
Collaborator

Another thing to file is that this currently only works on constant values, but in the future we could extend this to use range analysis.

Collaborator Author

Good point. Range analysis should be introduced carefully; otherwise it might generate a lot of noise. This is out of scope for this PR and should be reserved for later.

sink.getNode() = end.getBasePointerNode()
|
srcOffset = start.getOffset() and
sinkOffset = end.getOffset() and
Collaborator

This overwrites the previous offset, but they should add up.

For example:

int arr[5];
int *p = arr;
int *p1 = p + 3; // offset: 3, length: 5
int *p2 = p1 + 2; // offset: 5, length: 5

Currently, this will produce sinkOffset = 2 for the last line.
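
The intended behavior can be sketched as follows: each arithmetic step contributes its own constant offset, and the sum is what gets compared against the array length. The names `totalOffset` and `formsValidPointer` are hypothetical and only loosely mirror the query; this is not the QL implementation.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sums the constant offsets applied along a chain of pointer arithmetic steps.
long totalOffset(const std::vector<long> &steps) {
  return std::accumulate(steps.begin(), steps.end(), 0L);
}

// A formed pointer is acceptable if it lands on an element or one past the end.
bool formsValidPointer(long total, long length) {
  return total >= 0 && total <= length;
}

// Usage example based on the arr[5] snippet above.
void checkOffsetChains() {
  assert(totalOffset({3, 2}) == 5);                    // p + 3, then + 2
  assert(formsValidPointer(totalOffset({3, 2}), 5));   // one past the end: fine
  assert(!formsValidPointer(totalOffset({3, 3}), 5));  // index 6 of 5: invalid
}
```

Checking only the last step (offset 2, or 3 in the second chain) would accept both chains, which is the problem the comment describes.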

Collaborator Author

Implemented in 394b7ad and documented in 2e4ace6.

Collaborator

Nice!

Only thought now: do we still need srcOffset and sinkOffset to be in the table / to be predicate parameters?

In the base case, the srcOffset and sinkOffset come from start and end, not from srcSinkLengthMap.

In the recursive case, the srcOffset from the previous iteration is unused (srcSinkLengthMap(_, start, /*here -> */ _, ...)). The sinkOffset from the previous iteration is only bound to be the new srcOffset, which we just determined isn't used in the next iteration.

srcOffset and sinkOffset are then also not used by the select.

@jeongsoolee09
Collaborator Author

Two things to note about the multidimensional arrays:

  1. If a row access and the corresponding row element access are separated by a function boundary, the query loses indirection information. We probably need to add a level column to simulate push-pop behavior: taking an indirection edge during initialization pushes a level, and each access pops one.
  2. Currently, the relationship between FatPointer and DataFlow::Node is one-to-many if we ignore the level column. The multidimensional array accesses produce duplicate alerts, and this is probably the cause.
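
For reference, the first situation looks roughly like the following in source code. This is a hypothetical test-style example, not one taken from the PR's test suite: the row is selected in one function and indexed in another.

```cpp
#include <cassert>

// The element access happens here, on the far side of a function boundary.
int readElement(const int *row, int column) {
  return row[column];
}

int readAcrossBoundary() {
  int grid[2][3] = {{1, 2, 3}, {4, 5, 6}};
  const int *row = grid[1];    // row access: one level of indirection consumed
  return readElement(row, 2);  // in bounds for a row of 3 elements
}
```

By the time `row[column]` is evaluated, the callee only sees an `int` pointer, so the analysis has to carry along the fact that it refers to a 3-element row of a 2x3 array.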

Collaborator

@MichaelRFairhurst MichaelRFairhurst left a comment

literally just minor tweaks!! This looks 🔥 great!

/**
* This module provides classes and predicates for analyzing the size of buffers
* or objects from their base or a byte-offset, and identifying the potential for
* expressions accessing those buffers to overflow.
Collaborator

In this case, can we have c/common/src/codingstandards/c/OutOfBounds.qll import cpp.OutOfBounds? A simple wrapper import file would be a reasonably easy refactor.

(Or find each C query that imports OutOfBounds and update those?)

I think it's probably reasonable.

* @precision medium
* @problem.severity error
* @tags external/misra/id/rule-8-7-1
* scope/system
Collaborator

Looks like we're missing the correctness tag.

/**
* Gets the declared length of this array.
*/
int getLength() { result = length }
Collaborator

This and the int length; on line 26 can be deleted now, right?

array +
4; // NON_COMPLIANT: pointer points more than one beyond the last element
int *invalid2 =
array - 1; // NON_COMPLIANT: pointer is outside boundary [FALSE_NEGATIVE]
Collaborator

No longer a false negative! 🎉

strcat(buf1, " "); // NON_COMPLIANT - not null terminated
strcat(buf2, " "); // COMPLIANT
strcat(buf3, " "); // COMPLIANT
strcat(buf4, "12345"); // NON_COMPLIANT
Collaborator

Hmm, this is actually a FN

"description": "Pointers obtained as result of performing arithmetic should point to an initialized object, or an element right next to the last element of an array.",
"kind": "path-problem",
"name": "Pointer arithmetic shall not form an invalid pointer",
"precision": "medium",
Collaborator

I'd be very tempted to say "high" precision.

Nicely done :)

"severity": "error",
"short_name": "PointerArithmeticFormsAnInvalidPointer",
"tags": [
"scope/system"
Collaborator

Tags should probably include correctness and security, for this query and the ones below.

"tags": [
"scope/system"
]
},
Collaborator

Consider adding an implementation scope noting that we only handle constant offsets, for increased precision.

)
select end, src, sink,
"This pointer accesses element at index " + totalOffset +
" while the underlying object has length " + length + "."
Collaborator

One small thought on verbiage.

While "Object" is the right term here, generally speaking, I'm not sure devs think of objects as something that has a "length"... I'd say objects have a "size," and that size is in bytes, making "length" perhaps doubly confusing.

If you want to go with "length," I'd probably suggest "array" or "buffer." Though it could be confusing to say that &x is either.

Maybe something like, "Pointer formed that points to element X of an object that contains Y elements"?

3 participants