Description
I have identified a false negative in Python DataFlow analysis where taint tracking is lost when a class is defined inside a function.
If a tainted variable is passed as an argument to a function, and that argument is subsequently used inside a class defined within that function (a function-local class), CodeQL fails to track the data flow to the class instance's attributes.
However, if similar logic is applied using a top-level (module-level) class, the data flow is detected correctly. This suggests an issue with how data flow is handled across the scope boundary of locally defined classes.
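To make the scope difference concrete, here is a minimal sketch of how Python itself resolves `taint_src` in the two situations. The names `make_local` and `B` are hypothetical and used only for illustration. The sketch shows only the runtime name resolution (closure capture versus global lookup); that this is where the analysis loses the flow is an assumption, not something confirmed by this report.

```python
def make_local(taint_src):
    # Function-local class: __init__ reads taint_src from the enclosing function.
    class A:
        def __init__(self):
            self.data = taint_src
    return A


class B:
    # Module-level class: __init__ reads taint_src as a global name.
    def __init__(self):
        self.data = taint_src


taint_src = "taint_src_value"
LocalA = make_local(taint_src)

# Closure capture: taint_src is a free variable of the local __init__.
print(LocalA.__init__.__code__.co_freevars)
# Global lookup: no free variables; the name is resolved in module globals.
print(B.__init__.__code__.co_freevars)
print("taint_src" in B.__init__.__code__.co_names)
# At runtime the value reaches the attribute in both cases.
print(LocalA().data == B().data == "taint_src_value")
```

In the local case the value travels through a closure cell, while in the module-level case it is an ordinary global read, so the two cases plausibly exercise different parts of the data flow library.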
Reproduction Case (False Negative)
In this example, taint_src is passed to constructor_field_001_T. The class A is defined inside the function and captures taint_src. The flow to os.system is NOT detected.
```python
import os


def constructor_field_001_T(taint_src):
    # Class defined inside the function scope
    class A:
        def __init__(self):
            # ISSUE: The analyzer fails to track 'taint_src' from the
            # outer function argument into this local class scope.
            self.data = taint_src
            self.sani = '_'

    obj = A()
    taint_sink(obj.data)


def taint_sink(o):
    os.system(o)


if __name__ == "__main__":
    taint_src = "taint_src_value"
    constructor_field_001_T(taint_src)
```

Control Case (Working)
In this example, the class A is defined at the module level. The flow to os.system IS detected correctly.
```python
import os


# Class defined at module level
class A:
    def __init__(self):
        # Accessing taint_src (as a global/captured in this context) works fine
        self.data = taint_src
        self.sani = '_'


def constructor_field_001_T(taint_src):
    obj = A()
    taint_sink(obj.data)


def taint_sink(o):
    os.system(o)


if __name__ == "__main__":
    taint_src = "taint_src_value"
    constructor_field_001_T(taint_src)
```

Additional Control Case (Working: Explicit Argument)
Significantly, if I keep the class inside the function but pass taint_src as an explicit argument to __init__, the flow IS detected.
```python
import os


def constructor_field_explicit_arg(taint_src):
    # Class defined inside function
    class A:
        # Explicit argument instead of capture
        def __init__(self, val):
            self.data = val

    # Passing taint explicitly
    obj = A(taint_src)
    taint_sink(obj.data)


def taint_sink(o):
    os.system(o)


if __name__ == "__main__":
    taint_src = "taint_src_value"
    constructor_field_explicit_arg(taint_src)
```

CodeQL Query Used
I am using a standard DataFlow::ConfigSig configuration with TaintTracking::Global, looking for the specific string literal "taint_src_value" flowing to an argument of os.system.
```ql
/**
 * @name Python Taint Reproduction
 * @kind path-problem
 * @problem.severity error
 * @id py/taint-reproduction
 */

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking

class TaintSource extends DataFlow::Node {
  TaintSource() {
    exists(StrConst str |
      str.getText() = "taint_src_value" and
      this.asExpr() = str
    )
  }
}

class DangerousSink extends DataFlow::Node {
  DangerousSink() {
    exists(Call call |
      (
        call.getFunc().(Attribute).getName() = "system" and
        call.getFunc().(Attribute).getObject().(Name).getId() = "os"
      ) and
      this.asExpr() = call.getAnArg()
    )
  }
}

module TaintConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof TaintSource }

  predicate isSink(DataFlow::Node sink) { sink instanceof DangerousSink }
}

module TaintFlow = TaintTracking::Global<TaintConfig>;

import TaintFlow::PathGraph

from TaintFlow::PathNode source, TaintFlow::PathNode sink
where TaintFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Taint flow detected"
```

Expected Behavior
CodeQL should be able to track the taint_src argument into the __init__ method of the locally defined class A, eventually leading to the os.system sink, just as it does for top-level classes.