官方文档的简单学习
CodeQL queries
CodeQL queries 用于分析代码的安全性、正确性、可维护性以及可读性的相关问题。
The important types of query are:
- Alert queries: queries that highlight issues in specific locations in your code.
- Path queries: queries that describe the flow of information between a source and a sink in your code.
两种重要的查询类型:
- Alert queries: 用于突出代码中指定信息的查询。
- Path queries: 用于描述代码中从 『source』到『sink』流向的查询。
Basic query structure
/**
*
* Query metadata
*
*/
import /* ... CodeQL libraries or modules ... */
/* ... Optional, define CodeQL classes and predicates ... */
from /* ... variable declarations ... */
where /* ... logical formula ... */
select /* ... expressions ... */
Import statements
When writing your own alert queries, you would typically import the standard library for the language of the project that you are querying, using import
followed by a language:
当你编写alert queries时,通常需要导入项目语言所支持的标准库
- C/C++:
cpp
- C#:
csharp
- Go:
go
- Java:
java
- JavaScript/TypeScript:
javascript
- Python:
python
There are also libraries containing commonly used predicates, types, and other modules associated with different analyses, including data flow, control flow, and taint-tracking. In order to calculate path graphs, path queries require you to import a data flow library into the query file.
为了计算路径图,path queries 需要导入数据流库,以用于分析数据流、控制流以及污点跟踪。
From clause
from <type> <variable name>,...,...
声明查询中使用的变量。
Where clause
对from 中声明的变量加以逻辑条件。
Select clause
select
子句列举from
声明的满足where
条件的结果。
一个例子:查询1到10内的勾股数:
from int x, int y, int z
where x in [1..10] and y in [1..10] and z in [1..10] and
x*x + y*y = z*z
select x, y, z
Data flow graph
The CodeQL data flow libraries implement data flow analysis on a program or function by modeling its data flow graph. Unlike the abstract syntax tree, the data flow graph does not reflect the syntactic structure of the program, but models the way data flows through the program at runtime. Nodes in the abstract syntax tree represent syntactic elements such as statements or expressions. Nodes in the data flow graph, on the other hand, represent semantic elements that carry values at runtime.
通过建模DFG,CodeQL 数据流库实现了对程序或功能的数据分析。
不像AST、DFG不会反应程序的的句法结构,但是它是对程序运行时数据流动的模拟。
- AST中的节点表示语法元素,例如语句或者表达式。
- DFG中的节点表示运行时携带值的语义元素。
Some AST nodes (such as expressions) have corresponding data flow nodes, but others (such as if
statements) do not. This is because expressions are evaluated to a value at runtime, whereas if
statements are purely a control-flow construct and do not carry values. There are also data flow nodes that do not correspond to AST nodes at all.
一些AST节点(例如表达式)有相应的数据流节点,但是其他(例如
if
语句)不一样,这是因为表达式会在运行时对值进行评估,但if
语句是完全的控制流,并不携带值。当然也有一些数据流节点没有相应的AST节点。
- local data flow :函数内数据流
- global data flow:函数间数据流
path queries
Overview
Security researchers are particularly interested in the way that information flows in a program. Many vulnerabilities are caused by seemingly benign data flowing to unexpected locations, and being used in a malicious way. Path queries written with CodeQL are particularly useful for analyzing data flow as they can be used to track the path taken by a variable from its possible starting points (source
) to its possible end points (sink
). To model paths, your query must provide information about the source
and the sink
, as well as the data flow steps that link them.
安全研究员对信息在程序中的流动非常非常感兴趣。许多漏洞都是从看似良性的数据输入走向了不可预料的位置,然后被恶意使用。用CodeQL编写 Path queries 对数据流分析很有用,因为它们可以用来跟踪从起点
source
到终点sink
的所有可能路径。为了模拟路径,查询必须提供这两者的相关信息。以及连接它们的数据流步骤。
Constructing a path query
/**
* ...
* @kind path-problem
* ...
*/
import <language>
// For some languages (Java/C++/Python) you need to explicitly import the data flow library, such as
// import semmle.code.java.dataflow.DataFlow
import DataFlow::PathGraph
...
from MyConfiguration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "<message>"
DataFlow::Pathgraph
is the path graph module you need to import from the standard CodeQL libraries.source
andsink
are nodes on the path graph, andDataFlow::PathNode
is their type.MyConfiguration
is a class containing the predicates which define how data may flow between thesource
and thesink
.
The following sections describe the main requirements for a valid path query.
DataFlow::Pathgraph
是需要从CodeQL库中导入的路径图模块。source
和sink
是路径图的节点,DataFlow::PathNode
是它们的属性。MyConfiguration
是包含谓词的类,这些谓词定义了数据如何在源和接收器之间流动。
PathGraph有内置查询谓词edges,用来判断ab之间是否有数据边
If you are querying C/C++, C#, Java, JavaScript, Python, or Ruby code (and you have used import DataFlow::PathGraph
in your query), the definitions of the source
and sink
are accessed via the Configuration
class in the data flow library. You should declare all three of these objects in the from
statement. For example:
如果你是查询C / C ++,C#,Java,JavaScript,Python或Ruby代码(并且在查询中使用的
Import Dataflow :: Pathgraph
),通过数据中的Configuration
类访问source
以及sink
。那么在from
语句中应该声明三个对象:
from Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink
You can use the hasFlowPath
predicate to specify flow from the source
to the sink
for a given Configuration
:
可以使用
hasflowPath
谓词以指定从``Configuration中
source到
sink`的流动:
where config.hasFlowPath(source, sink)
A typical data-flow query looks like this:
class MyConfig extends TaintTracking::Configuration {
MyConfig() { this = "MyConfig" }
override predicate isSource(DataFlow::Node node) { node instanceof MySource }
override predicate isSink(DataFlow::Node node) { node instanceof MySink }
}
from MyConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Sink is reached from $@.", source.getNode(), "here"
QL tutorials
谁是小偷
Take on the role of a detective to find the thief in this fictional village. You will learn how to use logical connectives, quantifiers, and aggregates in QL along the way.
承担侦探的角色在这个虚拟的村庄寻找小偷。你将会学习如何在QL中使用逻辑连接、量词、聚合。
- 如果我们使用临时变量,在
select
中不需要用到,可以使用exists
:
from Person t
where exists(string c | t.getHairColor() = c)
select t
一些聚合技巧:
Example | Result |
---|---|
min(Person p | p.getLocation() = "east" | p order by p.getHeight()) |
shortest person in the east of the village |
count(Person p | p.getLocation() = "south" | p) |
number of people in the south of the village |
avg(Person p | | p.getHeight()) |
average height of the villagers |
sum(Person p | p.getHairColor() = "brown" | p.getAge()) |
combined age of all the villagers with brown hair |
import tutorial
from Person t
where
/* 1 */ t.getHeight() > 150 and
/* 2 */ not t.getHairColor() = "blond" and
/* 3 */ exists (string c | t.getHairColor() = c) and
/* 4 */ not t.getAge() < 30 and
/* 5 */ t.getLocation() = "east" and
/* 6 */ (t.getHairColor() = "black" or t.getHairColor() = "brown") and
/* 7 */ not (t.getHeight() > 180 and t.getHeight() < 190) and
/* 8 */ exists(Person p | p.getAge() > t.getAge()) and
/* 9 */ not t = max(Person p | | p order by p.getHeight()) and
/* 10 */ t.getHeight() < avg(float i | exists(Person p | p.getHeight() = i) | i) and
/* 11 */ t = max(Person p | p.getLocation() = "east" | p order by p.getAge())
select "The thief is " + t + " !"
抓住纵火犯
- 罪犯是南方人
定义一个新的 谓词:
predicate isSouthern(Person p) {
p.getLocation() = "south"
}
...
from Person p
where isSouthern(p)
select p
然后定义子类属性加以约束:
class Southerner extends Person {
Southerner() { isSouthern(this) }
}
...
from Southerner s
select s
- 放火发生在北方,在发生盗窃后10岁以下的儿童不允许旅行。
重载谓词:
class Child extends Person{
Child(){
this.getAge()<10
}
override predicate isAllowedIn(string region) {
region=this.getLocation()
}
}
- 纵火犯是两个秃头
最终语句:
import tutorial
predicate isSouthern (Person p) {
p.getLocation()="south"
}
class Southerner extends Person{
Southerner(){
isSouthern(this)
}
}
class Child extends Person{
Child(){
this.getAge()<10
}
override predicate isAllowedIn(string region) {
region=this.getLocation()
}
}
predicate isBald(Person p) {
not exists (string c | p.getHairColor() = c)
}
from Southerner s
where s.isAllowedIn("north") and isBald(s)
select s
皇位继承人
QL递归用法:
// 返回p的孩子 result是默认的返回值
Person childOf(Person p) { parentOf(result) = p }
//返回p的所有祖先 存在迭代关系
Person ancestorOf(Person p){
result = parentOf(p) or
result = parentOf(childOf(p))
}
transitive closure:
parentOf()
多次应用相同的操作(在这种情况下),在 QL 中非常常见,被称为操作的传递闭包。有两个特殊符号在使用传递闭包+
时*
非常有用:
parentOf+(p)
将parentOf()
谓词应用于p
一次或多次。这相当于ancestorOf(p)
.parentOf*(p)
将parentOf()
谓词应用于p
零次或多次,因此它返回p
or的祖先p
。
relativeOf()
:
Person relativeOf(Person p) {
parentOf*(result) = parentOf*(p)
}
最后:
import tutorial
Person relativeOf(Person p) { parentOf*(result) = parentOf*(p) }
predicate hasCriminalRecord(Person p) {
p = "Hester" or
p = "Hugh" or
p = "Charlie"
}
from Person p
where
not p.isDeceased() and
p = relativeOf("King Basil") and
not hasCriminalRecord(p)
select p
过河
一个人正试图把一只山羊、一棵卷心菜和一头狼渡过河。他的船只能载自己,最多只能载一件货物。他的问题是,如果山羊和卷心菜单独在一起,它会吃掉它。如果狼和山羊单独在一起,它会吃掉它。他是怎么把所有东西都过河的?
一道模拟题 ,非常有意思,———> 地址
/**
* A solution to the river crossing puzzle.
*/
/** A possible cargo item. */
class Cargo extends string {
Cargo() {
this = "Nothing" or
this = "Goat" or
this = "Cabbage" or
this = "Wolf"
}
}
/** One of two shores. */
class Shore extends string {
Shore() {
this = "Left" or
this = "Right"
}
/** Returns the other shore. */
Shore other() {
this = "Left" and result = "Right"
or
this = "Right" and result = "Left"
}
}
/** Renders the state as a string. */
string renderState(Shore manShore, Shore goatShore, Shore cabbageShore, Shore wolfShore) {
result = manShore + "," + goatShore + "," + cabbageShore + "," + wolfShore
}
/** A record of where everything is. */
class State extends string {
Shore manShore;
Shore goatShore;
Shore cabbageShore;
Shore wolfShore;
State() { this = renderState(manShore, goatShore, cabbageShore, wolfShore) }
/** Returns the state that is reached after ferrying a particular cargo item. */
State ferry(Cargo cargo) {
cargo = "Nothing" and
result = renderState(manShore.other(), goatShore, cabbageShore, wolfShore)
or
cargo = "Goat" and
result = renderState(manShore.other(), goatShore.other(), cabbageShore, wolfShore)
or
cargo = "Cabbage" and
result = renderState(manShore.other(), goatShore, cabbageShore.other(), wolfShore)
or
cargo = "Wolf" and
result = renderState(manShore.other(), goatShore, cabbageShore, wolfShore.other())
}
/**
* Holds if the state is safe. This occurs when neither the goat nor the cabbage
* can get eaten.
*/
predicate isSafe() {
// The goat can't eat the cabbage.
(goatShore != cabbageShore or goatShore = manShore) and
// The wolf can't eat the goat.
(wolfShore != goatShore or wolfShore = manShore)
}
/** Returns the state that is reached after safely ferrying a cargo item. */
State safeFerry(Cargo cargo) { result = this.ferry(cargo) and result.isSafe() }
string towards() {
manShore = "Left" and result = "to the left"
or
manShore = "Right" and result = "to the right"
}
/**
* Returns all states that are reachable via safe ferrying.
* `path` keeps track of how it is achieved.
* `visitedStates` keeps track of previously visited states and is used to avoid loops.
*/
State reachesVia(string path, string visitedStates) {
// Reachable in 1 step by ferrying a specific cargo
exists(Cargo cargo |
result = this.safeFerry(cargo) and
visitedStates = result and
path = "First " + cargo + " is ferried " + result.towards()
)
or
// Reachable by first following pathSoFar and then ferrying cargo
exists(string pathSoFar, string visitedStatesSoFar, Cargo cargo |
result = this.reachesVia(pathSoFar, visitedStatesSoFar).safeFerry(cargo) and
not exists(int i | i = visitedStatesSoFar.indexOf(result)) and // resulting state is not visited yet
visitedStates = visitedStatesSoFar + "_" + result and
path = pathSoFar + ",\nthen " + cargo + " is ferried " + result.towards()
)
}
}
/** The initial state, where everything is on the left shore. */
class InitialState extends State {
InitialState() { this = renderState("Left", "Left", "Left", "Left") }
}
/** The goal state, where everything is on the right shore. */
class GoalState extends State {
GoalState() { this = renderState("Right", "Right", "Right", "Right") }
}
from string path
where any(InitialState i).reachesVia(path, _) = any(GoalState g)
select path + "."
CodeQL for Java
建立database:
codeql database create /Users/theoyu/workspace/cool/CodeQL/databases/micro-service-seclab-database -l java --command="mvn clean install --file pom.xml"
CodeQL library for Java
Summary of the library classes:
- 程序元素:例如class , method
- 描述AST节点:例如statement , expression
- 描述元数据metadata:例如 注解annotation 注释
- 计算指标的类: 例如 cyclomatic complexity and coupling 例如圈复杂度和耦合程度
- 用于导航程序的调用图的类 Classes for navigating the program’s call graph
程序元素Program element
为了描述class和method,QL提供了:包(Package)、编译单元(CompilationUnit)、类型(Type)、方法(Method)、构造器(Constructor)和变量(Variable)
Type:类Type有许多子类,用于表示不同种类的类型:
-
PrimitiveType :描述Java语言中的8个基础类型 boolean byte char double float int long short 外带null+void
-
RefType:引用类型(也就是非基础类型),拥有如下子类:
- Class : 类Class
- Interface:接口
- EnumType:枚举
- Array:数组
引用类型还可以按照其声明范围进行区分
- TopLevelType :表示在编译单元顶层声明的类型 (声明类型)
- NestedType:在一个type内部声明的type
- TopLevelClass:表示在编译单元顶层声明的类
- NestedType:在另一个type中声明的类
- LocalClass:在一个method或者constructor中定义的类
- AnonymousClass:匿名类
下面进行一些实践:
查找所有int类型的变量:
import java
from Variable v, PrimitiveType pt
where pt=v.getType() and
pt.hasName("boolean")
select v
查询所有的声明类型和编译类型不同的类型:
import java
from TopLevelType tl
where tl.getName() != tl.getCompilationUnit().getName()
select tl
变量
- Filed 代表Java类属性
- LocalVariableDecl 代表局部变量
- Parameter 代表方法(method, constructor)参数
AST
AST的节点主要包含语句 Stmt
和表达式 Expr
Expr.getAChildExpr
返回给定表达式的子表达式。Stmt.getAChild
返回直接嵌套在给定语句中的语句或表达式。Expr.getParent
和Stmt.getParent
返回 一个AST 节点的父节点。
返回所有 return中的表达式:
import java
from Expr e
where e.getParent() instanceof ReturnStmt
select e
返回所有if 中的语句
import java
from Stmt s
where s.getParent() instanceof IfStmt
select s
元数据
CodeQL提供Annotatable
类,作为所有可以被添加注解的程序元素的父类。例如:package、引用类型、field、method、constructor、local variable declaration。
Annotatable
类的谓词getAnAnnotation
可以返回程序元素被添加的注解信息。
Analyzing data flow in Java
Local data flow
本地数据流是方法内的数据流,CodeQL为本地数据流分析提供的模块是DataFlow
。
DataFlow
模块定义了Node
类,表示数据可以流经的类。Nodes
分为表达式节点(ExprNode
)和参数节点(ParameterNode
)
您可以使用成员谓词asExpr
和asParameter
在数据流节点和表达式/参数之间进行映射。