Learn CODEQL

官方文档的简单学习

CodeQL queries

CodeQL queries 用于分析代码的安全性、正确性、可维护性以及可读性的相关问题。

The important types of query are:

  • Alert queries: queries that highlight issues in specific locations in your code.
  • Path queries: queries that describe the flow of information between a source and a sink in your code.

两种重要的查询类型:

  • Alert queries: 用于突出代码中指定信息的查询。
  • Path queries: 用于描述代码中从 『source』到『sink』流向的查询。

Basic query structure

/**
 *
 * Query metadata
 *
 */

import /* ... CodeQL libraries or modules ... */

/* ... Optional, define CodeQL classes and predicates ... */

from /* ... variable declarations ... */
where /* ... logical formula ... */
select /* ... expressions ... */

Import statements

When writing your own alert queries, you would typically import the standard library for the language of the project that you are querying, using import followed by a language:

当你编写alert queries时,通常需要导入项目语言所支持的标准库

  • C/C++: cpp
  • C#: csharp
  • Go: go
  • Java: java
  • JavaScript/TypeScript: javascript
  • Python: python

There are also libraries containing commonly used predicates, types, and other modules associated with different analyses, including data flow, control flow, and taint-tracking. In order to calculate path graphs, path queries require you to import a data flow library into the query file.

为了计算路径图,path queries 需要导入数据流库,以用于分析数据流、控制流以及污点跟踪。

From clause

from <type> <variable name>,...,...

声明查询中使用的变量。

Where clause

对from 中声明的变量加以逻辑条件。

Select clause

select子句列举from声明的满足where条件的结果。

一个例子:查询1到10内的勾股数:

from int x, int y, int z
where x in [1..10] and y in [1..10] and z in [1..10] and
      x*x + y*y = z*z
select x, y, z

Data flow graph

The CodeQL data flow libraries implement data flow analysis on a program or function by modeling its data flow graph. Unlike the abstract syntax tree, the data flow graph does not reflect the syntactic structure of the program, but models the way data flows through the program at runtime. Nodes in the abstract syntax tree represent syntactic elements such as statements or expressions. Nodes in the data flow graph, on the other hand, represent semantic elements that carry values at runtime.

通过建模DFG,CodeQL 数据流库实现了对程序或功能的数据分析。

不像AST、DFG不会反应程序的的句法结构,但是它是对程序运行时数据流动的模拟。

  • AST中的节点表示语法元素,例如语句或者表达式。
  • DFG中的节点表示运行时携带值的语义元素

Some AST nodes (such as expressions) have corresponding data flow nodes, but others (such as if statements) do not. This is because expressions are evaluated to a value at runtime, whereas if statements are purely a control-flow construct and do not carry values. There are also data flow nodes that do not correspond to AST nodes at all.

一些AST节点(例如表达式)有相应的数据流节点,但是其他(例如if语句)不一样,这是因为表达式会在运行时对值进行评估,但if语句是完全的控制流,并不携带值。当然也有一些数据流节点没有相应的AST节点。

  • local data flow :函数内数据流
  • global data flow:函数间数据流

path queries

Overview

Security researchers are particularly interested in the way that information flows in a program. Many vulnerabilities are caused by seemingly benign data flowing to unexpected locations, and being used in a malicious way. Path queries written with CodeQL are particularly useful for analyzing data flow as they can be used to track the path taken by a variable from its possible starting points (source) to its possible end points (sink). To model paths, your query must provide information about the source and the sink, as well as the data flow steps that link them.

安全研究员对信息在程序中的流动非常非常感兴趣。许多漏洞都是从看似良性的数据输入走向了不可预料的位置,然后被恶意使用。用CodeQL编写 Path queries 对数据流分析很有用,因为它们可以用来跟踪从起点 source 到终点sink的所有可能路径。为了模拟路径,查询必须提供这两者的相关信息。以及连接它们的数据流步骤。

Constructing a path query

/**
 * ...
 * @kind path-problem
 * ...
 */

import <language>
// For some languages (Java/C++/Python) you need to explicitly import the data flow library, such as
// import semmle.code.java.dataflow.DataFlow
import DataFlow::PathGraph
...

from MyConfiguration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "<message>"
  • DataFlow::Pathgraph is the path graph module you need to import from the standard CodeQL libraries.
  • source and sink are nodes on the path graph, and DataFlow::PathNode is their type.
  • MyConfiguration is a class containing the predicates which define how data may flow between the source and the sink.

The following sections describe the main requirements for a valid path query.

  • DataFlow::Pathgraph 是需要从CodeQL库中导入的路径图模块。
  • sourcesink 是路径图的节点, DataFlow::PathNode 是它们的属性。
  • MyConfiguration 是包含谓词的类,这些谓词定义了数据如何在源和接收器之间流动。

PathGraph有内置查询谓词edges,用来判断ab之间是否有数据边

image-20220310184238673

If you are querying C/C++, C#, Java, JavaScript, Python, or Ruby code (and you have used import DataFlow::PathGraph in your query), the definitions of the source and sink are accessed via the Configuration class in the data flow library. You should declare all three of these objects in the from statement. For example:

如果你是查询C / C ++,C#,Java,JavaScript,Python或Ruby代码(并且在查询中使用的Import Dataflow :: Pathgraph),通过数据中的Configuration 类访问source以及sink。那么在from语句中应该声明三个对象:

from Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink

You can use the hasFlowPath predicate to specify flow from the source to the sink for a given Configuration:

可以使用hasflowPath谓词以指定从``Configurationsourcesink`的流动:

where config.hasFlowPath(source, sink)

A typical data-flow query looks like this:

class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }

  override predicate isSource(DataFlow::Node node) { node instanceof MySource }

  override predicate isSink(DataFlow::Node node) { node instanceof MySink }
}

from MyConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Sink is reached from $@.", source.getNode(), "here"

QL tutorials

谁是小偷

Take on the role of a detective to find the thief in this fictional village. You will learn how to use logical connectives, quantifiers, and aggregates in QL along the way.

承担侦探的角色在这个虚拟的村庄寻找小偷。你将会学习如何在QL中使用逻辑连接、量词、聚合。

  • 如果我们使用临时变量,在select中不需要用到,可以使用exists:
from Person t
where exists(string c | t.getHairColor() = c)
select t

一些聚合技巧:

Example Result
min(Person p | p.getLocation() = "east" | p order by p.getHeight()) shortest person in the east of the village
count(Person p | p.getLocation() = "south" | p) number of people in the south of the village
avg(Person p | | p.getHeight()) average height of the villagers
sum(Person p | p.getHairColor() = "brown" | p.getAge()) combined age of all the villagers with brown hair
import tutorial

from Person t
where
 /* 1 */ t.getHeight() > 150 and
 /* 2 */ not t.getHairColor() = "blond" and
 /* 3 */ exists (string c | t.getHairColor() = c) and
 /* 4 */ not t.getAge() < 30 and
 /* 5 */ t.getLocation() = "east" and
 /* 6 */ (t.getHairColor() = "black" or t.getHairColor() = "brown") and
 /* 7 */ not (t.getHeight() > 180 and t.getHeight() < 190) and
 /* 8 */ exists(Person p | p.getAge() > t.getAge()) and
 /* 9 */ not t = max(Person p | | p order by p.getHeight()) and
 /* 10 */ t.getHeight() < avg(float i | exists(Person p | p.getHeight() = i) | i) and
 /* 11 */ t = max(Person p | p.getLocation() = "east" | p order by p.getAge())
select "The thief is " + t + " !"

抓住纵火犯

  1. 罪犯是南方人

定义一个新的 谓词

predicate isSouthern(Person p) {
  p.getLocation() =	 "south"
}
...
from Person p
where isSouthern(p)
select p

然后定义子类属性加以约束:

class Southerner extends Person {
  Southerner() { isSouthern(this) }
}
...
from Southerner s
select s

image-20220312093108055

  1. 放火发生在北方,在发生盗窃后10岁以下的儿童不允许旅行。

重载谓词:

class Child extends Person{
    Child(){
        this.getAge()<10
    }
    
    override predicate  isAllowedIn(string region) {
        region=this.getLocation()
    }
}
  1. 纵火犯是两个秃头

最终语句:

import tutorial

predicate isSouthern (Person p) {
    p.getLocation()="south"
}

class Southerner extends Person{
    Southerner(){
        isSouthern(this)
    }
}

class Child extends Person{
    Child(){
        this.getAge()<10
    }
    
    override predicate  isAllowedIn(string region) {
        region=this.getLocation()
    }
}

predicate isBald(Person p) {
    not exists (string c | p.getHairColor() = c)
  }

from Southerner s 
where s.isAllowedIn("north") and isBald(s)
select s

皇位继承人

QL递归用法:

// 返回p的孩子 result是默认的返回值
Person childOf(Person p) { parentOf(result) = p }


//返回p的所有祖先 存在迭代关系
Person ancestorOf(Person p){
    result = parentOf(p) or
    result = parentOf(childOf(p))
}

transitive closure

parentOf()多次应用相同的操作(在这种情况下),在 QL 中非常常见,被称为操作的传递闭包。有两个特殊符号在使用传递闭包+*非常有用:

  • parentOf+(p)parentOf()谓词应用于p一次或多次。这相当于ancestorOf(p).
  • parentOf*(p)parentOf()谓词应用于p零次或多次,因此它返回por的祖先p

relativeOf():

Person relativeOf(Person p) {
  parentOf*(result) = parentOf*(p)
}

最后:

import tutorial

Person relativeOf(Person p) { parentOf*(result) = parentOf*(p) }

predicate hasCriminalRecord(Person p) {
  p = "Hester" or
  p = "Hugh" or
  p = "Charlie"
}

from Person p
where
  not p.isDeceased() and
  p = relativeOf("King Basil") and
  not hasCriminalRecord(p)
select p

过河

一个人正试图把一只山羊、一棵卷心菜和一头狼渡过河。他的船只能载自己,最多只能载一件货物。他的问题是,如果山羊和卷心菜单独在一起,它会吃掉它。如果狼和山羊单独在一起,它会吃掉它。他是怎么把所有东西都过河的?

一道模拟题 ,非常有意思,———> 地址

/**
 * A solution to the river crossing puzzle.
 */

/** A possible cargo item. */
class Cargo extends string {
  Cargo() {
    this = "Nothing" or
    this = "Goat" or
    this = "Cabbage" or
    this = "Wolf"
  }
}

/** One of two shores. */
class Shore extends string {
  Shore() {
    this = "Left" or
    this = "Right"
  }

  /** Returns the other shore. */
  Shore other() {
    this = "Left" and result = "Right"
    or
    this = "Right" and result = "Left"
  }
}

/** Renders the state as a string. */
string renderState(Shore manShore, Shore goatShore, Shore cabbageShore, Shore wolfShore) {
  result = manShore + "," + goatShore + "," + cabbageShore + "," + wolfShore
}

/** A record of where everything is. */
class State extends string {
  Shore manShore;
  Shore goatShore;
  Shore cabbageShore;
  Shore wolfShore;

  State() { this = renderState(manShore, goatShore, cabbageShore, wolfShore) }

  /** Returns the state that is reached after ferrying a particular cargo item. */
  State ferry(Cargo cargo) {
    cargo = "Nothing" and
    result = renderState(manShore.other(), goatShore, cabbageShore, wolfShore)
    or
    cargo = "Goat" and
    result = renderState(manShore.other(), goatShore.other(), cabbageShore, wolfShore)
    or
    cargo = "Cabbage" and
    result = renderState(manShore.other(), goatShore, cabbageShore.other(), wolfShore)
    or
    cargo = "Wolf" and
    result = renderState(manShore.other(), goatShore, cabbageShore, wolfShore.other())
  }

  /**
   * Holds if the state is safe. This occurs when neither the goat nor the cabbage
   * can get eaten.
   */
  predicate isSafe() {
    // The goat can't eat the cabbage.
    (goatShore != cabbageShore or goatShore = manShore) and
    // The wolf can't eat the goat.
    (wolfShore != goatShore or wolfShore = manShore)
  }

  /** Returns the state that is reached after safely ferrying a cargo item. */
  State safeFerry(Cargo cargo) { result = this.ferry(cargo) and result.isSafe() }

  string towards() {
    manShore = "Left" and result = "to the left"
    or
    manShore = "Right" and result = "to the right"
  }

  /**
   * Returns all states that are reachable via safe ferrying.
   * `path` keeps track of how it is achieved.
   * `visitedStates` keeps track of previously visited states and is used to avoid loops.
   */
  State reachesVia(string path, string visitedStates) {
    // Reachable in 1 step by ferrying a specific cargo
    exists(Cargo cargo |
      result = this.safeFerry(cargo) and
      visitedStates = result and
      path = "First " + cargo + " is ferried " + result.towards()
    )
    or
    // Reachable by first following pathSoFar and then ferrying cargo
    exists(string pathSoFar, string visitedStatesSoFar, Cargo cargo |
      result = this.reachesVia(pathSoFar, visitedStatesSoFar).safeFerry(cargo) and
      not exists(int i | i = visitedStatesSoFar.indexOf(result)) and // resulting state is not visited yet
      visitedStates = visitedStatesSoFar + "_" + result and
      path = pathSoFar + ",\nthen " + cargo + " is ferried " + result.towards()
    )
  }
}

/** The initial state, where everything is on the left shore. */
class InitialState extends State {
  InitialState() { this = renderState("Left", "Left", "Left", "Left") }
}

/** The goal state, where everything is on the right shore. */
class GoalState extends State {
  GoalState() { this = renderState("Right", "Right", "Right", "Right") }
}

from string path
where any(InitialState i).reachesVia(path, _) = any(GoalState g)
select path + "."

image-20220312144148068

CodeQL for Java

建立database:

codeql database create /Users/theoyu/workspace/cool/CodeQL/databases/micro-service-seclab-database -l java --command="mvn clean install --file pom.xml"

CodeQL library for Java

Summary of the library classes:

  1. 程序元素:例如class , method
  2. 描述AST节点:例如statement , expression
  3. 描述元数据metadata:例如 注解annotation 注释
  4. 计算指标的类: 例如 cyclomatic complexity and coupling 例如圈复杂度和耦合程度
  5. 用于导航程序的调用图的类 Classes for navigating the program’s call graph

程序元素Program element

为了描述class和method,QL提供了:包(Package)、编译单元(CompilationUnit)、类型(Type)、方法(Method)、构造器(Constructor)和变量(Variable)

Type:类Type有许多子类,用于表示不同种类的类型:

  • PrimitiveType :描述Java语言中的8个基础类型 boolean byte char double float int long short 外带null+void

  • RefType:引用类型(也就是非基础类型),拥有如下子类:

    • Class : 类Class
    • Interface:接口
    • EnumType:枚举
    • Array:数组

    引用类型还可以按照其声明范围进行区分

    • TopLevelType :表示在编译单元顶层声明的类型 (声明类型)
    • NestedType:在一个type内部声明的type
    • TopLevelClass:表示在编译单元顶层声明的类
    • NestedType:在另一个type中声明的类
      • LocalClass:在一个method或者constructor中定义的类
      • AnonymousClass:匿名类

下面进行一些实践:

查找所有int类型的变量:

import java
  from Variable v, PrimitiveType pt
where pt=v.getType() and
	pt.hasName("boolean")
select v

查询所有的声明类型和编译类型不同的类型:

import java
from TopLevelType tl
where tl.getName() != tl.getCompilationUnit().getName()
select tl

变量

  • Filed 代表Java类属性
  • LocalVariableDecl 代表局部变量
  • Parameter 代表方法(method, constructor)参数

AST

AST的节点主要包含语句 Stmt 和表达式 Expr

  • Expr.getAChildExpr返回给定表达式的子表达式。
  • Stmt.getAChild返回直接嵌套在给定语句中的语句或表达式。
  • Expr.getParentStmt.getParent返回 一个AST 节点的父节点。

返回所有 return中的表达式:

import java
from Expr e 
where e.getParent() instanceof ReturnStmt
select e

返回所有if 中的语句

import java
from Stmt s
where s.getParent() instanceof IfStmt
select s

元数据

CodeQL提供Annotatable类,作为所有可以被添加注解的程序元素的父类。例如:package、引用类型、field、method、constructor、local variable declaration。

Annotatable类的谓词getAnAnnotation可以返回程序元素被添加的注解信息。

Analyzing data flow in Java

Local data flow

本地数据流是方法内的数据流,CodeQL为本地数据流分析提供的模块是DataFlow

DataFlow模块定义了Node类,表示数据可以流经的类。Nodes分为表达式节点(ExprNode)和参数节点(ParameterNode)

您可以使用成员谓词asExprasParameter在数据流节点和表达式/参数之间进行映射。

CodeQl In Real Practice