CS131 Summary

宋富老师的Compiler课,东西很多,忘得快。记点框架,免得全忘了。

教材:Compilers: Principles, Techniques, & Tools 2nd

Lexical Analysis

正则表达式

Atomic reg: for every a in $\Sigma$ , Epsilon: $\epsilon$

Compound reg: $r+s,rs,r^*$

Transition diagrams (a graph representation of reg)

自动机

Deterministic, DFA( widely used in lexical analyzers ); Non-Deterministic NFA

Two algorithm:

  • Reg=>NFA=>DFA
  • Reg=>DFA (more complicated)

Reg => NFA

Allows $\epsilon$ transitions, move to another state without consuming any symbol.

Crafting every symbols one by one with $L(\epsilon),L(a),L(s)\cup L(t)$

NFA => DFA

Calculate $\epsilon$ passing and construct all possible subsets and corresponding input symbol.

Rename each subset and reassemble with input symble.

Reg => DFA

Evaluate firstpos, latpos, nullable. Draw trees and construct DFA.

Abstractive and a little complicated, needs reviewing according to tutorial in youtube.

Minimize DFA

Forgot.

Error Handling

  • Panic Mode
  • Phrase Level

Context free language and parsing

Eliminate ambiguity

  • Grammar rewrite
  • Enforce precedence and associativity

Top-down parsing

Recursive-descent paring

Backtracking, general, not efficient

Eliminate left recursion:

  • Immediate recursions
  • All left recursions

Predictive parsing

No backtracking, efficient, LL(k)

Left-factoring

Construct LL(1) parsing table

For each rule $A\rightarrow \alpha$ :

  • terminal a in FIRST($\alpha$)? Add $A\rightarrow \alpha$ to M[A,a]
  • $\epsilon$ in $FIRST(\alpha)$? Add $A\rightarrow \alpha$ to M[A,b] for all terminals b in $FOLLOW(A)$
  • $\epsilon$ in $FIRST(\alpha)$? and \$ in $FOLLOW(A)$? Add $A\rightarrow \alpha$ to M[A,\$]
  • All undefined entries are errors

Not a LL(1) grammar and error recovery.

Bottom-up parsing

Operator precedence parsing

Not much impression, maybe not important.

Construct LR(0) parsing table

Long journey, need tutorial on youtube

SLR(1) parsing table

建表时,reduce操作只向$Follow(A)$中加.

Cautions to ambiguous grammar (left/right associative)

LR(1)

对一个state,包含所有可能的下一个input.

LALR(1) parsing table

引入reduce-redure conflict.

Intermediate represnetation

  • Abstract syntax trees

保留代码和流程,不能遍历、修改。

  • Directed acyclic graphs

AST合并同值节点

  • Control flow graphs

冗余、保守的源信息

  • Single static assignment

每个变量只定义、赋值一次,有利于优化

  • Stack machine code

容易生成,兼容性好;难优化,难重用

  • 3-address code

适合多个level的IR;占空间,语义消失

Compilation strategies

  • Hight-level
  • Mid-level
  • Low-level

以上是期中前的内容,下面是期中后的内容