https://zhuanlan.zhihu.com/p/146233736The Dragon Book. A rather advanced and mathematical book.
编译要经过词法分析,语法分析,代码生成和优化几个阶段。词法分析输出token流,例如123是数,"value"是字符串。语法分析输入token流,它可以只分析(语法检查),分析+执行(Interpreter),生成语法树、中间代码、bytecode等。
可以自己写词法和语法分析器,也可以用lex和yacc(yet another compiler-compiler).它们根据你写的词法和语法规则(.l和.y文件)生成C代码,你调用yyparse()进行语法分析,它调用yylex()获取token,你可以使它从文件或字符串读。Lex and Yacc were the first popular and efficient lexers and parsers generators, flex and Bison were the first widespread open-source versions compatible with the original software.
读懂flex和Bison的源码不现实。Bing(BNF and EBNF: What are they and how do they work?). The easiest way of parsing something according to a grammar in use today is called LL parsing (or top-down parsing). Recursive descent is a top-down parsing technique that constructs the parse tree from the top and the input is read from left to right. It uses procedures for every terminal(终结符) and non-terminal entity.
欲分析语言必先定义它。Bing(BNF rules of LISP). 太罗嗦:-) since lists are used so often in LISP, a better notation is to drop the dot. A s-expression, also known as a sexpr or sexp, is a way to represent a nested list of data. It stands for "symbolic expression". So:
%token ATOM /* ATOM是终结符 */
%start sexp /* sexp是最顶级的符号 */
%%
sexp : ATOM | list ; /* sexp是ATOM或list */
list : '(' seq ')' ; /* list是( 和 sequence 和 ) */
seq : sexp | sexp seq;
A production (rule) [产生式规则] is a rewrite rule specifying a symbol substitution that can be recursively performed to generate new symbol sequences. sexp->list->(seq)->(sexp seq)->(sexp sexp)-...>(ATOM (ATOM ATOM)). 我们这个语言里(1 (a bc))这样的才算人话。
The easiest way of parsing something according to a grammar is called LL parsing (or top-down parsing). It works like this: for each production find out which terminals the production can start with. (This is called the start set.) Then, when parsing, you just start with the start symbol and compare the start sets of the different productions against the first piece of input to see which of the productions have been used.
import sys
def parse_seq():
while parse_sexp():
pass
def parse_sexp():
t = get_token()
if t == '\0' or t == ')':
return 0
elif t == '(':
parse_seq()
else:
print('ATOM:' + t)
return 1
def get_token():
return '\0' if len(tokens) == 0 else tokens.pop(0)
# On Windows, (1 (2 3)) Enter Ctrl-Z Enter
tokens = sys.stdin.read().replace('(', '( ').replace(')', ' )').split()
parse_sexp()
发布于刚刚
--
FROM 106.121.74.*