Tuesday, June 24, 2008

The Phases of a Compiler

Basically, a compiler is a program that reads a program written in a source language and translates it into an equivalent program in a target language. The compiler is a large and complex program so for simplicity purposes, it is being divided into two major parts and which are then further broken down into different phases on the base of their complexity. A typical decomposition of a compiler is shown in the figure below.

The two major parts of a compiler are the Analysis phase and the Synthesis phase. The analysis part breaks up the source program into constant piece and creates an intermediate representation of the source program. The Lexical Analyzer, Syntax Analyzer and Semantic Analyzer form the bulk of this part. On the other hand, the synthesis part constructs the desired target program from the intermediate representation. The Intermediate Code Generator, Code Generator, and Code Optimizer are the parts of this synthesis portion. Each phase transforms the source program from one representation into another representation. They all communicate with error handlers and so with the symbol table. Informally, we shall also consider the symbol table manager and the error handler “phases”.

In the first phase of a compiler, we define lexical rules by regular expression. Lexical Analysis is also called Scanning. The Lexical Analyzer reads the source program character by character and returns the tokens of the source program, which are pattern of characters having same meaning in the source program. In this phase, the source program is checked to have valid characters and words. After detailed analysis, input stream of tokens are stored in a buffer.

The second phase of a compiler is the Syntax Analysis. Generally, a Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program. In this phase, we define syntax rules by Context Free Grammar (CFG). Syntax Analysis is also called parser. During parsing, syntax rules are checked. Inputs of this phase are tokens and output is a parse tree or a syntax tree.

The third phase is called Semantic Analysis. Basically the word semantic means the “meaning”. A semantic analyzer is the one which checks the source program for semantic errors and collects the type information for the code generation. In this phase, we define semantic rules by attribute grammars. Semantic Analysis has two types: Declaration Checking and Type Checking. The input of this phase is a syntax tree and the output is an attribute grammar.

The last and final phase is the Target Code Generation. After analysis of source code is completed, then last step is converting it into a target language. However, there are still other optional phases; namely the Source Code Optimizer, Target Code Optimizer and Intermediate Code Generator. After the syntax and semantic analysis, some compilers may generate explicit intermediate codes representing the source program. So this is where these remaining phases take place. The code optimizer is the one responsible for optimizing the code produced by the intermediate code generator in the terms of time and space so as to produce a faster-running machine code. And lastly, the Code Generator produces the target language in a specific architecture in which the target program is normally a relocatable object file containing the machine codes.

References:

http://www.cs.bilkent.edu.tr/~ilyas/Courses/CS416/lec00-outline.PPT
http://www.personal.kent.edu/~rmuhamma/Compilers/compiler.html

No comments: