Basically, a compiler is a program that reads a program written in a source language and translates it into an equivalent program in a target language. The compiler is a large and complex program so for simplicity purposes, it is being divided into two major parts and which are then further broken down into different phases on the base of their complexity. A typical decomposition of a compiler is shown in the figure below.
In the first phase of a compiler, we define lexical rules by regular expression. Lexical Analysis is also called Scanning. The Lexical Analyzer reads the source program character by character and returns the tokens of the source program, which are pattern of characters having same meaning in the source program. In this phase, the source program is checked to have valid characters and words. After detailed analysis, input stream of tokens are stored in a buffer.
The second phase of a compiler is the Syntax Analysis. Generally, a Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program. In this phase, we define syntax rules by Context Free Grammar (CFG). Syntax Analysis is also called parser. During parsing, syntax rules are checked. Inputs of this phase are tokens and output is a parse tree or a syntax tree.
The third phase is called Semantic Analysis. Basically the word semantic means the “meaning”. A semantic analyzer is the one which checks the source program for semantic errors and collects the type information for the code generation. In this phase, we define semantic rules by attribute grammars. Semantic Analysis has two types: Declaration Checking and Type Checking. The input of this phase is a syntax tree and the output is an attribute grammar.
The last and final phase is the Target Code Generation. After analysis of source code is completed, then last step is converting it into a target language. However, there are still other optional phases; namely the Source Code Optimizer, Target Code Optimizer and Intermediate Code Generator. After the syntax and semantic analysis, some compilers may generate explicit intermediate codes representing the source program. So this is where these remaining phases take place. The code optimizer is the one responsible for optimizing the code produced by the intermediate code generator in the terms of time and space so as to produce a faster-running machine code. And lastly, the Code Generator produces the target language in a specific architecture in which the target program is normally a relocatable object file containing the machine codes.
References:
• http://www.cs.bilkent.edu.tr/~ilyas/Courses/CS416/lec00-outline.PPT
• http://www.personal.kent.edu/~rmuhamma/Compilers/compiler.html
 
