Project 1: Language translation
This project has four stages: Stage 0, Stage 1,
Discussion, and Stage 2. Please read the description of the entire
project before starting work on it.
Goals of the project
This project will give you implementation experience
in the following areas:
-
Writing grammar rules
-
Lexical analysis
-
Parsing
-
Static-scope analysis
-
Simple calls
In addition, this project will give you experience
using the following concepts:
-
Objects and method invocations
-
Exception handling (should work this into the error recovery mechanism)
-
Recursion
Description
In this project you will write a compiler that translates
an
macro-extended shtml into basic shtml. shtml
is a pure subset of html and supports only the following commands:
-
<html>...</html>: Surrounds the entire
document "..."
-
<p>...</p>: Text "..." is a paragraph
-
<B>...</B>: Put text "..." in bold
-
<ul> <li>...</li> <li>...</li></ul>:Identifies
a bulletted list. A list may have any number of items (i.e., <li>...</li>)
and lists may be nested.
For simplicity, assume that no spaces are allowed
in any command (i.e., < ul> is not the same as <ul>). Here
is an example of text in shtml(indented for your convenience):
<html>
<p>
The main
ingredients in <B>Pakistani</B> cooking are:
</p>
<ul>
<li>
Spices
<ul>
<li>
Cumin seeds </li>
<li>
Cayenne pepper </li>
</ul>
</li>
<li>
Other
<ul>
<li>
Onions </li>
<li>
Garlic </li>
<li>
Ginger </li>
</ul>
</li>
</ul>
</html>
Use your
favorite web browser to view the output of the above html source.
Macro-extended
shtml (me-shtml) extends shtml with the following two commands:
-
<MU!name>: Represents the expansion of the macro with
name "name". Macros follow static scoping.
-
<MD!name>expand</MD>: Declares
a macro called "name" which when used in the document (see next item) expands
into "expand". Expand may contain only syntactically
correct commands. In particular, for each <p>, <B>, <ul>,
and <li>, there must be a corresponding closing command </p>, </B>,
</ul>, and </li>. A macro may be defined only at the beginning
of a block (i.e., immediately after a <html>, <p>, <ul>, or <li>).
The scope of the macro definition starts after the last </MD>
in the block is encountered and continues to the end of its immediately
enclosing "block".
Stage 0: preparing for the project (20 points)
-
Is the following grammar parsable using LL(1)? Why or why not.
Transform this grammar into one that is parsable using LL(1).
A -> Ac | Acd | bd | epsilon
-
Write a context-free grammar for me-shtml (including the grammar needed
for the shtml subset of me-shtml). The grammar should be parsable using
a top-down (recursive descent) parser using one token of lookahead (i.e.,
LL(1)).
-
Familiarize yourself with the Java environment and learn Java.
Also familiarize yourself with the I/O facilities in Java. In particular,
play with the PushbackReader which allows you to "unread" characters-this
is useful for examining lookahead characters in lexical analysis..
You only need to submit your solutions to (1) and
(2). Bring your submission in hardcopy form to class on the due date.
Stage 1 (35 points)
-
Write a lexical analysis that breaks down a me-shtml file into tokens.
You can assume that in correct me-shtml files characters "<" and ">"
only appear as part of the commands of me-shtml. Your code should
signal an error if it identifies any incorrect uses of "<" and ">".
Use objects to represent tokens and exceptions to handle errors.
In my solution, I found it convenient to have to have the following token
classes: BegHtml, EndHtml, BegPara, EndPara,
BegList,
EndList,
BegListItem,
EndListItem,
BegBold,
EndBold,
PlainText,
BegMacroDef,
EndMacroDef,
and MacroUse.
-
Write a recursive-descent parser that builds an abstract syntax tree.
In my solution, I found it convenient to have the following kinds of parse
tree nodes: MacroDefNode, MacroUseNode, BoldNode,
PlainTextNode,
ParaNode,
ListItemNode,
ListNode,
and HtmlNode.
-
Write code to perform the macro expansion. In my solution, this primarily
consisted of adding a print method to all the nodes and also lookup methods
that searched the lexically enclosing symbol tables.
When I constructed a solution for a slightly more
complicated version of this project, I took 4 hours to do Part (1) (about
350 lines when counted using
wc) and 1.5 hours for parts 2 and 3
(about 200 lines when counted using
wc).
Watch this space for instructions on how to submit your solution for
Stage 1.
Discussion (10)
In this stage you will participate in a discussion
of how to further extend your translator. We will consider two extensions:
-
Allow macros to take parameters: add a simple parameter passing mechanism
to macros, or
-
Use dynamic scoping of macros: instead of static scoping, use dynamic scoping
of macros. Recall from lecture that you can use dynamic scoping to
get some of the functionality of parameters.
See
Stage 2 for a more detailed description of the
above two alternatives.
At the end of the discussion period, we will pick
one extension based on the ratings assigned by the class. All students
will implement the same extension in Stage 2.
To participate in the discussion, go to deming.colorado.edu and
log in as you normally do for asking questions on the day's readings.
However, this time, instead of clicking on "Programming languages daily
question" click on "3155 discussions". In the next screen
click on "Discussion for project 1".
Under the heading "Criteria importance rating" you will enter
your opinion on the importance of criteria "simplicity", "expressiveness"
and "ease of implementation" for language features.
For each of these criteria, enter a number between 0 and 100 indicating
how importance you consider it to be (note that there is no right answer
and even respected language designers differ on their relative importance).
For example, if you consider Simplicity to be most important for
a language feature then you would give it a very high rating (say 100).
Or, if you consider Ease of implementation to be not very important,
you would give it a low rating (say 10).
Under the heading of "Criteria Relevance Rating" you will enter
how each criteria matches up with each of the two options ("Dynamic
scoping of macros" or "parameters for macros"). For example,
if you think that "parameters for macros" are very simple but "dynamic
scoping" is not so simple, you may enter "90" and "20" respectively for
the two options under "Simplicity".
Once you have entered the above two pieces of information, click on
the update button. (note that this page has several other buttons, but
we recommend you keep away from them--they will be used in the later projects).
Your score in the discussion phase will depend on how your personal choices
matched up with the rest of the class.
Stage 2 (35 points)
In Stage 2 we will implement the extension we decide
on in our online discussion. We will announce the extension to implement
in the class mailing list and also in recitations. You can also see the
extension to implement by logging in to the PEP system. Here are
some more details on the two alternatives:
Parameter passing of macros
For parameter passing, we will extend the syntax of the macro definition
to be:
<MD!macroname!param1!...!paramn> macro body </MD>
param1,...,paramn are the names of parameters to the macro. These
names can be used in the macro body as if they were macros. For example
<html>
<MD!fullname!fname!lname><MU!fname> <MU!lname></MD>
<MD!first> Amer </MD>
<MD!last> Diwan </MD>
<MU!fullname!first!last>
</html>
Translates into this:
<html>
Amer Diwan
</html>
This example also demonstrates how to use a macro with a parameter:
we have an extended version of the <MU...> command that can take arguments.
Each argument to the MU must be a macro name or an argument to an enclosing
macro definition (recall than an argument also acts as if it is a macro
name). In other words, instead of passing first and last
to MU above, if we passed "Amer" and "Diwan", that would be illegal since
"Amer" and "Diwan" are not macros, they are simply text. Also, the
semantics of macro parameters allows a macro to use itself which would
lead to infinite looping. For example, consider:
<html>
<MD!M!n><MU!n!n></MD>
<MU!M!M>
</html>
You can assume in your implementation that such cycles do not happen.
To implement parameter passing of macros, you need to do the following:
-
Modify your scanner so that its BeginMacroDef token contains the names
of the arguments of the macro, and the MacroUse token contains the arguments
passed to the macro.
-
Add the formal parameters of each macro to its symbol table. (Recall
that in Stage 1, a Macro itself didn't need a symbol table, but now it
does). Now, a symbol table in my implementation is a list of pairs,
where each pair contains the name of the macro and the body of the macro.
So, in the case of macro fullname above, the symbol table for fullname
contains two entries entry (fname, nil) and (lname, nil) which indicates
that there are macros called "fname" and "lname" in this scope, but their
bodies are nil (the body will be filled in when the macro is used).
-
When you want to expand the use of a macro into its body, create a copy
of the macro AST node. In the above example, we will make a copy
of "fullname". Recall that the macro node has a symbol table associated
with it, with two entries (fname, nil) and (lname, nil). After making
the copy we replace the "nil" in the symbol table with a use of the parameter
passed in. Since we pass "first" and "last" we replace the nils in
the symbol table with uses of "first" and "last" so that the symbol table
now has two entries: ("fname", <MU!first>) and ("lname", <MU!last>).
Now we can process the body of "fullname" normally: any uses of fname and
lname are expaned out correctly.
The above description of how to implement parameter passing is customized
to my solution; some of the details may differ if your solution approach
differs substantially from mine.
Dynamic scoping of macros
Dynamic scoping does not require any addition or changes to the syntax.
However, it does, profoundly affect how you do your macro lookups.
Here is a variation on the above example which achieves the same effect
as parameters but using dynamic scoping.
<html>
<MD!fullname> <MU!first> <MU!last></MD>
<p>
<MD!first> Amer </MD>
<MD!last> Diwan </MD>
<MU!fullname>
</p>
<p>
<MD!first> Richard </MD>
<MD!last> Sharp </MD>
<MU!fullname>
</p>
</html>
The above example once again declares a macro called fullname, but this
time instead of taking parameters, it simply uses two other macros, called
first and last respectively. This document contains two paragraphs,
each with one use of "fullname". The first paragraph defines "first"
and "last" to be "Amer" and "Diwan" respectively and the second paragraph
defines "first" and "last" to be "Richard" and "Sharp" respectively.
When the first paragraph uses "fullname", it expands into "Amer Diwan"
and when the second paragraph uses "fullname" it expands into "Richard
Sharp.
Here is an outline of an approach to implementing dynamic scoping:
-
Build the abstract syntax tree (as you already do for stage 1)
-
Traverse the tree while while keeping track of currently active macros
on a stack. For example, when we enter the first paragraph above,
the current macros on the stack are "first" and "last" with definitions
"Amer" and "Diwan" respectively. When a macro use is encountered,
use the stack of active macros to find the macro definition for the macro
being used. When we exit a paragraph, we need to remove all the macros
from the stack that were pushed in when we entered the paragraph.
Thus, when we exit the first paragraph, we will remove the two macros from
the stack. The stack effectively acts as a symbol table that is updated
when we enter and exit scopes.
Resources
You must use the Java programming language
for this project. See http://java.sun.com/docs/books/jls/html/index.html
for the official Java language definition. In addition, feel free
to use any book on Java and to consult with the TAs if you run into difficulties
in understanding or using Java.