Project 1: Language translation

This project has four stages: Stage 0, Stage 1, Discussion, and Stage 2.  Please read the description of the entire project before starting work on it.

Goals of the project

This project will give you implementation experience in the following areas:
In addition, this project will give you experience using the following concepts:

Description

In this project you will write a compiler that translates an macro-extended shtml into basic shtmlshtml is a pure subset of html and supports only the following commands:

 
For simplicity, assume that no spaces are allowed in any command (i.e., < ul> is not the same as <ul>).  Here is an example of text in shtml(indented for your convenience):

 

<html>
<p>

The main ingredients in <B>Pakistani</B> cooking are:
</p>
<ul>
<li> Spices
<ul>
<li> Cumin seeds </li>
<li> Cayenne pepper </li>
</ul>
</li>
<li> Other
<ul>
<li> Onions </li>
<li> Garlic </li>
<li> Ginger </li>
</ul>
</li>
</ul>
</html>

 
 

Use your favorite web browser to view the output of the above html source.
 
 

Macro-extended shtml (me-shtml) extends shtml with the following two commands:

Stage 0: preparing for the project (20 points)

  1. Is the following grammar parsable using LL(1)?  Why or why not.  Transform this grammar into one that is parsable using LL(1).

  2. A -> Ac | Acd | bd |  epsilon
  3. Write a context-free grammar for me-shtml (including the grammar needed for the shtml subset of me-shtml). The grammar should be parsable using a top-down (recursive descent) parser using one token of lookahead (i.e., LL(1)).
  4. Familiarize yourself with the Java environment and learn Java.  Also familiarize yourself with the I/O facilities in Java. In particular, play with the PushbackReader which allows you to "unread" characters-this is useful for examining lookahead characters in lexical analysis..
You only need to submit your solutions to (1) and (2).  Bring your submission in hardcopy form to class on the due date.

Stage 1 (35 points)

  1. Write a lexical analysis that breaks down a me-shtml file into tokens.  You can assume that in correct me-shtml files characters "<" and ">" only appear as part of the commands of me-shtml.  Your code should signal an error if it identifies any incorrect uses of "<" and ">".  Use objects to represent tokens and exceptions to handle errors.  In my solution, I found it convenient to have to have the following token classes: BegHtml, EndHtml, BegPara, EndPara, BegList, EndList, BegListItem, EndListItem, BegBold, EndBold, PlainText, BegMacroDef, EndMacroDef, and MacroUse.
  2. Write a recursive-descent parser that builds an abstract syntax tree.  In my solution, I found it convenient to have the following kinds of parse tree nodes: MacroDefNode, MacroUseNode, BoldNode, PlainTextNode, ParaNode, ListItemNode, ListNode, and HtmlNode.
  3. Write code to perform the macro expansion.  In my solution, this primarily consisted of adding a print method to all the nodes and also lookup methods that searched the lexically enclosing symbol tables.
When I constructed a solution for a slightly more complicated version of this project, I took 4 hours to do Part (1) (about 350 lines when counted using wc) and 1.5 hours for parts 2 and 3 (about 200 lines when counted using wc).

Watch this space for instructions on how to submit your solution for Stage 1.

Discussion (10)

In this  stage you will participate in a discussion of how to further extend your translator.  We will consider two extensions: See Stage 2 for a more detailed description of the above two alternatives.
At the end of the discussion period, we will pick one extension based on the ratings assigned by the class.  All students will implement the same extension in Stage 2.

To participate in the discussion, go to deming.colorado.edu and log in as you normally do for asking questions on the day's readings.  However, this time, instead of clicking on "Programming languages daily question" click on "3155 discussions".  In the next screen click on "Discussion for project 1".

Under the heading "Criteria importance rating" you will enter your opinion on the importance of criteria "simplicity", "expressiveness" and "ease of implementation"  for language features.  For each of these criteria, enter a number between 0 and 100 indicating how importance you consider it to be (note that there is no right answer and even respected language designers differ on their relative importance).  For example, if you consider Simplicity to be most important for a language feature then you would give it a very high rating (say 100).  Or, if you consider Ease of implementation to be not very important, you would give it a low rating (say 10).

Under the heading of "Criteria Relevance Rating" you will enter how each criteria matches up with each of the two options ("Dynamic scoping of macros" or "parameters for macros").  For example, if you think that "parameters for macros" are very simple but "dynamic scoping" is not so simple, you may enter "90" and "20" respectively for the two options under "Simplicity".

Once you have entered the above two pieces of information, click on the update button. (note that this page has several other buttons, but we recommend you keep away from them--they will be used in the later projects).   Your score in the discussion phase will depend on how your personal choices matched up with the rest of the class.

Stage 2 (35 points)

In Stage 2 we will implement the extension we decide on in our online discussion.  We will announce the extension to implement in the class mailing list and also in recitations. You can also see the extension to implement by logging in to the PEP system.  Here are some more details on the two alternatives:

Parameter passing of macros

For parameter passing, we will extend the syntax of the macro definition to be:
<MD!macroname!param1!...!paramn> macro body </MD>

param1,...,paramn are the names of parameters to the macro.  These names can be used in the macro body as if they were macros.  For example

<html>
<MD!fullname!fname!lname><MU!fname>   <MU!lname></MD>
<MD!first> Amer </MD>
<MD!last> Diwan </MD>

<MU!fullname!first!last>
</html>

Translates into this:
<html>
Amer Diwan
</html>

This example also demonstrates how to use a macro with a parameter: we have an extended version of the <MU...> command that can take arguments.  Each argument to the MU must be a macro name or an argument to an enclosing macro definition (recall than an argument also acts as if it is a macro name).  In other words, instead of passing first and last to MU above, if we passed "Amer" and "Diwan", that would be illegal since "Amer" and "Diwan" are not macros, they are simply text.  Also, the semantics of macro parameters allows a macro to use itself which would lead to infinite looping.  For example, consider:

<html>
<MD!M!n><MU!n!n></MD>
<MU!M!M>
</html>

You can assume in your implementation that such cycles do not happen.

To implement parameter passing of macros, you need to do the following:

The above description of how to implement parameter passing is customized to my solution; some of the details may differ if your solution approach differs substantially from mine.
 

Dynamic scoping of macros

Dynamic scoping does not require any addition or changes to the syntax.  However, it does, profoundly affect how you do your macro lookups.  Here is a variation on the above example which achieves the same effect as parameters but using dynamic scoping.

<html>
<MD!fullname> <MU!first> <MU!last></MD>
<p>
<MD!first> Amer  </MD>
<MD!last> Diwan </MD>
<MU!fullname>
</p>
<p>
<MD!first> Richard  </MD>
<MD!last> Sharp </MD>
<MU!fullname>
</p>
</html>

The above example once again declares a macro called fullname, but this time instead of taking parameters, it simply uses two other macros, called first and last respectively.   This document contains two paragraphs, each with one use of "fullname".  The first paragraph defines "first" and "last" to be "Amer" and "Diwan" respectively and the second paragraph defines "first" and "last" to be "Richard" and "Sharp" respectively.  When the first paragraph uses "fullname", it expands into "Amer Diwan" and when the second paragraph uses "fullname" it expands into "Richard Sharp.

Here is an outline of an approach to implementing dynamic scoping:

Resources

You must use the Java programming language for this project.  See http://java.sun.com/docs/books/jls/html/index.html  for the official Java language definition.  In addition, feel free to use any book on Java and to consult with the TAs if you run into difficulties in understanding or using Java.