Project 3: higher order functions and functional languages

Goals of the project

The goals of this project are to give you experience programming in a functional language (SML).  More specifically, you will get experience with the following language features:

Introduction

The goal of compiler optimizations is to improve the performance of the code without changing the meaning of the code.  In this project we we consider a simple optimization, arithmetic simplification of expressions made up of constants, variables, addition, subtraction, and negation.  Here are some examples of expressions:

x + 7 + y
a + -7 - 3
a + b - -c

The expressions are represented as a tree, each internal node of which is a operator (addition, subtraction, or negation) and each leaf is either a constant or a string (which represents a variable with that name).   Here is the representation that I provide:

datatype node = Constant of int
                | Symbol of string
                | Negation of node
                | Add of node*node
                | Subtract of node*node;

Your goal is to transform the tree so that it contains fewer calculations.  In the next four sections I'll give you some examples and describe the three components of the project: transformations, simplifications, and heuristics.

Examples

  1. Consider the tree for "a  + (5 + 3)":

  2. Add(Symbol(a), Add(Constant(5), Constant(3)))
    A simple analysis can find that the inner add operates on two constants and the inner tree can be simply replaced by Constant(8) to yield:
    Add(Symbol(a), Constant(8))
  3. Consider  the tree for "(a + -5) + 3":

  4. Add(Add(Symbol(a), Negate(Constant(5))), Constant(3))
    When we look at this expression and tree it is obvious that we can replace it by a simpler tree: a - 2.  However, the way the tree is organized, there is no Add or Subtract node with only constant operands.  Thus, we can transform the tree in the following steps to end up with something that is easy to simplify:
    Add(Add(Symbol(a), Constant(-5)), Constant(3)) (* Simplify Negate which has a constant operand *)
    Add(Symbol(a), Add(Constant(-5), Constant(3))) (* Using (a+b)+c == a + (b + c) *)
    Now we have an Add which has two constant operands.  thus we can replace that with Constant(-2) to get:
    Add(Symbol(a), Constant(-2)
    The question to consider here is: how and why did we decide to use the associative property?  We could have transformed the tree in many other ways (e.g., by commuting it to get Add(Constant(3), Add(Symbol(a), Constant(-5)))
  5. Consider the tree for "(x + 5) + (z + 7)":

  6. Add(Add(x, 5), Add(z, 7))
    (for simplcity, in the remainder of the examples, when I write "x" I mean Symbol(x) and when I write a constant, say 5, I mean Constant(5))
    Like the previous example, none of the Add or Subtract can be immediately simplified since none of them have only constants as arguments.   Here is one sequence of transformations that allows us to simplify the tree:
    Add(x, Add(5, Add(z, 7))) (* associate: (a + b) + c == a + (b + c) *)
    Add(x, Add(5, Add(7, z))) (* commute: a + b == b + a *)
    Add(x, Add(Add(5, 7), z)) (* associate: a + (b + c) == (a + b) + c *)
    Now we have an Add with two constant operands, thus we get:
    Add(x, Add(12, z))
  7. Consider the tree for "4 - (x - 5)":

  8. Subtract(4, Subtract(x, 5))
    Let's transform this:
    Subtract(4, Add(x, Negate(5))) (* x - y = Negate(y) + x *)
    Subtract(4, Add(Negate(5), x)) (* commute: x + y = y + x *)
    Subtract(Subtract(4, Negate(5)), x) (* associate: x - (y + z) == x - y - z *)
    Subtract(Subtract(4, -5), x) (* Negate(constant) == -constant *)
    Now we can do the subtraction on constants.

Transformations

  • As we saw in the Examples section, there are a small number of transformations that we routinely use in order to get the expression tree in a form that we can simplify it using the simplifications.  Here they are:
  • Simplifications

    If we consider the example, there are a very small number of simplifications we use repeatedly: All of these simplifications operate on constants.  Here are two that are more general:
  • There are many other simplifications that we can imagine.  For at least Stage 0 of project we will stick with the ones above.
  • Heuristics

    The goal of the heuristics is to decide when to apply a particular transformation.  Whenever a simplification is applicable, it makes sense to apply it;  however, it doesn't make sense to apply tranformations always.  For example, one could go into an infinite loop applying Associative and AntiAssociative to x + (y + z).  Clearly we need some strategies for applying the heuristics.  In this project, you will need to come up with some heuristics.

    Stage 0

    Copy the files in ~csci3155/proj3handout into your directory.  These files are mostly empty: you need to put your code in these files.  One of these files, setup.sml (described below) contains the datatype declaration which I've already given above and a tostring function that takes a tree and returns a string that contains a flattened infix representation of the tree.  The file test.sml gives a small tree and invokes tostring on it.  This code in test.sml is just for illustration purposes.

    For Stage 0 you need to write and submit one routine for each of the above mentioned transformations and simplifications.  Each of these routines takes a node and return a (possibly the same) node.  In addition, you need to write a driver routine, MapTree, that you will use to invoke the transformations and simplfications.  The rest of this section describes the functionality and interface of the MapTree routine.

    MapTree takes two arguments, a node, n, and a function, f, of type node->node.  It applies the function f to each node in the tree rooted at n in a bottom-up fashion and builds a new tree in the process using the value returned by f.  For example consider the following tree:
    Add(Constant(10), Symbol(x))
    MapTree will perform the following steps:
    val t = f(Constant(10));
    val u = f(Symbol(x));
    val k = f(Add(t, u))
     

    You should put all your routines in program.sml and submit it electronically using our usual submission mechanism.  My solution for this adds up to about 40 lines of code.  However, the hard part is writing your first routine: once you get a handle on it, the rest are easy.

    Discussion

    In the previous stage we build some mechanisms: one mechanism for traversing a tree and building a new tree from it (MapTree) and a number of mechanisms for transforming  the tree (or rather building a new transformed version of the tree).  However, we didn't implement any policy.  In particular, when should the heuristics be?  This stage proceeds similarly to the discussion stage for Project 2.

    The discussion is in two part.  The first part ends Monday April 23rd at 5:00 p.m. and the second part ends on Tuesday April 24th at 5:00 p.m.  Here are the description of what you need to do for each part:

    1. For the first part you need to log into PEP (as in discussion for Project 2), click on Project 3, and enter one heuristic (in the Option 1 box) and one criteria for evaluating the heuristics (in the Criterion 1 box).  Note that in the earlier projects I gave you the criteria.  In this project since we are not evaluating programming language features, the criteria may or may not be the same as ones we have been using.  As with Project 2, you can see the criteria and options that your fellow students have picked at the bottom of the screen.  In addition, you should  enter justification for why you think your option/criteria or someone elses' option/criteria makes sense or does not make sense.  To do this, click on the "all of your submissions" link and put your comments in the form there.  Note that you can log into the system multiple times and put in additional comments--view these comments as an online dialog between yourself and your fellow students.
    2. For the second part, you need to rate the options and the criteria that I select from the options and criteria submitted in part 1.  This is just the same as in the earlier projects except that both the options and criteria are provided by the students.
    The discussion proceeds just as the discussion for Project 2.  In this discussion you will suggest different heuristics for how to apply the transformations.  This project is different from the other two projects because you are not writing a language processor.  Thus, some of the criteria for evaluating the heuristics will be different than the ones we have been using to evaluate language extensions.  Thus, during this discussion you should submit not only an option (heuristics) but also a possible criteria for evaluating the heuristics.  In addition to submitting an
    After you have submitted the options and the criteria, I will pick some options and criteria and we will all rate the options and criteria much like the discussions in the previous assignments.

    Stage 2

    In this stage we will implement the option that we picked in the discussion stage and call MapTree with the heuristic.  Here are the steps you need to follow:
    1. Write a function, say myHeuristic (type: node->node) that applies the heuristic to a node and depending on what the heuristic says, it may apply a transformation to the node.  For example, let's suppose our heuristic is: if there is a constant as a left operand of an Add, then commute.  Then, myHeuristic will be:
        fun myHeuristic (n as Add(Constant(x), y)) = commute(n)
            | myHeuristic x = x;
    2. Create a function that is the composition of all your simplifications.  For example,
        val allSimplifications = SubConstants o NegateConstants o ...
      This function will simply try to apply each simplification once.
    3. Create a composition of your heuristic and the simplifications:
        val myHeuristicAndSimplifications = allSimplifications o myHeuristic;
      This function when called on a node will first transform the node, based on the heuristic and then try to simplifiy the node;
    4. Test your program by calling MapTree:
        MapTree (myHeuristicAndSimplifications, atree)
      "atree" is a tree that you want to test your analysis on.
    Note that I've already given you the code for steps 2 and 3 (for step 2, you just need to fill in the names of all your simplifications).  In addition to the code, you need to submit an answer to the following question: How would the coding experience have been different if you had written your code in C++ or Java?

    What to submit: Submit all the three sml files.  The test.sml files should have the code that you use to test your code.

    Using the system

    To use the system, add the following directory to your path:
    /tools/cs/smlnj/bin
    The above path works at least on the solaris and the linux machines in the ugrad lab.

    To invoke the compiler type "sml".  This will give you an interpreter prompt in which you can type in SML code.  However, I recommend you put your code in one or more files and issue the following command when you invoke sml:

    use "myfile";

    This will open your file and execute all the code in it as if you had typed it in at the interpreter prompt.  My code is spread out over three files: setup.sml, program.sml, and test.sml.  I will provide setup.sml to you;  it contains declarations for the tree.  program.sml contains my solution to this project.  test.sml contains the tests for my project.  Since tests.sml depends on setup.sml and program.sml and program.sml depends on setup.sml, I've added:
    use "setup.sml";
    use "program.sml";
    at the top of my test.sml.  Thus, to run and test my program, I simply need to type in "sml" at the shell prompt and
    use "setup.sml";
    at the sml interpreter prompt.  Note that I'm providing you with the full contents of setup.sml and stubs for program.sml and test.sml.

    Resources

    There are several good tutorials for SML.  I recommend you start with this one since it is written very nicely and walks you through all the main features of SML.  Ignore the "Module system" chapter of the tutorial: we won't be needing it for this project.  Bob Harper, the author of the first tutorial, has revised his tutorial and that is available here.  The new tutorial is even better but it is also a bit more advanced.  I recommend you use the first tutorial and refer to the second one only for looking up things when you are stuck.

    This web site has a number of other tutorials on SML and SML/NJ (the compiler that we are using).