Final Project: Huffman Encoding
Assigned: April 27, 2018
Due: May 15, 2018 at 5pm
Summary: In this assignment, you will apply several of the data structures we have used
in class to create a file compression/decompression system using the Huffman
Encoding algorithm. You will have some time in class to work on the
assignment with your partner, but plan to spend time outside of class on the
assignment as well.
Learning To experience full-system object oriented design and appropriately use
Goals: provided data structures.
Collaboration: For this assignment, you will submit work in partners assigned in class. While
you can work with your partner any way you wish, I encourage you to use a
pair-programming approach while working on Class 1 in particular, whereas
Classes 2 and 3 can be worked on independently.
Submission : Turnin Form
Assignment
In this assignment, you will be working on information compression using a Huffman Tree. It is a
large assignment, so start with your partner early. You can choose to either work on it in a pair-
programming approach or by dividing some of the interfaces described below, implementing them
independently, and integrating them. While test-driven development for individual classes is
recommended, submission of test cases is not required. Bare-bones starter code is provided here.
At this point, you should be familiar with the core standard library data structures such as strings,
   vectors, lists, stacks, queues, sets, and maps. Use them judiciously in your assignment!
   Additionally, you are provided with utility classes for bitstreams, objects that work like regular
   streams but read and write individual bits instead of multi-bit or multi-character data elements.
   Note that while the interfaces below are required, you may also write additional internal methods if
   it helps you organize your code. Do make sure that different classes and declared in different
   header files, implemented in different implementation files, and added to the Makefile for
   compilation.
   Class 1: The decision tree (40%)
   In a prior assignment, you implemented a special-purpose decision tree where internal and
   external nodes each held different kinds of information. In this assignment, you will create a new
   decision tree class that stores different information. The Huffman decision tree is a binary tree
   where every node has a “weight” property describing the cumulative frequencies of everything in
   that subtree, and leaves additionally have a “value” property that describes the character
   represented by that leaf. A description of the Huffman tree construction algorithm is in section 12.4
   of our textbook. You are strongly encouraged to review the previous decision tree assignment to
   make this component easier.
   The Huffman tree class should implement the following interfaces:
       http://cs.whitman.edu/~strattja/cs/270/2018S/hw/fp.huffman.shtml
       1/35/5/2018
       CS 270, 2018S: Final Project
       HuffmanTree(const map<char, float>&): construct a Huffman Tree from a set of characters
                                             with frequencies. Your constructor should use the algorithms from the textbook and in class
                                             to build a tree by merging subtrees together. Remember, a priority queue of Nodes is going
                                             to be very helpful here!
                                             void writeTree(ostream&): write the Huffman tree as text to a stream. The encoding format
                                                                       for the Huffman decision tree is similar to the question-based decision tree. Print one node
                                                                       per line: if the node is an internal node, begin the line with “NODE:”, followed by the
                                                                       frequency of the characters in that subtree. If a node is a leaf, begin the line with “LEAF:”,
                                                                       followed by the character, followed by the frequency of that character. A simple tree might
                                                                       look like:
                                                                       NODE: 1.0
                                                                       LEAF: a 0.5
                                                                       LEAF: b 0.5
                                                                       HuffmanTree(istream&): construct a Huffman Tree from an already-open stream, according to
    the same tree format described above. When done, the constructor should use the seekg
                                                                                         method to reset the istream to the beginning of its content (like closing and reopening the file,
                                                                                                 without actually closing and reopening it.)
                                                                                              char decode(ibitstream&): beginning at the top of Huffman tree, decode a single character
                                                                                                                        from a stream of bits. Use the ibitstream member function “readBit” to read individual bits.
                                                                                                                        Traverse the tree, and return the decoded character. You may want to revisit the traversal
                                                                                                                        code of the previous Decision Tree data structure as a refresher.
                                                                                                                        map<char, std::vector
                                                                                                                                                                    a map from each character represented in the Huffman Tree to the bit coding that would be
                                                                                                                                                                    required to reach that character in the tree. (Hint: think about using a vector as a stack.
                                                                                                                                                                            Keep track of what choices you have made in your traversal by pushing items on to the back
                                                                                                                                                                            of the vector, and when you reach a leaf, the vector will contain the sequence of choices
                                                                                                                                                                            made to reach that leaf. Add that vector to the map, and when you return from a recursion
                                                                                                                                                                            step in the traversal, pop the last decision-item off the end of the stack. All of the entries in
                                                                                                                                                                            the vector should be either 0 or 1.)
                                                                                                                                                                    To test your HuffmanTree separately from an encoder, decoder, you’ll have to manually construct
                                                                                                                                                                    the map and stream arguments in your test cases. Recall that cout can be used as an ostream,
                                                                                                                                                                    cin as an istream. Bitstreams are a little different: take a look at the ibitstreamtest.cpp file to see
                                                                                                                                                                    an example of how to create and use stringbitstream objects that might help you in your test cases.
                                                                                                                                                                    Class 2: The Huffman Encoder (25%)
                                                                                                                                                                    This class will have a HuffmanTree and an istream& as fields, and will use it to manage the
                                                                                                                                                                    encoding and decoding of files through the following interfaces
                                                                                                                                                                    HuffmanEncoder(istream&): create an encoder and construct the HuffmanTree for the given
                                                                                                                                                                                              istream. When the construction is completed, reset the istream to the beginning of its input
                                                                                                                                                                                              using the seekg function with the std::ios::beg position marker, e.g.
                                                                                                                                                                                              “mystream.seekg(0,std::ios::beg);”
                                                                                                                                                                                              int writeEncodedText(obitstream&): write the original text in encoded format to the given,
                                                                                                                                                                                              open obitstream, and return the number of characters encoded. Remember to reset the
                                                                                                                                                                                                                                 istream position at the end of the function.
                                                                                                                                                                                                                                 void writeTree(ostream&): invokes the writeTree function of the HuffmanTree to write the tree
                                                                                                                                                                                                                                                           used to encode this istream to the given output stream.
                                                                                                                                                                                                                                                           Class 3: The Huffman Decoder (25%)
                                                                                                                                                                                                                                                           http://cs.whitman.edu/~strattja/cs/270/2018S/hw/fp.huffman.shtml
                                                                                                                                                                                                                                                               2/35/5/2018
                                                                                                                                                                                                                                                               CS 270, 2018S: Final Project
                                                                                                                                                                                                                                                               This class will have a HuffmanTree as a field, and will use it to manage the decoding of a
                                                                                                                                                                                                                                                               compressed stream as produced by the encoder.
                                                                                                                                                                                                                                                               HuffmanDecoder(istream&): create a decoder and construct the HuffmanTree from the given
                                                                                                                                                                                                                                                                                         istream, which stores the tree in text format (as produced by writeTree).
                                                                                                                                                                                                                                                                                         void decodeText(int, ibitstream&, ostream&): decode the message from the given input bit
                                                                                                                                                                                                                                                                                                                                      stream into the output character stream. The first argument indicates the expected number
                                                                                                                                                                                                                                                                                                                                      of characters to be decoded.
    void writeTree(ostream&): invokes the writeTree function of the HuffmanTree to write the
                              tree. (This is primarily for debugging purposes, most applications will not need it.)
                              Above and Beyond: User application (10%)
                                                To complete the assignment, you must design and write some command-line user interface to
                                                enable a user to encode and decode text with the classes you have built. As some suggestions to
                                                get you started thinking, you could asking the user to provide names for files to read and write
                                                through prompts or the command-line, or asking the user to type strings and print the huffman-
                                                encoded results as ‘1’ and ‘0’ characters to the terminal, or write multiple different applications with
                                                different main functions for handling encoding and decoding using the same libraries. Whichever
                                                route you choose, start by looking up the documentation for the kind of streams you want to deal
                                                with - files are handled through ofstreams and ifstreams (or ofbitstreams and ifbitstreams) and
                                                strings can be processed using stringstream objects. Good links for the Stanford bitstream
                                                libraries here, and the C++ standard i/o library here. Talk with your partner to decide how you want
                                                your application to be informative, artistic, utilitarian, fun, or any or all of the above. Whatever
                                                interface you design and choose, include a README.txt file that describes what you did and how
                                                to use your application.
                                                Submitting your work
                                                Make sure your files are named correctly as specified above. In your comments at the top, make
                                                sure you:
                                                acknowledge any help or outside sources, in accordance with the academic honesty policy;
                                                describe any known bugs in the program.
                                                Rezip your materials using the command “zip -r mywhitmanID_Huffman.zip FP_huffman” in the
                                                folder containing the final project unzipped package. Submit your completed package using the
                                                online turnin form. Be sure to select the Final Project.