Programming Assignment 1 (Updated, Oct 1)

Points: 500
Due: Oct 11, 11:59PM
Late Submission Due: Oct 12, 11:59PM (25% penalty)

The description of a programming assignment is not a linear narrative and may require multiple readings before things start to make sense. You are encouraged to start working on the assignment as soon as possible. If you start a day or two before the deadline, it is highly unlikely that you will be able to come up with correct and efficient programs. You are encouraged to consult the instructor/Teaching Assistants with any questions/clarifications regarding the assignment.

Your programs must be in Java, preferably Java 8.1. You should not use any external libraries. Any libraries that you import must start with java.

For this PA, you may work in teams of 2. It is your responsibility to find a team member. If you cannot find a team member, then you must work on your own. Only one submission per team.

In this programming assignment you will
• Implement a priority queue (max heap) to store strings (along with their priorities).
• Implement (variants of) the BFS algorithm to crawl wiki pages and construct a partial wiki graph.

You will design the following classes:
1. PriorityQ
2. WikiCrawler
All your classes must be in the default package (even though it is not a good programming practice).

1 PriorityQ

Read Section 2.5 from the text to refresh your understanding of priority queues. A priority queue stores data items along with their priorities (the priority is often called the key of the item). If the data item is v and its priority is p, then the tuple ⟨v, p⟩ is stored in the priority queue. For any such tuple t = ⟨v, p⟩ we say that key(t) = p and val(t) = v. In this PA, you will implement a priority queue using a max heap, which is implemented as an array A, where the max heap property is that for every i (array indices start at 1),

key(A[i]) ≥ key(A[2i]) and key(A[i]) ≥ key(A[2i + 1])

In this assignment, the data items are strings, where each string has an associated priority.

Your Objective. Implement a class PriorityQ with the following methods; a sketch of the two core heap operations follows the list.

1. PriorityQ(): constructs an empty priority queue.
2. add(String s, int p): adds a string s with priority p to the priority queue.
3. returnMax(): returns a string whose priority is maximum.
4. extractMax(): returns a string whose priority is maximum and removes it from the priority queue.
5. remove(int i): removes the element whose array index is i from the priority queue.
6. decrementPriority(int i, int k): decrements the priority of the ith element by k.
7. priorityArray(): returns an array B with the following property: B[i] = key(A[i]) for all i in the array A used to implement the priority queue.
8. getKey(int i): returns key(A[i]), where A is the array used to represent the priority queue.
9. getValue(int i): returns val(A[i]), where A is the array used to represent the priority queue.
10. isEmpty(): returns true if and only if the queue is empty.

The max heap property must be maintained after each of the above operations. Methods 3–7 have a pre-condition that the priority queue is non-empty.
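For illustration only, here is a minimal sketch of how add and extractMax might maintain the max heap property. The 1-based array layout follows the spec above; the parallel String[]/int[] representation, the initial capacity, and the doubling growth strategy are assumptions, not requirements. It imports only from java.util, as required.

// Sketch of the two core heap operations; remove, decrementPriority,
// and the remaining accessors are omitted. 1-based indexing: slot 0 unused.
public class PriorityQ {
    private String[] vals = new String[16]; // val(A[i]) for i = 1..n
    private int[] keys = new int[16];       // key(A[i]) for i = 1..n
    private int n = 0;                      // number of stored items

    // Appends <s, p> at index n+1, then sifts it up past smaller parents.
    public void add(String s, int p) {
        if (n + 1 == vals.length) grow();
        n++;
        vals[n] = s;
        keys[n] = p;
        int i = n;
        while (i > 1 && keys[i / 2] < keys[i]) { // parent has smaller key
            swap(i, i / 2);
            i = i / 2;
        }
    }

    // Pre-condition: non-empty. Removes and returns a max-priority string.
    public String extractMax() {
        String max = vals[1];       // the root holds a maximum key
        vals[1] = vals[n];
        keys[1] = keys[n];
        n--;
        siftDown(1);
        return max;
    }

    public boolean isEmpty() {
        return n == 0;
    }

    // Repeatedly swaps index i with its larger child until the property
    // key(A[i]) >= key(A[2i]) and key(A[i]) >= key(A[2i+1]) is restored.
    private void siftDown(int i) {
        while (2 * i <= n) {
            int child = 2 * i;
            if (child < n && keys[child + 1] > keys[child]) child++;
            if (keys[i] >= keys[child]) break;
            swap(i, child);
            i = child;
        }
    }

    private void swap(int i, int j) {
        String sv = vals[i]; vals[i] = vals[j]; vals[j] = sv;
        int sk = keys[i]; keys[i] = keys[j]; keys[j] = sk;
    }

    private void grow() {
        vals = java.util.Arrays.copyOf(vals, 2 * vals.length);
        keys = java.util.Arrays.copyOf(keys, 2 * keys.length);
    }
}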
2 BFS and Web Graph

In this problem, you will generate an exploration graph. Recall that given a graph, the BFS algorithm is as follows.

1. Input: a directed graph G = (V, E) and a root of exploration root
2. Initialize a FIFO queue Q and a list Discovered
3. Add root to Q and Discovered
4. while Q is not empty do
5.   Extract vertex v from the head of Q
6.   For every (v, v') in E do
7.     if v' is not in Discovered then
8.       add v' to Q and Discovered

If you output the vertices in the Discovered list, that will produce the BFS traversal of the input graph starting from the root.

We can model the web as a directed graph. Every web page is a vertex of the graph. We put a directed edge from a page p to a page q if page p contains a link to page q. Given a root (we will call it the seed URL of the web graph), we can explore this web graph using the BFS algorithm as presented above.

2.1 Crawling and Constructing Web Graph

One of the first tasks of a web search engine is to crawl the web to collect material from all web pages. The crawl may be performed in a BFS fashion. While doing this, the search engine will also construct the web graph. The structure of this graph is critical in determining which pages are relevant to a query. As part of this assignment you will write a program to crawl the web and construct a web graph. However, the web graph is too large, and you do not have the computational resources to construct the entire web graph (unless you own a supercomputer). Thus your program will crawl and construct the web graph over 200 pages. Furthermore, your program will crawl only wiki pages.

Relevance of a page. Before we proceed, let us discuss the notion of focused crawling. In certain applications, you would like to visit only the web pages that are related to a certain topic of interest. For example, if I would like to collect only the web pages that are related to Iowa State University, then any page that is not about Iowa State University is not of interest to me. Furthermore, my goal is not to collect every web page about Iowa State University. My goal would be to collect the top, say 200, pages that are most relevant to Iowa State University. For this, we need a way to determine the relevancy of a page to the topic of interest.

Let T be a set of strings that describe a topic. For example, for my topic "Iowa State University", the strings that describe this topic could be "Iowa State University, ISU, Cyclones, Ames, Atanasoff". Given a web address a, let page(a) denote the contents of the page at address a. Given a web address a and a string s, let f(s, a) denote the number of times the string s appears in page(a). This count is case sensitive. For a topic T, the relevancy of a web address a to the topic T is defined as:

Relevance(T, a) = Σ_{s ∈ T} f(s, a)

(A small sketch of this computation appears at the end of this section.)

Your Objective. Implement a class WikiCrawler with the following methods.

1. The constructor WikiCrawler(String seed, int max, String[] topics, String output), where
(a) seed: relative address of the seed URL (within the wiki domain)
(b) max: maximum number of pages to consider
(c) topics: array of strings representing keywords in a topic-list
(d) output: string representing the filename where the web graph over the discovered pages is written
2. ArrayList<String> extractLinks(String document): the method takes as input a string representing an entire HTML document. It returns a list of strings consisting of links from the document. You can assume that the document is HTML from some wiki page. The method must (see the sketch after this list)
(a) extract only relative addresses of wiki links, i.e., only links that are of the form /wiki/XXXX
(b) only extract links that appear after the first occurrence of the html tag
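As one possible approach (not a requirement), extractLinks can scan the raw HTML with a regular expression from java.util.regex. This is a minimal sketch: the exact pattern, and the choice to skip addresses containing '#' or ':', are assumptions; it also omits the "first occurrence of the html tag" filter from condition (b), since the tag name is not reproduced above.

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractorSketch {
    // Collects relative wiki addresses of the form /wiki/XXXX from an
    // HTML document, in the order they appear.
    public static ArrayList<String> extractLinks(String document) {
        ArrayList<String> links = new ArrayList<>();
        // href="(/wiki/...)" with no '"', '#', or ':' inside the address;
        // the '#'/':' exclusion is an assumed filter, not part of the spec.
        Matcher m = Pattern.compile("href=\"(/wiki/[^\"#:]+)\"").matcher(document);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the relative address
        }
        return links;
    }
}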

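To make the Relevance definition concrete, here is a small sketch of f(s, a) and Relevance(T, a) over an already-fetched page string. Counting matches as non-overlapping is an assumption; the definition above only specifies that the count is case sensitive.

public class RelevanceSketch {
    // f(s, a): number of case-sensitive (here: non-overlapping)
    // occurrences of s in the page contents.
    static int f(String s, String page) {
        if (s.isEmpty()) return 0; // guard against an empty topic string
        int count = 0;
        int idx = page.indexOf(s);
        while (idx != -1) {
            count++;
            idx = page.indexOf(s, idx + s.length());
        }
        return count;
    }

    // Relevance(T, a) = sum of f(s, a) over all strings s in T.
    static int relevance(String[] topics, String page) {
        int sum = 0;
        for (String s : topics) {
            sum += f(s, page);
        }
        return sum;
    }
}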