Association Mining and Trend Prediction



CAB430 ASSIGNMENT 2

CAB430 Data and Information Integration
Semester 1, 2018
Assignment 2: Association Mining and Trend Prediction

Due date: Wednesday, 30 May 2018 (week 13), 11:59pm
Weighting: 30% of the assessment for CAB430.
Team work: You may work on this assignment individually or in pairs. In the latter case, each team member should submit an identical submission. One of the submissions will be marked.

Required to be submitted: One single zip file which contains the following files:

1. A statement of completeness, including any assumptions that you made for this assignment.
2. A report (pdf file) containing your answers to both tasks. For your DMX statements, appropriate comments should be included to explain what each statement does.
3. A separate .dmx file containing all your DMX statements or queries (i.e., mining structures, mining models, processing, and predictions). You can use ‘GO’ to separate the statements. Your statements are expected to be executable and to work correctly.

You have two tasks to complete in this assignment. Your first task is to generate frequent patterns and association rules from a given transaction dataset by correctly applying the Apriori algorithm. In the second task, you are required to design data mining structures and models by using the Microsoft Data Mining Extensions (DMX) language to make predictions.

Task 1: Patterns and Rules (5 marks)

In this task, you are required to generate frequent patterns and rules from the transaction dataset given in Table 1.

Table 1 A transaction dataset

Transaction ID    Items
1                 a, c, d, e, f
2                 a, b, e
3                 c, e, f
4                 a, c, d, f
5                 c, e, f

1) Frequent itemsets (3 marks)

Your first task is to generate frequent itemsets from the dataset given in Table 1 by using the Apriori algorithm, assuming that the minimum support is set to 2/5 (0.4). You need to show your working process for generating the frequent itemsets. You can refer to the week 8 lecture notes to describe your working process.
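As a rough illustration of the level-wise process that Question 1) asks for, the sketch below runs a minimal Apriori loop over Table 1 in Python. It is only a sketch under stated assumptions: Python is not part of this assignment, the names `TRANSACTIONS`, `support`, and `apriori` are hypothetical, and the join and prune steps are written in the simplest form rather than the exact form used in the lecture notes. It does not replace the written working.

```python
from fractions import Fraction
from itertools import combinations

# Table 1 from this specification: transaction ID -> set of items.
TRANSACTIONS = {
    1: {"a", "c", "d", "e", "f"},
    2: {"a", "b", "e"},
    3: {"c", "e", "f"},
    4: {"a", "c", "d", "f"},
    5: {"c", "e", "f"},
}
# Minimum support 2/5, kept as an exact fraction to avoid float rounding.
MIN_SUPPORT = Fraction(2, 5)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return Fraction(hits, len(transactions))


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets (hypothetical helper)."""
    all_items = sorted(set().union(*transactions.values()))
    candidates = [frozenset([item]) for item in all_items]
    frequent = {}
    k = 1
    while candidates:
        # One scan of the dataset per level: keep candidates that meet min_support.
        level = {}
        for candidate in candidates:
            s = support(candidate, transactions)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


if __name__ == "__main__":
    result = apriori(TRANSACTIONS, MIN_SUPPORT)
    # Print itemsets grouped by size, then alphabetically.
    for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), result[itemset])
```

Each pass of the `while` loop corresponds to one scan of the dataset, so printing `level` inside the loop would show the per-scan results that the written answer is expected to report.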
Your answer should include the result of each dataset scan and the final result, including the frequent itemsets and the corresponding support for each frequent itemset.

2) Association rules (2 marks)

Using the Apriori algorithm again, generate 10 association rules from the frequent itemsets generated in Question 1) of this task, including rules whose antecedent contains one, two, or three items. Both the support and confidence of each rule should be greater than or equal to 2/5 (0.4).

Task 2: Data mining model for trend prediction (23 marks)

In your first assignment, you designed a data warehouse for your company XYZ. Assume that, from the data collected in the database, we have created another database called XYZRental_DB, which consists of three tables: RentalOrder, XYZ_Customers, and XYZ_Cars. Customers’ demographic information is provided in the table XYZ_Customers. The company’s previous rental information is provided in the table RentalOrder, and the company’s car information is in the table XYZ_Cars. Each rental order in RentalOrder provides the ID of the car that a customer rented, the customer’s ID, the name and location of the store from which the customer rented the car, and the year and month when the order was made. For simplicity, this table does not provide detailed information about where the customer picks up or returns the cars.

For the questions in this assignment, you may not need all the information provided in the database. You can choose the information that you think is necessary for generating the required predictions. A backup file of this database, XYZRental_DB.bak, is provided along with this specification on Blackboard.

You are required to design data mining models using the DMX query language in Microsoft SQL Server to generate the following 5 predictions.
• Car predictions: predicting which cars (i.e., car makes and car models) customers may like to rent, based on the input data for each of the following two predictions:

Prediction [1]: The input data includes customers’ demographic information (e.g., customers’ age, gender, occupation, etc.). For this prediction, it is assumed that you do not know customers’ previous rental information. Therefore, customers’ previous car rental information (e.g., car makes, car models, and rental stores) CANNOT be used as input.

Prediction [2]: The input data includes both customers’ demographic information and previous car rental information.

• Customer attribute predictions: predicting customers’ demographic attributes based on the input data for each of the following two predictions:

Prediction [3]: The input data includes both customers’ demographic information and previous car rental information. For example, predicting a customer’s occupation based on his/her age, gender, and previously rented cars (e.g., car makes and models).

Prediction [4]: The input data includes customers’ previous rental information ONLY. It is assumed that you do not know customers’ demographic information, so customers’ demographic data CANNOT be used as input for this prediction.

• Store predictions: predicting which rental stores (i.e., store name and location) are preferred by customers, based on the input data specified below.

Prediction [5]: The input data includes customers’ demographic data and previously rented cars ONLY. You do not know the customer’s previously used store information, so stores (i.e., store names and locations) CANNOT be used as input.

To generate these predictions, you need to complete the following tasks.

1) Designing Mining Structures (4 marks)

You can design one or more mining structures. For each of your mining structures, briefly describe what the mining structure defines and which prediction(s) it can be used for.
2) Designing Mining Models (6 marks)

For each of your mining structures, you can add one or more mining models.

• All your mining models should use the Microsoft_Association_Rules algorithm to make predictions.
• For each of your mining models, state which prediction(s) it will be used for.
• You should make sure that users of your mining models can observe the cases and content of the models once they are processed.

3) Processing your mining structures and mining models (4 marks)

• Once the mining structures and mining models have been designed, you need to design appropriate DMX statements to process the mining structures and the mining models in order to generate useful patterns and association rules. The database XYZRental_DB can be used as the source data to process your mining structures and mining models.
• For each of your mining models, browse its content and provide a screenshot of part of the association rules generated by the mining model.

4) Prediction queries (9 marks)

You are required to design six prediction queries covering the 5 predictions. The output of the predictions should be easy to understand. For example, the output of Question 4_1 of the week 9 practical, as illustrated in Figure 1, provides meaningful headings for each column, such as “Predicted_Products”. These meaningful headings make the result easy to understand. In Figure 1, the output includes three product model recommendations. For each of your predictions, you can return one recommendation or multiple recommendations.

Figure 1 An example output

• Car prediction
  o Singleton query for Prediction [1]
    Design one singleton query with a specific input case to predict what kind of cars a customer may like to rent, given the customer’s demographic data.
  o Prediction against test cases for Prediction [2]
    Design one batch query against the cases in the test dataset to predict which cars each customer in the test dataset may like to rent, given the customer’s demographic data and the customer’s previous rental information.

• Customer attribute prediction
  o Singleton query for Prediction [3]
    Design two singleton queries, each with a specific input case, to predict a customer’s demographic attribute given the customer’s other demographic data and/or previous rental information.
  o Prediction against cases in external database for Prediction [4]
    Design one batch query against the cases in the database XYZRental_DB to predict customers’ demographic attributes given the customers’ previous car rental information ONLY.

• Store prediction
  o Prediction against cases in external database for Prediction [5]
    Design one batch query against the cases in the database XYZRental_DB to predict the store name and location that a customer would choose, given the customer’s demographic data and some previously rented cars ONLY.

Please Note

• Your report and the .dmx file should be well laid out, easy to read, and well commented where necessary.
• All items submitted should be clearly labelled with your name and student number.
• Marks will be awarded to reports (correctness, clarity) and DMX queries (correctness, executable, well formatted, properly commented).
• You will lose marks for missing or inaccurate statements of completeness.
• Your work on the second task will be assessed based on your answers to the questions (i.e., the correctness of your DMX statements and queries). It will not be assessed based on the prediction results produced by your models using the provided database.
--- END OF ASSIGNMENT 2 ---

Marking Sheet

Student(s)    Name    ID Number

Total Marks: /30

Task 1: Patterns and Rules                      Comments    Marks
1) Frequent itemsets                                        /3
2) Association rules                                        /2
Sub Total                                                   /5

Task 2: Data mining model for trend analysis    Comments    Marks
1) Mining structure                                         /4
2) Mining models                                            /6
3) Processing mining structure and models                   /4
4) Predictions                                              /9
Sub Total                                                   /23

General                                         Comments    Marks
(1) Report presentation                                     /1
(2) The statement of completeness is accurate.              /1
Sub-total mark                                              /2

--- End of Marking Sheet ---
