Consider the data collected by a hypothetical video store for 50 regular customers. This data consists of a table which, for each customer, records the following attributes: Gender, Income, Age, Rentals (total number of video rentals in the past year), Avg. per visit (average number of video rentals per visit during the past year), Incidentals (whether the customer tends to buy incidental items such as refreshments when renting a video), and Genre (the customer's preferred movie genre). This data is available as an Excel spreadsheet.
Perform each of the following data preparation tasks:
a. Use smoothing by bin means to smooth the values of the Age attribute. Use a bin depth of 4.
b. Use min-max normalization to transform the values of the Income attribute onto the range [0.0, 1.0].
c. Use z-score normalization to standardize the values of the Rentals attribute.
d. Discretize the (original) Income attribute into the following categories: High = $60K+; Mid = $25K-$59K; Low = less than $25K.
e. Convert the original data (not the results of parts a-d) into the standard spreadsheet format (note that this requires that you create, for every categorical attribute, additional attributes corresponding to values of that categorical attribute; numerical attributes in the original data remain unchanged).
f. Using the data set in standard spreadsheet format (from part e), perform a basic correlation analysis among the attributes. Discuss your results by identifying any strong correlations (positive or negative) between pairs of attributes. You need to construct a complete correlation matrix (please read the document "Basic Correlation Analysis" for more detail and an example). Can you observe any "significant" patterns among groups of two or more variables? Explain.
g. Perform a cross-tabulation of the two "gender" variables versus the three "genre" variables. Show this as a 2 x 3 table with entries representing the total counts. Then, use a graph or chart that provides the best visualization of the relationships between these sets of variables. See Slide 41 in Lecture 2 for an example. Also review Chapter 4 of Berry and Linoff. Can you draw any significant conclusions?
h. Select all "good" customers, i.e., those with a high value for the Rentals attribute (a "good" customer is defined as one with a Rentals value greater than or equal to 30). Then create a summary (e.g., using means, medians, and/or other statistics) of the selected data with respect to all other attributes. Can you observe any significant patterns that characterize this segment of customers? Explain. Note: to know whether the patterns you observe in the target group are significant, you need to compare them with the general population using the same metrics.
i. Suppose that, because of their high profit margin, the store would like to increase sales of incidentals. Based on your observations in the previous parts, discuss how this could be accomplished (e.g., should customers with specific characteristics be targeted? Should certain types of movies be preferred?). Explain your answer based on your analysis of the data.
j. Use WEKA to perform the following tasks on the original data set (use the comma-separated version of the above data set, Video_Store.csv). Load the data into the WEKA Explorer (the Preprocess tab). Remove the Customer ID attribute. Review the basic statistics for each attribute by clicking on its name in the "Attributes" panel. Next, use the unsupervised attribute filter "Discretize" to discretize the Age attribute. Finally, use the unsupervised attribute filter "Normalize" to rescale all of the remaining numerical attributes to the [0, 1] range. Save the resulting data set as an ARFF file and submit it with your answers to the above questions.
Note: You can give the final results of parts (a) through (d) as a single table that includes the original data and has an added column for each of parts (a) through (d). The results of part (e) should be a separate table. For the correlation analysis (part f), give your correlation matrix (the rows and columns of the matrix are the attributes, and each entry represents the correlation value for a pair of attributes, e.g., "Income" versus "Age"). Your analyses for the various parts can be added to the same spreadsheet file or included in a separate document (e.g., an MS Word file). Please create a single ZIP archive containing all of your documents and submit it via Facebook.
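For part (a), smoothing by bin means with a bin depth of 4 amounts to equal-frequency binning: sort the Age values, cut them into consecutive bins of four, and replace every value by the mean of its bin. The Python sketch below assumes the ages are available as a plain list; the function name `smooth_by_bin_means` and the sample ages are hypothetical illustrations, not taken from the assignment's spreadsheet.

```python
def smooth_by_bin_means(values, depth=4):
    """Equal-depth (equal-frequency) binning followed by smoothing by bin means.

    Returns the smoothed values in the same order as the input list.
    """
    # Indices of the values in ascending order; sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), depth):
        # Each bin holds `depth` values; the last bin may hold fewer
        # (e.g., 50 ages give 12 bins of 4 plus 1 bin of 2).
        members = order[start:start + depth]
        bin_mean = sum(values[i] for i in members) / len(members)
        for i in members:
            smoothed[i] = bin_mean
    return smoothed


# Hypothetical ages for illustration only (not the Video Store data).
ages = [25, 33, 20, 45, 22, 18, 52, 30, 41, 27]
print(smooth_by_bin_means(ages))
# [21.25, 32.75, 21.25, 48.5, 21.25, 21.25, 48.5, 32.75, 32.75, 32.75]
```

Sorting indices rather than values keeps the smoothed column aligned with the original rows, which is convenient when the results of parts (a) through (d) are added as extra columns of one table. When tied ages straddle a bin boundary, this sketch simply splits them; other conventions exist, so state whichever one you use.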
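For part (b), min-max normalization maps each value v to (v - min) / (max - min), rescaled onto the target range, so the smallest income becomes 0.0 and the largest becomes 1.0. A minimal Python sketch follows; the function name and the sample incomes are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined when all values are equal")
    # Multiply before dividing so the maximum maps exactly to new_max.
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


# Hypothetical incomes for illustration only.
incomes = [45000, 36000, 23000, 79000, 8000]
print(min_max_normalize(incomes))
```

Every normalized value depends on the current minimum and maximum, so a single extreme income compresses all the others toward one end of the range.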
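For part (c), z-score normalization replaces each value v by (v - mean) / standard deviation. The assignment does not say which standard deviation to use; the sketch below assumes the sample standard deviation (n - 1 denominator, as Excel's STDEV.S computes). Switching to the population version (`statistics.pstdev`) changes every z-score by the same constant factor. The function name and sample values are hypothetical.

```python
from statistics import mean, stdev


def z_score_normalize(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mu) / sigma for v in values]


# Hypothetical rental counts for illustration only.
rentals = [27, 12, 42, 30, 9]
z = z_score_normalize(rentals)
print([round(x, 3) for x in z])
```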
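For part (d), the category boundaries leave a gap between $59K and $60K. The sketch below assumes incomes are recorded in whole dollars and reads "Mid" as 25,000 <= income < 60,000; if the spreadsheet stores incomes in thousands, the thresholds would need to be scaled. The function name is hypothetical.

```python
def income_category(income):
    """Map a dollar income to the part (d) categories (boundaries are an assumption)."""
    if income >= 60_000:
        return "High"
    if income >= 25_000:
        return "Mid"
    return "Low"


# Hypothetical incomes for illustration only.
print([income_category(x) for x in [45000, 59999, 60000, 24999, 8000]])
# ['Mid', 'Mid', 'High', 'Low', 'Low']
```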
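For part (e), the standard spreadsheet format replaces each categorical attribute with one 0/1 indicator column per value, while numeric attributes pass through unchanged. In the Python sketch below, the column names, the "Yes"/"No" coding of Incidentals, and the genre labels are all assumptions for illustration; use whatever values actually appear in the data.

```python
def to_standard_format(records, categorical):
    """Replace each categorical attribute with one 0/1 indicator column per observed value.

    Numeric attributes are copied through unchanged.
    """
    # Distinct values of every categorical attribute, sorted for a stable column order.
    levels = {attr: sorted({r[attr] for r in records}) for attr in categorical}
    rows = []
    for r in records:
        row = {}
        for attr, value in r.items():
            if attr in levels:
                for level in levels[attr]:
                    row[f"{attr}_{level}"] = 1 if value == level else 0
            else:
                row[attr] = value
        rows.append(row)
    return rows


# Hypothetical records; column names and category labels are assumptions.
customers = [
    {"Gender": "M", "Income": 45000, "Incidentals": "Yes", "Genre": "Action"},
    {"Gender": "F", "Income": 54000, "Incidentals": "No", "Genre": "Drama"},
    {"Gender": "F", "Income": 32000, "Incidentals": "Yes", "Genre": "Comedy"},
]
for row in to_standard_format(customers, ["Gender", "Incidentals", "Genre"]):
    print(row)
```

Because the indicators within one attribute sum to 1 in every row, they are linearly dependent; for example, Gender_F and Gender_M will correlate at exactly -1 in the part (f) matrix, which is an artifact of the encoding rather than a finding.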
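For part (f), each entry of the correlation matrix is a Pearson correlation coefficient (the quantity Excel's CORREL returns) between two columns of the part (e) table. The Python sketch below builds the full symmetric matrix from a dictionary of columns; the function names and sample columns are hypothetical. A column with zero variance (e.g., an indicator that is always 0) has no defined correlation, so the sketch raises an error for it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return (names, matrix) where matrix[i][j] = pearson(column i, column j)."""
    names = list(columns)
    matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical columns for illustration only.
columns = {
    "Age": [20, 30, 40, 50],
    "Rentals": [40, 30, 20, 10],
    "Gender_F": [1, 0, 1, 0],
}
names, matrix = correlation_matrix(columns)
for name, row in zip(names, matrix):
    print(f"{name:>10}", " ".join(f"{v:6.2f}" for v in row))
```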
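For part (g), the 2 x 3 cross-tabulation counts customers for every (gender, genre) pair. The Python sketch below works on the original categorical columns, which carry the same information as the Gender_* and Genre_* indicators from part (e); the attribute names and genre labels in the sample records are assumptions.

```python
from collections import Counter


def crosstab(records, row_attr, col_attr):
    """Count records for every (row value, column value) pair."""
    counts = Counter((r[row_attr], r[col_attr]) for r in records)
    row_levels = sorted({r[row_attr] for r in records})
    col_levels = sorted({r[col_attr] for r in records})
    # Counter returns 0 for pairs that never occur, so empty cells are filled automatically.
    table = [[counts[(rv, cv)] for cv in col_levels] for rv in row_levels]
    return row_levels, col_levels, table


# Hypothetical records; the genre labels are assumptions.
customers = [
    {"Gender": "M", "Genre": "Action"},
    {"Gender": "M", "Genre": "Action"},
    {"Gender": "F", "Genre": "Drama"},
    {"Gender": "M", "Genre": "Comedy"},
    {"Gender": "F", "Genre": "Comedy"},
]
rows, cols, table = crosstab(customers, "Gender", "Genre")
print("", *cols, sep="\t")
for rv, counts in zip(rows, table):
    print(rv, *counts, sep="\t")
```

Dividing each cell by its row total gives the genre shares within each gender; that is what a 100% stacked bar chart displays, and it keeps the two genders comparable even if the groups differ in size.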
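For part (h), the note asks that the "good" segment (Rentals >= 30) be compared with the whole population using the same metrics. The Python sketch below puts segment and population means and medians side by side for numeric attributes and compares category shares for categorical ones; the attribute names, the "Yes"/"No" coding, and the sample records are assumptions. `statistics.mean` raises an error if no customer meets the threshold.

```python
from statistics import mean, median


def share(records, attr, value):
    """Fraction of records whose `attr` equals `value`."""
    return sum(1 for r in records if r[attr] == value) / len(records)


def compare_segment(records, numeric_attrs, threshold=30):
    """Summarize customers with Rentals >= threshold next to the whole population."""
    segment = [r for r in records if r["Rentals"] >= threshold]
    summary = {}
    for attr in numeric_attrs:
        seg_vals = [r[attr] for r in segment]
        all_vals = [r[attr] for r in records]
        summary[attr] = {
            "segment_mean": mean(seg_vals),
            "population_mean": mean(all_vals),
            "segment_median": median(seg_vals),
            "population_median": median(all_vals),
        }
    return segment, summary


# Hypothetical records for illustration only.
customers = [
    {"Rentals": 42, "Age": 25, "Incidentals": "Yes"},
    {"Rentals": 10, "Age": 50, "Incidentals": "No"},
    {"Rentals": 30, "Age": 35, "Incidentals": "Yes"},
    {"Rentals": 5, "Age": 60, "Incidentals": "No"},
]
good, summary = compare_segment(customers, ["Age"])
print(summary["Age"])
print("Incidentals=Yes:", share(good, "Incidentals", "Yes"), "vs", share(customers, "Incidentals", "Yes"))
```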
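For part (i), one way to ground a recommendation in the data is to compute, for each group (e.g., each genre or gender), the fraction of customers who buy incidentals and compare it with the overall rate. The Python sketch below assumes Incidentals is coded "Yes"/"No"; the function name and sample records are hypothetical. Numeric attributes such as Income would need to be discretized first (for example, with the part (d) categories), and with only 50 customers, rates for small groups can be noisy.

```python
from collections import defaultdict


def positive_rate_by(records, attr, target="Incidentals", positive="Yes"):
    """For each value of `attr`, the fraction of records whose `target` equals `positive`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r[attr]] += 1
        if r[target] == positive:
            hits[r[attr]] += 1
    return {level: hits[level] / totals[level] for level in sorted(totals)}


# Hypothetical records; the "Yes"/"No" coding of Incidentals is an assumption.
customers = [
    {"Genre": "Action", "Incidentals": "Yes"},
    {"Genre": "Action", "Incidentals": "No"},
    {"Genre": "Drama", "Incidentals": "Yes"},
    {"Genre": "Drama", "Incidentals": "Yes"},
]
print(positive_rate_by(customers, "Genre"))
```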
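For part (j), WEKA's unsupervised Discretize filter uses equal-width bins by default (10 bins, set by the filter's `bins` option; an equal-frequency mode is also available), and the Normalize filter applies the same min-max formula as part (b) with a default target range of [0, 1]. The Python sketch below only imitates equal-width binning so that the Age intervals WEKA reports can be sanity-checked; it is not WEKA's code, and assigning a value that falls exactly on a cut point to the lower bin (right-closed intervals) is an assumption. The function name and sample ages are hypothetical.

```python
def equal_width_bins(values, bins=10):
    """Equal-width discretization: return (cut_points, bin index for each value)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    cut_points = [lo + i * width for i in range(1, bins)]

    def bin_index(v):
        # Right-closed intervals: a value equal to a cut point goes into the lower bin.
        for i, cut in enumerate(cut_points):
            if v <= cut:
                return i
        return bins - 1

    return cut_points, [bin_index(v) for v in values]


# Hypothetical ages for illustration only.
cuts, labels = equal_width_bins([20, 25, 30, 40, 60], bins=4)
print(cuts)    # [30.0, 40.0, 50.0]
print(labels)  # [0, 0, 0, 1, 3]
```

Unlike the equal-depth bins of part (a), equal-width bins can end up with very different numbers of customers, so some Age intervals may be nearly empty.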
