Data Mining

MA 384 Data Mining

Course Prerequisites

  • MA 384 Prerequisite: CSSE 120 Intro to Software Development or equivalent course.
  • CSSE 386 Prerequisite: CSSE 220 Object-Oriented Software Development
  • MA 223 Engineering Statistics I or MA 381 Intro to Probability.

Course Topics

  • Data Exploration, Preparation and Visualization
  • Classification Analysis
  • Cluster Analysis
  • Elementary Natural Language Processing
  • Student Project on an Advanced Topic
  • Note: CSSE 386 has more advanced programming assignments.

Course Description

An introduction to data mining for large data sets, including data preparation, exploration, aggregation/reduction, and visualization. Elementary methods for classification analysis, cluster analysis, and natural language processing will be covered. Significant attention will be given to presenting and reporting data mining results.

Course Textbook: none

Three V’s of Big Data

Example (Volume)
Visualizing Friendships (Facebook)

“Visualizing data is like photography. Instead of starting with a blank canvas, you manipulate the lens used to present the data from a certain angle.” Paul Butler, Facebook Intern

Facebook’s social network graph is large enough to reproduce a rough map of the world.

Example (Velocity)
Clicks in the United States in June 2011

YouTube animation created by Helmut Hissen. Green represents clicks from mobile devices and red represents clicks from non-mobile devices.

Example (Variety)
City of Chicago Data Portal

Cities are releasing a wide variety of data to the public.

Past Student Projects

  1. A Technical Analysis of Stock Trades Made by Members of Congress, Ben Adler, Thomas Bioren, Matthew Hart, Jackson Heil
  2. Predicting S&P 500 Movement Using Financial Asset Prices, Austin Vesich, Ian irschenbaum, Jake Grellman
  3. Chess Analytics, Avichal Jadeja, Chris Lardner
  4. Assessing Improvement in Collegiate Swimmers: Key Factors from High School to College, Tommaso Calviello, Emre Gunay, Vineet Ranade, Preksha Sarda, Blaise Swartwood
  5. Detecting Illicit Bitcoin Transactions: Graph Data Mining for Anti-Money Laundering, Ian Lemons, Aiden O’Neil, Ethan Pabbathi, Rhys Phelps
  6. Predicting Movie Success: A Data-Driven Approach, Simarjit Dhillon, Ben Joens, Justin O’Donnell, Kyle Wang
  7. Football Sports Betting Analysis, Manav Ahuja, Matthew Briscoe, Colin Decker, Ethan Huey
  8. Possible Factors Explaining the California-Texas Migration, Dylan McCain, Ryan Seidel, Larsen Morehouse, and Mark Worden
  9. What Factors Determine Movie Profitability? Aidan Frantz, Ethan Hutton, Devin Mehringer, Wesley Schuh
  10. Credit Card Approval and Loyalty Prediction Analysis, James Koh, Kevin Lin, Joshua Lowe, and Aditya Senthilvel (pdf)
  11. Cardiovascular Disease Assessment and Prediction, Mitch Boucher, Ariadna Duvall, Aiko Sherman, Manuella Shomba
  12. Historical Composition of the United States Senate and House of Representatives, Eric Bender, Evan Chung, Anthony Mui, Rahul Siripuram
  13. Behind the Goals: A Data-Driven Approach to the FIFA World Cup, Brian Beasley, Matteo Calviello, Caleb Mosteller, David Utsis (pdf)
  14. Analyzing NBA Game Data: Unraveling the Key Metrics for Victory, Wil Bell, Michael Trinh, Gurinder Vasanta
  15. Relationship Between San Francisco Crime and Housing, Swade Cirata, Marcus Henderson, Aidan Matthews, Chaitanya Singh
  16. What Makes a Hit Song? Chuwei Du, Josh Norris, Drew Kilner
  17. Predicting the Financial Stability of Banks, Daniel Gaull, Hanshuo Geng, William Hawkins, Muyao Zhong
  18. Academic Test Scores in NCAA Sports, Ethan Brown, Lucas Czarnecki, Grant Ripperda, Liam Waterbury, Harrison Wight
  19. Racial Demographics of Superfund Sites, Riya Bharamaraddi, Mike Bryant, Luke Ferderer, Jayden Foshee, Collin Morris
  20. Statistical Characteristics of Big Five Personality Traits, Olivia Davis, Nat Hurtig, Dalton Julian, Andrew Kosikowski, Andrew Orians
  21. Soccer Player Analysis, Qijun Jiang, Xianshun Jiang, Yuanyu Wang, Yunzhe Wei, Yujie Zhang
  22. Analysis of Trends in Steam Reviews, Ian Liu, Hunter Masur, Raf Qian, Simon Tian, Thomas Yang
  23. Analysis of Discord Conversations, Nathan Chen, Spencer Chubb, Emily Hart, Matthew Ragland
  24. Factors of the Median Income of Graduates by College, Jordan Ansari, Brock Buczkowski, Cade Parkhurst, Andrew Pascente, Adithya Ramji
  25. LA Rams: Analyzing the NFL’s best, Sangheon Choi, Kush Bhuwalka, Ken Zheng, Samvit Ram
  26. On Implementation of a Flat Tax Rate on Individual Income Tax in the United States, Luke McMahon, Josh Mestemacher, Evan Sellers, Michael Yager
  27. Housing Prices of Housing Types Across Regions in the United States, Jadon Brutcher, Adam Korinek, Avery Wagner, Grant Wyness
  28. Pandemic Crime Changes, Andre Battle, Nick Bohner, Aidan Mazany, Jake Wallis
  29. YouTube Dislikes, Rob Budak, Luke Cesario, Jonathan Moyers, Azzam Turkistani
  30. Covid19 Vaccination Adverse Reactions, Bowen Ding, Ao Liu, Nigel Nie
  31. Lichess, Tom Ahmed, Griffin Annis, Landon Bundy, Jackson Hajer, Christian Meinzen, Nick Von Bulow
  32. Toxic Comment Classification, Shannon Jin, Dylan Luttrell, Vidhu Naik, Connie Zhu
  33. Kickstarter Project, Joey Hatfield, Zach Kelly, Zackery Painter, Nick Pisciotta, Ried Tate
  34. Olympics Through the Ages, Max Chaplin, Abi Clayton, Bowen Lie, Ainsley Liu, Jake Milanowski, William Thesken
  35. Characteristics of Successful Movies, Shengjun Guan, Alex Ketcham, Aaryan Khatri, Andrea Wynn, Sean Xia, Will Yelton
  36. Premier League Analysis 1920, Yutong Chen, Mashengjun Li, Max Li, Jiadi Want, Travis Zheng
  37. Solar Panels, David Alba-Lopez, Jeremiah Wooten, Rachel Harness, Mory Chen
  38. DSL Modem Data Analysis, T.J. Ballard, Sybil Chen, Piotr Galas, Wendy Ju, Kristen McKellar
  39. U.S. Stock Exploration, Sam Dunaway, Aaron Glave
  40. Movie Data, Mohammed Ali, Derek Grayless, David Gruninger, Jiafan Lin, Caleb Schlundt
  41. Covid-19 Data Analysis, Tiantian Zhang, Ben Feaster, Howard Hu, Yu Xin Evian Wen
  42. Repeat Buyer Prediction, Augustine Cui, Doris Chen, Scott Sun, Wenxing Li, Xiangnan Chen
  43. Panopto Video Statistics, Jessica Myers, Katana Colledge, Brionna Slaughter, Jake Meister
  44. Netflix Digital Contents, Robin Li, Zijian Huang, Valerie Liu, Aurora Ouyang, Susie Seo, Siwei Xu
  45. Energy Production and Usage, Steven Feng, Lawrence Ko, Wenze Ma, Shiloh Musser, Darren Zhu
  46. Sports Data, Samuel Flickinger, Eric Kirby, Arjun Mahajan, Jared Petrisko, Anthony Schmidt
  47. Factors of Graduate School Admissions, Runzhe Gao, Frank Hu, Weite Li, Song Luo, Max Wang
  48. UFO Sightings, Alexander Boffo, Aditya Burle, Benjamin Goldstein, Michael Lake, Cehong Wang
  49. Google Books Ngrams, Ben Hall, William Mason, Aaron Michael, Stella Park, Wyatt Shafer
  50. Trending YouTube Video Statistics, Hussein Alawami, Tyler Bath, Tyson Clark, Sylvia Nees, Indresh Srivastava
  51. Opiate Overdosing, Brevin Lacy, Matthew Lyons, Kathi Munoz-Hofman, Neelie Shah
  52. Kickstarters, Stephen Crowell, Timmy D’Avello, Michelle Reese, Nate Schwindt
  53. Analyzing Trends in the Stock Market, Alexander Bradshaw, JaeJung Hyun, Jacob Petrisko, Abilash Raghuram
  54. Measuring Economic Distress and Disparity, Khalad Alfayez, Omar Fayoumi, Megan Hawksworth, Addi Reynolds, Seiji Takagi
  55. New York City Restaurant Inspections, Xiaomei Bi, Jocelynn Cheesebourough, Cambron Johnson, Jing Lin, Olivia Penry
  56. Aviation Accidents, Sonia Lai, Yiyu Ma, Dylan Scheumann, Xuechen Xie, Willis Yang
  57. Trending YouTube Video Statistics, Eric Chen, Lory Wang, Yiyuan Wang, Valentine Wu, Huirou Zou
  58. Stock Pump and Dump, Manoj Kurapati, Joshua Palamuttam, Isaac Austin, Rithvik Subramanya
  59. RoseCareer, Luke Wukusick, Kaiyu Xie, Chelsey Yin, Fred Lin, Christopher Nurrenberg, Jonah Reel
  60. College Swimming Data, Mary Petersen, Adam Baker, David Saadatnezhadi, Johann Ryan
  61. Global Terrorism Data Mining, Wesley Turner, Michael Crowell, Zachary Taylor, Alexander Granowski, Tucker Osman
  62. Advanced Computer Simulated Conflict Data, Kevin Lewis, Charlie Hersherger, Logan Smith, Anthony Grueninger
  63. ROSECRET, Qiuyun Li, Curtis Wang, Jerry Zheng, Mory Chen, Lansi Wang, Yicong Xie
  64. MOBA Game Analysis, Xiangbei Chen, Yifei Chen, Jiahao Chi, Jizhon Hang, Peicheng Tang, Yilun Wu
  65. Analyzing Song Lyrics and Chord Progressions, Wyatt Smith, Anne Boxeth, Jarret Alexander, Kennedy Schnieders, Zachary Thelen
  66. Plant Genealogy, Charlie Gettys, Jenna Wohlpart, Nick Harrelson, Ishan Saraf, Anirudh Singh
  67. Flight Status, Ramsey Tomasi-Carr, Akanksha Chattopadyay, Nihaal George, Wesley Siebenthaler, Shinjun Yu
  68. Analyze and Visualize Traffic Throughput, Benjamin Brubaker, Songuy Wang, Alexander Wong
  69. Analyze Flights Status and Predict Delay and Cancellation, Tiancong Zhao
  70. Hearthstone Card Graph, Tyler Rarick
  71. League of Legends Ranked Game Analysis, Fangyuan Wang
  72. Magic the Gathering Card Analysis, Dalton Bush, John Fenoglio, John Hamilton
  73. Measuring the Effect of Weather on Electricity Generation for Renewables, Marc Schmitt, Daniel Verlaque
  74. Professional Sports Signing Predictions Based Off NCAA Stats, Lucas Weier, Eric Haug, Alexander Meyers
  75. Stock Correlator, Ryan Crafts, Christopher Knight, Ethan Peterson
  76. Stock Market and Social Platforms, Dustin George
  77. UFO Report Analysis in Past 20 Years, Wenkang Dang, Donglai Guo
  78. Video Game Sales, Yunuan Ding, Yuanqi Li
  79. Amazon Stock Price Analysis, Alexa Pieragowski, Emelye Wu
  80. Chess Game Analysis, Kieran Groble, Lewis Kelley, David Lam
  81. Graphical Data Mining using Diiagramr Library Backend by Pandas, Christian Nunnally
  82. Modeling Cryptocurrencies to a Behavioral Economic Model, Joseph Porter, William York
  83. Movie Recommender, Joseph Brown, Ding Nie, Avery Pratt
  84. Optimization of Electricity Grid – Analysis of Resources, Distribution and Consumption of Electricity, Bryan Gish, Caleb Hille, Joseph Novosel
  85. Pokemon Project, Fengyi Huang, Ming Lyu, Junyi Xiao
  86. Professions Across the Country and The Cost of Working There, Leo Betts, Jaron Goodman
  87. Protein Visualization, Krystal Yang, Sam Zhang, Fred Zhang
  88. Topic Interest Correlation to Asset Prices, Adit Survarna, Jack Wassom
  89. TV Show Quotes Analysis, Lance Dinh, Maya Holeman
  90. Twin Cities Metro Area Data, Kiana Caston, Joshua Richey
  91. World Input-Output Database, Ty Adams, Mariana Lane
  92. Yelp Mining, Justin Willoughby
  93. League of Data, Dax Earl, Mason Schneider, Aaron Golliver, Tayler Burns and Mark Hein
  94. Classification of Protein Folds and Families, Jonathan Taylor, Alexis Fink, Devon Timaeus, Giuliana Watson
  95. Geographic Analysis of Movie Preferences, L.E. Davey, David Galvez, Matthew Mercer, Henrik Sohlberg
  96. Classification of Galaxies, Man Chi Huen, Si Fi Faye Li
  97. Handwritten Digits Classification, Brent Austgen, Matt Spurr, Jake Schuenke, Tyler Shelton
  98. Prediction of Flight Delay Times at SFO, Andy Chen, Zhengyu Qin, Ted Samore
  99. Organic Foods Impacts and Trends, Brandon Cox, Davis Robinson, Fang Huang
  100. March Mining Madness, Dan Schepers, Matt Skorina
  101. Where Rose Goes, Elias White
  102. Mining Twitter for Meaningful Sentiment, Alex Crowley
  103. Transforming How We Diagnose Heart Disease, Alvin Ye, Kyle Daruwalla
  104. Handwritten Digit Recognition, Adam Finer, James Gibson, Johnathon Hein
  105. Steam Marketplace Analysis, Jacob Knispel, Nithin Perumal, Alec Tiefenthal, Matt Buckner
  106. An Analysis by Income of the 2013 Community Data Set from Kaggle, Jake Laird, Abby Mann, Zachary Haloski
  107. Lyric Generator, Christopher Lambert, Graham Fuller
  108. Chilean Government Income Analysis, Deven Dong, Yuzong Gao, Fang-Yen Lee
  109. Comparing Countries over Time, Anne Leonhard
  110. Tornado Trends in the United States, Megan Liebman
  111. Where and When Teleport is a Better Summoner Spell than Ignite in League of Legends, Andrew Ma
  112. How Data Mining Can Help You Do Better in Hearthstone, Jerry Qiu, An Hu
  113. Are Stock Clusters Meaningful, Dylan Vener, Daniel Mikhail
  114. Unrevealed Relationships Between Resources, Ruinan Zhang, Wenjun Kong, Zhihao Xue, Jiaren Wu
  115. Correlation between Stock Price and Trading Volume, Xiao Xin