Million Song Dataset Analytics

PySpark, MLlib, TensorFlow, AWS

Instructors: Virginia Smith and Ameet Talwalkar

Course Website

Project Report

Project summary

  • Carried out feature engineering and pre-processing in parallel with PySpark on AWS EC2
  • Modeled the relationship between artist familiarity and the engineered features using various MLlib tools and TensorFlow
  • Predicted artist familiarity with the resulting pipeline, then visualized and analyzed the results for business insights
Xiangyu Yin

Postdoc @ ANL | AI4science, Physics4ML, scientific discovery acceleration & automation
