Million Song Dataset Analytics

PySpark, MLlib, TensorFlow, AWS

Last updated on Jan 3, 2022

Instructors: Virginia Smith and Ameet Talwalkar

Course Website

Project Report

Project summary

Carried out feature engineering and pre-processing in parallel with PySpark on AWS EC2
Modeled the relationship between artist familiarity and the engineered features using several MLlib tools and TensorFlow
Predicted artist familiarity with the resulting pipeline, then visualized and analyzed the results for business insights
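
The summary above describes a scale-then-regress workflow. As a rough illustration only, the sketch below reproduces that shape in dependency-free Python: it standardizes features using training-set statistics, fits a linear regression by batch gradient descent, and compares test RMSE against a predict-the-mean baseline. It is not the project's code and does not use MLlib or TensorFlow. The three features (hotness, tempo, loudness), the synthetic data, and every function name here are assumptions made for illustration.

```python
import math
import random


def fit_scaler(rows):
    """Return per-column means and standard deviations (hypothetical helper)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(d)
    ]
    return means, stds


def transform(rows, means, stds):
    """Scale each column with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def predict(x, w, b):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_linear(X, y, lr=0.1, epochs=500):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [predict(x, w, b) - t for x, t in zip(X, y)]
        for j in range(d):
            w[j] -= lr * 2.0 / n * sum(e * x[j] for e, x in zip(errs, X))
        b -= lr * 2.0 / n * sum(errs)
    return w, b


def rmse(X, y, w, b):
    return math.sqrt(sum((predict(x, w, b) - t) ** 2 for x, t in zip(X, y)) / len(y))


# Synthetic stand-in data: NOT drawn from the Million Song Dataset.
random.seed(0)
raw = [
    [random.uniform(0, 1), random.uniform(60, 200), random.uniform(-30, 0)]
    for _ in range(200)
]
familiarity = [
    0.3 + 0.5 * hot + 0.0005 * tempo + 0.002 * loud + random.gauss(0, 0.01)
    for hot, tempo, loud in raw
]

# Hold out the last 40 rows; fit the scaler on training rows only.
split = 160
means, stds = fit_scaler(raw[:split])
X_train = transform(raw[:split], means, stds)
X_test = transform(raw[split:], means, stds)
y_train, y_test = familiarity[:split], familiarity[split:]

w, b = fit_linear(X_train, y_train)
test_rmse = rmse(X_test, y_test, w, b)

# Baseline: always predict the training mean (zero weights, mean intercept).
baseline_rmse = rmse(X_test, y_test, [0.0] * 3, sum(y_train) / split)
print(f"test RMSE={test_rmse:.4f}  baseline RMSE={baseline_rmse:.4f}")
```

Fitting the scaler on the training rows alone keeps held-out statistics out of the model. The mean baseline gives the RMSE a reference point, since an error figure on its own says little about whether the features carry signal.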