First things first:

  • A list of background music. Link.

  • Sketching Link

Data Science

  • Top 10 DS courses # http://bigdata-madesimple.com/review-of-top-10-online-data-science-courses/

  • https://www.kaggle.com/wiki/Tutorials

  • https://github.com/ujjwalkarn/Machine-Learning-Tutorials

  • http://www.kdnuggets.com/2015/06/top-20-python-machine-learning-open-source-projects.html

  • http://brettromero.com/wordpress/data-science-a-kaggle-walkthrough-introduction/ Kaggle

  • https://github.com/jmschrei/pomegranate pomegranate

Git

Python

Statistics

Quick Short Cuts

IPython notes for learning

Lots of quick & interesting slides

Data Scientist Workbench:

It’s a free all-in-one solution for people interested in performing data analysis. The Data Scientist Workbench includes:

  • OpenRefine to clean up messy data.

  • Jupyter notebooks supporting Python, R, and Scala (with access to Apache Spark for Big Data processing).

  • Apache Zeppelin notebooks.

  • RStudio in your browser.

https://my.datascientistworkbench.com/

Quick slides on NLTP (Natural Language Text Processing)

  • https://www.cse.iitb.ac.in/~neelamadhavg09/docs/dependency_parsing.pdf # Articles on semantic text-parsing - dependency parsing.

Kaggle Tips:

Part 1 of this blog post series: Orientation

Part 2b: Ranking and regression metrics

Part 3: Validation and offline testing

Part 4: Hyperparameter tuning

Part 5: A/B testing

Tom Fawcett’s 2006 Pattern Recognition Letters paper, “An Introduction to ROC Analysis.”

Chapter 7 of Data Science for Business discusses the use of Expected Value as a useful classification metric, especially in cases of skewed data sets.
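To make the expected-value idea concrete, here is a minimal sketch in Python. It is not taken from the book: the function names, the confusion-matrix counts, and the payoff values are all made up for illustration. It weights each cell of a confusion matrix by an assumed benefit or cost and averages over all instances, which shows why accuracy alone can mislead on a skewed data set.

```python
# Rows are the actual class, columns the predicted class,
# both in the order (positive, negative). All numbers are illustrative.

def accuracy(confusion):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def expected_value(confusion, payoff):
    """Average payoff per instance: sum of count * payoff over all cells, divided by total."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    weighted = sum(
        count * payoff[i][j]
        for i, row in enumerate(confusion)
        for j, count in enumerate(row)
    )
    return weighted / total

# Assumed payoffs: a true positive earns 100, a false positive costs 5,
# and predicting negative earns or costs nothing.
payoff = [[100, 0],
          [-5, 0]]

# Skewed data: 50 positives among 1000 instances.
always_negative = [[0, 50],    # TP, FN
                   [0, 950]]   # FP, TN
model = [[30, 20],
         [50, 900]]

for name, cm in [("always negative", always_negative), ("model", model)]:
    print(name, "accuracy:", accuracy(cm), "expected value:", expected_value(cm, payoff))
```

With these made-up numbers the always-negative baseline has the higher accuracy (0.95 versus 0.93) but an expected value of 0, while the model earns 2.75 per instance. Which classifier looks better depends entirely on the payoffs you assume.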

Research Articles

Note: This post was updated on April 16, 2015. Thanks to @aatallah for demystifying the origin of the name “ROC curve,” and to Joe McCarthy for the helpful references.

PDF/Slides Generator for presentations