Real-World Python Machine Learning Tutorial w/ Scikit Learn (sklearn basics, NLP, classifiers, etc)
YouTube Viewers

 Published On Sep 30, 2019

Practice your Python Pandas data science skills with problems on StrataScratch!
https://stratascratch.com/?via=keith

In this video we walk through a real-world Python machine learning project using the scikit-learn library. We work our way to building a model that automatically classifies text as having either a positive or a negative sentiment, using Amazon reviews as our training data. Full video timeline in the comments!
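
To give a feel for the end result, here is a minimal sketch of a sentiment classifier. It is not the code from the video: the six training sentences and the label strings are made up for illustration and stand in for the Amazon review data, and a linear SVM is just one of the classifiers the video compares.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Hypothetical toy reviews standing in for the Amazon review data.
train_texts = [
    "I loved this book, great story",
    "great product, works very well",
    "excellent quality, highly recommend",
    "terrible book, waste of money",
    "bad product, broke after a day",
    "horrible quality, do not buy",
]
train_labels = [
    "POSITIVE", "POSITIVE", "POSITIVE",
    "NEGATIVE", "NEGATIVE", "NEGATIVE",
]

# Bag of words: each column counts how often one vocabulary word appears.
vectorizer = CountVectorizer()
train_vectors = vectorizer.fit_transform(train_texts)

# Train a linear SVM on the count vectors.
clf = SVC(kernel="linear")
clf.fit(train_vectors, train_labels)

# For new text, reuse the fitted vocabulary with transform (not fit_transform).
new_texts = ["great book, highly recommend", "horrible waste of money"]
predictions = clf.predict(vectorizer.transform(new_texts))
print(list(predictions))
```

With so little data the predictions are only a smoke test; the video uses thousands of real reviews.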

Link to Code & Data:
https://github.com/keithgalli/sklearn

Raw Data download:
http://jmcauley.ucsd.edu/data/amazon/

scikit-learn documentation:
https://scikit-learn.org/stable/docum...

Make sure you have scikit-learn installed! To do this, either run "pip install scikit-learn" or use Python through Anaconda.
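
If you are not sure whether the install worked, a quick check like the sketch below can help. The helper name `is_installed` is made up for this example; the one fact it relies on is that the library is imported under the name `sklearn`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# scikit-learn is imported as "sklearn", so that is the name to look up.
print("scikit-learn installed:", is_installed("sklearn"))
```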

Join the Python Army to get access to perks!
YouTube - / @keithgalli
Patreon - / keithgalli

---------------------------
Follow me on social media!
Instagram: / keithgalli
Twitter: / keithgalli

To get one of the cool shirts I was wearing:
/ pagandvls

---------------------------

Video outline!
0:00 - What we will be doing!
3:40 - scikit-learn Overview
6:38 - How do we find training data?
9:33 - Download data
11:45 - Load our data into Jupyter Notebook
16:38 - Cleaning our code a bit (building data class)
20:13 - Using Enums
22:50 - Converting text to numerical vectors, bag of words (BOW) explanation
25:45 - Training/Test Split (make sure to "pip install scikit-learn"!)
33:45 - Bag of words in sklearn (CountVectorizer)
40:05 - fit_transform, fit, transform methods
42:05 - Model Selection (SVM, Decision Tree, Naive Bayes, Logistic Regression) & Classification
47:50 - predict method
53:35 - Analysis & Evaluation (using clf.score() method)
56:58 - F1 score
1:01:01 - Improving our model (evenly distributing positive & negative examples and loading in more data)
1:20:36 - Let's see our model in action! (qualitative testing)
1:22:24 - TfidfVectorizer
1:25:40 - GridSearchCV to automatically find the best parameters
1:31:30 - Further NLP improvement opportunities
1:32:50 - Saving our model (Pickle) and reloading it later
1:36:37 - Category Classifier
1:39:14 - Confusion Matrix
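
Several of the later steps in the outline (train/test split, TfidfVectorizer, GridSearchCV, F1 score, confusion matrix, and pickling) can be sketched together as below. This is an illustrative sketch, not the notebook from the video: the templated toy reviews, the item list, and the parameter grid are assumptions chosen so the example runs quickly on its own.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical toy data with an even positive/negative split.
ITEMS = ["book", "phone", "lamp", "chair", "game",
         "movie", "album", "tablet", "camera", "jacket"]
texts = [f"great {item}, really loved it" for item in ITEMS]
texts += [f"terrible {item}, really hated it" for item in ITEMS]
labels = ["POSITIVE"] * len(ITEMS) + ["NEGATIVE"] * len(ITEMS)

# Hold out 30% for testing; stratify keeps both classes balanced.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

# TF-IDF down-weights words that appear in most documents.
vectorizer = TfidfVectorizer()
train_vectors = vectorizer.fit_transform(train_texts)  # learn vocabulary
test_vectors = vectorizer.transform(test_texts)        # reuse vocabulary

# Try every parameter combination with 3-fold cross-validation.
param_grid = {"kernel": ["linear", "rbf"], "C": [1, 4, 8]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(train_vectors, train_labels)
print("best parameters:", search.best_params_)

# Evaluate on the held-out test set.
class_order = ["POSITIVE", "NEGATIVE"]
predictions = search.predict(test_vectors)
scores = f1_score(test_labels, predictions, average=None, labels=class_order)
matrix = confusion_matrix(test_labels, predictions, labels=class_order)
print("F1 per class:", scores)
print(matrix)  # rows are true labels, columns are predicted labels

# Pickle the best model and load it back (in memory here, not to a file).
saved = pickle.dumps(search.best_estimator_)
reloaded = pickle.loads(saved)
reloaded_predictions = reloaded.predict(test_vectors)
```

To keep a model between sessions you would write the pickled bytes to a file instead, and save the fitted vectorizer alongside it so new text is transformed with the same vocabulary.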

---------------------
If you are curious to learn how I make my tutorials, check out this video: How to Make a High Quality Tutorial V...

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.
