Handling Categorical Data in Machine Learning: Easy Explanation for Data Science Interviews
Emma Ding

Published on Dec 19, 2022

Handling categorical data in machine learning projects is a very common topic in data science interviews. In this video, I’ll cover the difference between treating a variable as a dummy variable vs. a non-dummy variable, how you can deal with categorical features when the number of levels is very large, and the pros and cons of various strategies.

Feature hashing
https://en.wikipedia.org/wiki/Feature...
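
To make the three encodings named in this description concrete, here is a minimal Python sketch. It assumes the conventional definitions (one-hot: one column per level; dummy: one column per level except a dropped reference level; feature hashing: a hash of the value picks one of a fixed number of buckets). The function names, the example levels, and the choice of MD5 as the hash are illustrative assumptions, not taken from the video.

```python
import hashlib


def one_hot(value, levels):
    """One column per level; exactly one column is 1."""
    return [1 if value == level else 0 for level in levels]


def dummy(value, levels):
    """Like one-hot, but the first level is dropped as the reference.

    The reference level is encoded as all zeros, so k levels need k - 1 columns.
    """
    return one_hot(value, levels)[1:]


def hash_feature(value, n_buckets):
    """Feature hashing: map any string to one of n_buckets columns.

    The vector width stays fixed no matter how many distinct levels exist,
    at the cost of possible collisions between levels.
    MD5 is an arbitrary choice here; it only needs to be deterministic.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    index = int(digest, 16) % n_buckets
    vector = [0] * n_buckets
    vector[index] = 1
    return vector


levels = ["red", "green", "blue"]  # hypothetical example levels

print(one_hot("green", levels))      # [0, 1, 0]
print(dummy("green", levels))        # [1, 0]
print(dummy("red", levels))          # [0, 0]  (reference level)
print(hash_feature("user_12345", 8)) # length-8 vector with a single 1
```

With thousands of levels, the one-hot and dummy vectors grow with the number of levels, while the hashed vector keeps the width you choose.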


🟢 Get all my free data science interview resources
https://www.emmading.com/resources
🟡 Product Case Interview Cheatsheet https://www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet https://www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet https://www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist https://www.emmading.com/data-science...

✅ We work with experienced data scientists to help them land their next dream job. Apply now: https://www.emmading.com/coaching

// Comment
Got any questions? Something to add?
Write a comment below to chat.

// Let's connect on LinkedIn:
  / emmading001  

====================
Contents of this video:
====================
00:00 Introduction
00:48 Categorical Data
02:22 Ordinal Features & Class Labels
03:38 One-Hot Encoding
05:32 Dummy Encoding
06:30 Problems of One-Hot & Dummy Encoding
07:26 Feature Hashing
