Handling Categorical Data in Machine Learning: Easy Explanation for Data Science Interviews

Published On Dec 19, 2022

Handling categorical data in machine learning projects is a very common topic in data science interviews. In this video, I’ll cover the difference between treating a variable as a dummy variable vs. a non-dummy variable, how you can deal with categorical features when the number of levels is very large, and the pros and cons of various strategies.
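To make the dummy vs. non-dummy distinction concrete, here is a minimal sketch in plain Python (no libraries). It assumes the usual convention that "dummy encoding" means one-hot encoding with one level dropped as the reference category. It also shows integer (ordinal) encoding as one common non-dummy alternative, which only makes sense when the levels have a real order. The function names are hypothetical helpers for illustration and are not taken from the video. In pandas, `pd.get_dummies(..., drop_first=True)` performs the same kind of drop.

```python
from typing import List, Sequence, Tuple


def ordinal_encode(values: Sequence[str], order: Sequence[str]) -> List[int]:
    """Non-dummy encoding: map each level to its position in an explicit order (e.g. S < M < L)."""
    index = {level: i for i, level in enumerate(order)}
    return [index[v] for v in values]


def one_hot_encode(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """One column per distinct level (k levels -> k columns); each row has exactly one 1."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def dummy_encode(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """One-hot with the first level dropped (k levels -> k - 1 columns).

    The dropped level becomes the reference category and is represented by an
    all-zero row, which avoids perfect collinearity with a model's intercept.
    """
    levels, rows = one_hot_encode(values)
    return levels[1:], [row[1:] for row in rows]


if __name__ == "__main__":
    colors = ["red", "green", "blue", "green"]
    print(one_hot_encode(colors))  # (['blue', 'green', 'red'], [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]])
    print(dummy_encode(colors))    # (['green', 'red'], [[0, 1], [1, 0], [0, 0], [1, 0]])
    print(ordinal_encode(["S", "L", "M"], order=["S", "M", "L"]))  # [0, 2, 1]
```

Note how "blue" becomes the all-zero row under dummy encoding: the model's intercept absorbs it, and the other coefficients are read relative to it.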

Feature hashing
https://en.wikipedia.org/wiki/Feature...

🟢 Get all my free data science interview resources
https://www.emmading.com/resources
🟡 Product Case Interview Cheatsheet https://www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet https://www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet https://www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist https://www.emmading.com/data-science...

✅ We work with experienced data scientists to help them land their next dream job. Apply now: https://www.emmading.com/coaching

// Comment
Got any questions? Something to add?
Write a comment below to chat.

// Let's connect on LinkedIn:
/ emmading001

====================
Contents of this video:
====================
00:00 Introduction
00:48 Categorical Data
02:22 Ordinal Features & Class Labels
03:38 One-Hot Encoding
05:32 Dummy Encoding
06:30 Problems of One-Hot & Dummy Encoding
07:26 Feature Hashing
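For the feature hashing chapter, here is a minimal sketch of the hashing trick in plain Python. It is an illustration under stated assumptions, not the method from the video. Each category string is hashed into a fixed number of columns, so the width stays constant no matter how many levels appear, and levels unseen during training still land in some column. The price is that different levels can collide in the same column. The bucket count of 16, the use of md5, and the helper names are all arbitrary, hypothetical choices. scikit-learn's `sklearn.feature_extraction.FeatureHasher` implements the same idea with MurmurHash3 (see its `n_features` and `alternate_sign` parameters).

```python
import hashlib
from typing import List, Sequence


def hash_bucket(token: str, n_buckets: int) -> int:
    # md5 is used here only as a stable hash. Python's built-in hash() of a str
    # is randomized per process, which would break train/serve consistency.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % n_buckets


def hash_sign(token: str) -> int:
    # A second, independent hash picks +1 or -1 so that colliding levels
    # tend to cancel out rather than always adding up.
    digest = hashlib.md5(("sign:" + token).encode("utf-8")).digest()
    return 1 if digest[0] & 1 else -1


def feature_hash(values: Sequence[str], feature: str, n_buckets: int = 16) -> List[List[int]]:
    """Hypothetical helper: encode one categorical column into n_buckets columns."""
    rows = []
    for v in values:
        row = [0] * n_buckets
        token = f"{feature}={v}"  # prefix with the feature name so columns don't share tokens
        row[hash_bucket(token, n_buckets)] += hash_sign(token)
        rows.append(row)
    return rows


if __name__ == "__main__":
    cities = ["paris", "tokyo", "paris", "lagos"]
    for city, row in zip(cities, feature_hash(cities, "city", n_buckets=8)):
        print(city, row)
```

Compared with one-hot encoding, the column count is now chosen up front (`n_buckets`) instead of growing with the number of levels. This is why hashing is a common fallback for high-cardinality features such as user or city IDs.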
