How do I know which encoder to use to convert categorical variables to numerical ones?

Anas asked Nov 28, 2021


Say I have a column of categorical data describing temperature, with values such as 'Lukewarm', 'Hot', 'Scalding', 'Cold', 'Frostbite', etc.

I know that we can use pd.get_dummies to convert the column to numerical data within the DataFrame, but there are also other 'converters' (not sure if that's the correct terminology), e.g. OneHotEncoder from scikit-learn. With that, I could use the sklearn.pipeline module to build a pipeline and feed my DataFrame through it, so that the categorical data also gets encoded as numbers.

How do I know which one to use? Does it matter? If so, when does it matter most (e.g. for what types of problems, and does it depend on whether there are many categorical variables or only a few)? Any pointers on this would be greatly appreciated.
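
Whether the choice matters depends largely on whether the categories have a natural order. The sketch below assumes the temperature labels are ordered from coldest to hottest (the order in TEMP_ORDER is an assumption, not something stated above) and adds a hypothetical unordered 'colour' column for contrast. It uses scikit-learn's ColumnTransformer to send the ordered column to OrdinalEncoder with an explicit category order and the unordered column to OneHotEncoder.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Assumed order for the temperature labels (coldest to hottest)
TEMP_ORDER = ["Frostbite", "Cold", "Lukewarm", "Hot", "Scalding"]

# Hypothetical frame: 'temp' is ordered, 'colour' is a made-up unordered column
df = pd.DataFrame({
    "temp": ["Lukewarm", "Hot", "Scalding", "Cold", "Frostbite"],
    "colour": ["red", "blue", "red", "green", "blue"],
})

preprocess = ColumnTransformer(
    [
        # ordered categories -> a single integer column that keeps the order (0..4)
        ("ordinal", OrdinalEncoder(categories=[TEMP_ORDER]), ["temp"]),
        # unordered categories -> one 0/1 column per category (blue, green, red)
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["colour"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X)
```

Without categories=, OrdinalEncoder sorts the labels alphabetically (Cold, Frostbite, Hot, Lukewarm, Scalding), which loses the temperature order. Conversely, integer-encoding a genuinely unordered column implies an order that models such as linear regression will treat as meaningful; one-hot encoding avoids that at the cost of one column per category, which matters more as the number of distinct categories grows. The preprocess object can then be used as the first step of a sklearn.pipeline.Pipeline, followed by whatever estimator you choose.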
