Say I have a column of categorical data, such as different descriptions of temperature: 'Lukewarm', 'Hot', 'Scalding', 'Cold', 'Frostbite', etc.


I know that we can use pd.get_dummies to convert the column to numerical data within the DataFrame, but I also know there are other 'converters' (not sure if that's the correct terminology), e.g. OneHotEncoder from scikit-learn. With that, I could use the pipeline module to build a nice pipeline and feed my DataFrame through it, which would also encode my categorical data as numbers.


How do I know which one to use? Does it matter? If it does, when does it matter most (e.g. for what types of problems, or when there are many categorical variables versus few)? If anyone can give me any pointers on this kind of thing, I'd greatly appreciate it.
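One situation where the choice clearly matters is encoding new data at prediction time. The sketch below makes some assumptions: it reuses the toy temperature data, and both the 'Tepid' label and the target y are hypothetical. get_dummies encodes each DataFrame on its own, so new data can end up with a different set of columns. A fitted OneHotEncoder reuses the training categories instead, and with handle_unknown="ignore" it maps an unseen label to all zeros. Because it is a scikit-learn transformer, it also fits into a Pipeline:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training data and a hypothetical binary target
train = pd.DataFrame({"temperature": ["Lukewarm", "Hot", "Scalding", "Cold", "Frostbite", "Hot"]})
y = [0, 1, 1, 0, 0, 1]

# New data at prediction time; "Tepid" is a hypothetical label never seen in training
new = pd.DataFrame({"temperature": ["Hot", "Cold", "Tepid"]})

# get_dummies encodes each frame independently, so the column sets disagree
train_cols = pd.get_dummies(train, dtype=int).columns.tolist()
new_cols = pd.get_dummies(new, dtype=int).columns.tolist()
print(train_cols)  # 5 columns, one per training label
print(new_cols)    # 3 columns, including one the training data never had

# OneHotEncoder keeps the training categories; unseen labels become all zeros
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train[["temperature"]])
new_encoded = ohe.transform(new[["temperature"]]).toarray()
print(new_encoded)

# The same encoder inside a Pipeline, so fit() and predict() apply identical encoding
model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), ["temperature"])]
    )),
    ("clf", LogisticRegression()),
])
model.fit(train, y)
preds = model.predict(new)
print(preds)
```

The same mismatch occurs whenever new data is missing a category or contains an extra one, so it is more likely to bite as the number of categorical columns and distinct labels grows.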