Kmeans clustering in python - Giving original labels to predicted clusters

Frenzy asked Apr 27, 2022

1,801 views

I have a dataset with 7 labels in the target variable.

X = data.drop('target', axis=1)
Y = data['target']
Y.unique()

array(['Normal_Weight', 'Overweight_Level_I', 'Overweight_Level_II',
'Obesity_Type_I', 'Insufficient_Weight', 'Obesity_Type_II',
'Obesity_Type_III'], dtype=object)

from sklearn.cluster import KMeans
import numpy as np

km = KMeans(n_clusters=7, init="k-means++", random_state=300)
km.fit_predict(X)
np.unique(km.labels_)

array([0, 1, 2, 3, 4, 5, 6])

After running the KMeans clustering algorithm with the number of clusters set to 7, the resulting clusters are labeled 0, 1, 2, 3, 4, 5, 6. But how do I know which real label matches which predicted label?

In other words, I want to know how to assign the original label names to the predicted cluster labels, so that the two can be compared, for example by counting how many samples are clustered correctly (accuracy).
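One common way to approach this, sketched here under assumptions rather than offered as a definitive answer, is to count how often each cluster id co-occurs with each original label and then pick the one-to-one assignment that maximizes the total overlap, using the Hungarian algorithm from SciPy. The helper name `map_clusters_to_labels` and the toy arrays are hypothetical; in the question's setting `Y` and `km.labels_` would take their place. Note that KMeans never sees the target, so the clusters may not line up well with the classes, and the resulting accuracy can be low.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def map_clusters_to_labels(y_true, cluster_ids):
    """Return a dict {cluster id -> original label} maximizing total agreement.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    y_true = np.asarray(y_true)
    cluster_ids = np.asarray(cluster_ids)
    labels = np.unique(y_true)
    clusters = np.unique(cluster_ids)

    # counts[i, j] = number of samples in clusters[i] whose true label is labels[j]
    counts = np.zeros((len(clusters), len(labels)), dtype=int)
    for i, c in enumerate(clusters):
        for j, lab in enumerate(labels):
            counts[i, j] = np.sum((cluster_ids == c) & (y_true == lab))

    # Hungarian algorithm: one-to-one assignment with the largest matched count
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return {clusters[r]: labels[c] for r, c in zip(rows, cols)}


# Toy data standing in for Y and km.labels_ (made-up values)
y_toy = np.array(["A", "A", "B", "B", "C", "C", "C"])
clusters_toy = np.array([2, 2, 0, 0, 1, 1, 0])

mapping = map_clusters_to_labels(y_toy, clusters_toy)

# Translate cluster ids into label names and compare with the originals
y_pred = np.array([mapping[c] for c in clusters_toy])
accuracy = np.mean(y_pred == y_toy)
```

With the question's variables, the call would presumably look like `map_clusters_to_labels(Y, km.labels_)`. The one-to-one constraint means every cluster gets a distinct label; a simpler majority vote per cluster is an alternative, but it can assign the same label to several clusters.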
