Update: as pointed out in the comments, my index was not unique. Solved it with pivot_table.
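I have the following code to run clustering on a df. This df has about 80K rows (the df is called 'Kmeans'). I then have another df, with slightly fewer than 80K rows, that shares a column with 'Kmeans', namely 'SKU_NR' (this df is named 'Historie'). I want to merge df 'Kmeans' with df 'Historie', but when I do, I get 2 million rows. I have done this before and it worked. What is wrong with the code?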
#load in libraries
import pandas as pd
import numpy as np
pd.options.mode.chained_assignment = None
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
#Load and prepare data
Historie = pd.read_excel("file.xlsx")
Kmeans = Historie[['SKU_NR','ORDER_ADV_CONS_UNITS_WK_PICK']]
Kmeans = Kmeans.dropna()
from sklearn.cluster import KMeans
km = KMeans(n_clusters=3)
km.fit(Kmeans)
km.predict(Kmeans)
labels = km.labels_
Kmeans["Classification"] = labels
Kmeans = Kmeans[["SKU_NR","Classification"]]
Historie = Historie[['SKU_NR','WEEKNR','ORDER_ADV_CONS_UNITS_WK_PICK',
                     'FORECAST_NEC_STOCK_BASE']]
Historie = Historie.merge(Kmeans, on = "SKU_NR")
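
For anyone hitting the same row explosion: the sketch below reproduces the effect described in the update on made-up toy data. The lowercase frames `historie` and `kmeans`, their values, and the choice of keeping the first label per SKU are assumptions for illustration only, not the original data or necessarily the exact pivot_table call that was used. When 'SKU_NR' repeats in both frames, the merge pairs every left row with every right row for that SKU, so the row count multiplies.

```python
import pandas as pd

# Toy stand-ins for the real frames; all values are made up for illustration.
historie = pd.DataFrame({
    "SKU_NR": [1, 1, 1, 2, 2],
    "WEEKNR": [1, 2, 3, 1, 2],
})

# One cluster label per history row, so SKU_NR repeats here as well.
kmeans = pd.DataFrame({
    "SKU_NR": [1, 1, 1, 2, 2],
    "Classification": [0, 0, 1, 2, 2],
})

# Many-to-many merge: each SKU yields (left rows) * (right rows) rows.
blown_up = historie.merge(kmeans, on="SKU_NR")
print(len(blown_up))  # 3*3 + 2*2 = 13 rows instead of 5

# Collapse to one row per SKU before merging.
# Keeping the first label per SKU is an arbitrary choice (an assumption).
per_sku = kmeans.pivot_table(
    index="SKU_NR", values="Classification", aggfunc="first"
).reset_index()

# validate raises MergeError if SKU_NR is still duplicated on the right side.
merged = historie.merge(per_sku, on="SKU_NR", validate="many_to_one")
print(len(merged))  # 5 rows, same as historie
```

Checking `Kmeans["SKU_NR"].is_unique` before the merge, or passing `validate="many_to_one"` as above, should surface this kind of problem early instead of silently producing millions of rows.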