I am loading data from a CSV into a DataFrame and running the code below, but I get an error:
# Import the necessary packages
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
# Define a normalizer
normalizer = Normalizer()
# Fit and transform
norm_movements = normalizer.fit_transform(dfMod)
# Create Kmeans model
kmeans = KMeans(n_clusters = 10,max_iter = 1000)
# Make a pipeline chaining normalizer and kmeans
pipeline = make_pipeline(normalizer,kmeans)
# Fit pipeline to daily stock movements
pipeline.fit(dfMod)
labels = pipeline.predict(dfMod)
print(len(labels), len(dfMod))
df1 = pd.DataFrame({'labels':labels,'dfMod':list(dfMod)}).sort_values(by=['labels'],axis = 0)
# now...with PCA reduction
# Define a normalizer
normalizer = Normalizer()
# Reduce the data
reduced_data = PCA(n_components = 2)
# Create Kmeans model
kmeans = KMeans(n_clusters = 10,max_iter = 1000)
# Make a pipeline chaining normalizer, pca and kmeans
pipeline = make_pipeline(normalizer,reduced_data,kmeans)
# Fit pipeline to daily stock movements
pipeline.fit(dfMod)
# Prediction
labels = pipeline.predict(dfMod)
# Create dataframe to store companies and predicted labels
df2 = pd.DataFrame({'labels':labels,'dfMod':list(dfMod.keys())}).sort_values(by=['labels'],axis = 0)
This is the line that raises the error:
df1 = pd.DataFrame({'labels':labels,'dfMod':list(dfMod)}).sort_values(by=['labels'],axis = 0)
The strange thing is that both lengths come out as 50k:
print(len(labels), len(dfMod))
50000 50000
Am I missing something here? How can I get this to work? Thanks!
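
One thing that may be worth checking, assuming the error is a length mismatch (the traceback is not shown above): `len()` of a DataFrame counts its rows, whereas `list()` of a DataFrame yields its column labels. So `len(dfMod)` can equal `len(labels)` while `list(dfMod)` is much shorter. The sketch below uses a small made-up frame; the names `toy`, `toy_labels`, and `result` and all values are hypothetical stand-ins, not the CSV data.

```python
import pandas as pd

# Hypothetical toy data standing in for dfMod: 5 rows, 2 columns.
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 10]})
toy_labels = [0, 1, 0, 1, 0]  # stand-in for pipeline.predict(toy)

n_rows = len(toy)           # counts rows
column_labels = list(toy)   # iterating a DataFrame yields column labels
print(n_rows, len(toy_labels), column_labels)

# Mixing a 5-element list with a 2-element list fails in the constructor.
mismatch_raised = False
try:
    pd.DataFrame({"labels": toy_labels, "dfMod": column_labels})
except ValueError:
    mismatch_raised = True

# Pairing each label with its row index keeps both columns the same length.
result = pd.DataFrame(
    {"labels": toy_labels, "row": list(toy.index)}
).sort_values(by="labels")
print(result)
```

If each row of `dfMod` is one observation, using `dfMod.index` (or an identifier column from the CSV) in place of `list(dfMod)` may be what was intended; which of the two fits depends on how the data is laid out.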