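I have a dataset that looks like this, and I want to convert it so that it has two separate columns in front of the data (House and cluster).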
df1
x y z
house
0 4.907 1.416 0.663
0 2.114 1.368 0.681
0 1.261 1.374 0.724
1 1.382 1.480 0.767
1 2.764 1.390 0.661
1 1.410 0.941 0.665
2 1.362 1.498 0.775
2 1.303 0.786 0.682
2 2.687 1.445 0.675
3 1.341 0.932 0.685
3 1.436 1.450 0.748
3 2.466 1.272 0.686
4 1.299 1.072 0.692
4 1.457 1.504 0.748
4 2.296 1.246 0.663
5 1.390 0.918 0.700
5 1.405 1.587 0.817
5 2.482 1.394 0.656
6 1.445 1.116 0.746
6 2.184 1.474 0.710
6 1.319 1.524 0.722
I want to convert it to this:
House cluster x y z
summer 0 4.907 1.416 0.663
0 2.114 1.368 0.681
0 1.261 1.374 0.724
Autumn 1 1.382 1.480 0.767
1 2.764 1.390 0.661
1 1.410 0.941 0.665
Winter 2 1.362 1.498 0.775
2 1.303 0.786 0.682
2 2.687 1.445 0.675
names = ['x', 'y', 'z']
index = pd.MultiIndex.from_product([range(s)for s in A.shape], names=names)
df_ = pd.DataFrame({'A': A.flatten()}, index=index)['A']
df_ = df_.unstack(level='x').swaplevel().sort_index()
df_.columns = ['A', 'B', 'C']
df_.index.names = ['DATE', 'i']
I tried adapting the code above, but it raises the error below. What keyword should I be searching for in this case?
Length mismatch: Expected axis has 15 elements, new values have 3 elements
Answer 0 (score: 2)
I believe you need MultiIndex.from_arrays, indexing an array of names by the existing index taken modulo the length of the list:
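import numpy as np
import pandas as pd
names = ['Summer', 'Autumn', 'Winter', 'Spring']
arr = np.asarray(names)
# index % len(names) picks a label per row; the original index becomes the second level
A.index = pd.MultiIndex.from_arrays([arr[A.index % len(names)], A.index], names=['a','b'])
print(A)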
x y z
a b
Summer 0 4.907 1.416 0.663
0 2.114 1.368 0.681
0 1.261 1.374 0.724
Autumn 1 1.382 1.480 0.767
1 2.764 1.390 0.661
1 1.410 0.941 0.665
Winter 2 1.362 1.498 0.775
2 1.303 0.786 0.682
2 2.687 1.445 0.675
Spring 3 1.341 0.932 0.685
3 1.436 1.450 0.748
3 2.466 1.272 0.686
Summer 4 1.299 1.072 0.692
4 1.457 1.504 0.748
4 2.296 1.246 0.663
Autumn 5 1.390 0.918 0.700
5 1.405 1.587 0.817
5 2.482 1.394 0.656
Winter 6 1.445 1.116 0.746
6 2.184 1.474 0.710
6 1.319 1.524 0.722
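
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The small DataFrame is hypothetical sample data modeled on the question (two rows per house instead of three), and the level names House and cluster are taken from the desired output in the question rather than from the code above, so treat both as assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame (index named "house").
A = pd.DataFrame(
    {
        "x": [4.907, 2.114, 1.382, 2.764, 1.362, 1.303, 1.341, 1.436, 1.299, 1.457],
        "y": [1.416, 1.368, 1.480, 1.390, 1.498, 0.786, 0.932, 1.450, 1.072, 1.504],
        "z": [0.663, 0.681, 0.767, 0.661, 0.775, 0.682, 0.685, 0.748, 0.692, 0.748],
    },
    index=pd.Index([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], name="house"),
)

names = ["Summer", "Autumn", "Winter", "Spring"]
arr = np.asarray(names)

# Map each house number to a season by taking it modulo the number of names,
# so house 4 wraps around to "Summer" again.
codes = A.index.to_numpy() % len(names)

# Outer level: season label; inner level: the original house number.
A.index = pd.MultiIndex.from_arrays(
    [arr[codes], A.index], names=["House", "cluster"]
)
print(A)
```

The modulo step is what makes the labels repeat once the house number exceeds the length of the list. If each house should get a unique label instead, the list of names would need one entry per distinct house number.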