I have some code that produces the following empty DataFrame:
>>> import itertools
>>> import numpy as np
>>> import pandas as pd
>>> first = ['foo', 'bar']
>>> second = ['baz', 'can']
>>> third = ['ok', 'ko']
>>> colours = ['blue', 'yellow', 'green']
>>> idx = pd.IndexSlice
>>> ix = pd.MultiIndex.from_arrays(np.array([i for i in itertools.product(first, second, third)]).transpose().tolist(),
...                                names=('first', 'second', 'third'))
>>> df1 = pd.DataFrame(index=ix, columns=colours).sort_index()
>>> print(df1)
                    blue yellow green
first second third
bar   baz    ko      NaN    NaN   NaN
             ok      NaN    NaN   NaN
      can    ko      NaN    NaN   NaN
             ok      NaN    NaN   NaN
foo   baz    ko      NaN    NaN   NaN
             ok      NaN    NaN   NaN
      can    ko      NaN    NaN   NaN
             ok      NaN    NaN   NaN
What I want to do is fill this empty MultiIndex-based DataFrame from another given DataFrame, df2, which is column-based, as shown below (columns truncated for clarity):
     baz_ok_blue  baz_ko_blue  can_ok_blue  can_ko_blue  baz_ok_yellow
foo    -1.385111    -1.014812    -1.419643     1.540341       0.663933
bar     0.445372    -0.226087     0.450982    -1.114169       0.896522
So far, I have been trying it this way:
idx = pd.IndexSlice
for s in second:
    for t in third:
        for c in colours:
            column_name = '{s}_{t}_{c}'.format(s=s, c=c, t=t)
            values = df2[column_name]
            df1.loc[idx[:, s, t], c] = values
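On each iteration, the values Series is determined correctly, but Pandas does not match the index of values against the first level of df1's MultiIndex. As a result, all df1 values remain NaN, because Pandas is trying to match a MultiIndex against a single-level index. Is there a way to solve this?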
Basically, to give a higher-level view, I am just trying to rearrange df2 (based on string columns) into the form of df1 (based on a MultiIndex).
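The alignment problem described above can be illustrated, and sidestepped inside the loop itself, with a rough sketch like the one below. It is only one possible workaround, not necessarily the best one: each column of df2 is reordered by hand to follow the first level of the selected rows and is then assigned as a plain NumPy array, so Pandas performs no index alignment at all. The random df2 is a hypothetical stand-in for the real data, and creating df1 with dtype=float is an assumption made here to keep the columns numeric.

```python
import itertools

import numpy as np
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
colours = ['blue', 'yellow', 'green']

ix = pd.MultiIndex.from_product([first, second, third],
                                names=('first', 'second', 'third'))
# Assumption: dtype=float so the empty frame holds float NaN values.
df1 = pd.DataFrame(index=ix, columns=colours, dtype=float).sort_index()

# Hypothetical stand-in for df2: one flat 'second_third_colour' column
# per combination, filled with random numbers.
rng = np.random.default_rng(0)
flat_cols = [f'{s}_{t}_{c}'
             for s, t, c in itertools.product(second, third, colours)]
df2 = pd.DataFrame(rng.standard_normal((len(first), len(flat_cols))),
                   index=first, columns=flat_cols)

idx = pd.IndexSlice
for s, t, c in itertools.product(second, third, colours):
    # Rows of df1 whose second and third levels equal s and t.
    rows = df1.loc[idx[:, s, t], :].index
    # Reorder the df2 column to follow the 'first' level of those rows.
    aligned = df2[f'{s}_{t}_{c}'].reindex(rows.get_level_values('first'))
    # Assigning a NumPy array bypasses index alignment entirely.
    df1.loc[rows, c] = aligned.to_numpy()
```

With this change every cell of df1 is filled from the matching df2 column, at the cost of keeping the triple loop.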
Answer 0 (score: 2)
You can first use str.split to create a MultiIndex in the columns, then reshape with stack and reindex:
# df is the column-based DataFrame (df2 in the question)
df.columns = df.columns.str.split('_', expand=True)
print (df)
          baz                 can                 baz
           ok        ko        ok        ko        ok
         blue      blue      blue      blue    yellow
foo -1.385111 -1.014812 -1.419643  1.540341  0.663933
bar  0.445372 -0.226087  0.450982 -1.114169  0.896522
df = df.stack([0,1]).reindex(index=df1.index, columns=df1.columns)
print (df)
                        blue    yellow  green
first second third
bar   baz    ko    -0.226087       NaN    NaN
             ok     0.445372  0.896522    NaN
      can    ko    -1.114169       NaN    NaN
             ok     0.450982       NaN    NaN
foo   baz    ko    -1.014812       NaN    NaN
             ok    -1.385111  0.663933    NaN
      can    ko     1.540341       NaN    NaN
             ok    -1.419643       NaN    NaN
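For completeness, here is a self-contained sketch of the same split / stack / reindex steps, using only the five truncated columns shown in the question. The names wide and result are chosen here for illustration and do not appear in the original code; on some recent pandas versions stack may also emit a FutureWarning about its new implementation, which should not affect the outcome in this case.

```python
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
colours = ['blue', 'yellow', 'green']

# Empty target frame, as in the question.
ix = pd.MultiIndex.from_product([first, second, third],
                                names=('first', 'second', 'third'))
df1 = pd.DataFrame(index=ix, columns=colours).sort_index()

# The truncated column-based frame shown in the question.
df2 = pd.DataFrame(
    {
        'baz_ok_blue': [-1.385111, 0.445372],
        'baz_ko_blue': [-1.014812, -0.226087],
        'can_ok_blue': [-1.419643, 0.450982],
        'can_ko_blue': [1.540341, -1.114169],
        'baz_ok_yellow': [0.663933, 0.896522],
    },
    index=['foo', 'bar'],
)

# Work on a copy so df2 keeps its flat column names.
wide = df2.copy()
# 'baz_ok_blue' -> ('baz', 'ok', 'blue'): three column levels.
wide.columns = wide.columns.str.split('_', expand=True)

# Move the first two column levels (second, third) into the row index,
# then conform rows and columns to the layout of df1.
result = wide.stack([0, 1]).reindex(index=df1.index, columns=df1.columns)
print(result)
```

Combinations that are absent from df2, such as every green column here, simply stay NaN after the reindex.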