Given the following initial multi-indexed DataFrame:
import numpy as np
import pandas as pd
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
print(df)
                0         1         2         3
bar one -1.111899 -0.673956 -0.045719 -0.654951
    two  0.761249  1.009988  1.718598  1.461674
baz one -1.128029  0.360159 -0.004877 -0.725785
    two -0.007996  1.183093  1.651100 -1.408199
foo one  0.935349  0.816100  1.043749 -0.575600
    two  0.986057  0.790675 -0.302731  1.434262
qux one  0.564661 -2.821966  0.650187 -0.176112
    two -1.353135  0.192120 -0.314343 -1.242303
I only need the first column, which I extract as follows:
series = df[0]
print(series)
bar  one   -1.111899
     two    0.761249
baz  one   -1.128029
     two   -0.007996
foo  one    0.935349
     two    0.986057
qux  one    0.564661
     two   -1.353135
type(series)
<class 'pandas.core.series.Series'>
How can I reshape this Series into the following DataFrame:
          bar       baz       foo       qux
one -1.111899 -1.128029  0.935349  0.564661
two  0.761249 -0.007996  0.986057 -1.353135
?
Note that I don't insist on the intermediate second step. Only getting the resulting DataFrame matters.
Answer 0 (score: 1)
You need unstack on the first level:
np.random.seed(123)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
print (df)
                0         1         2         3
bar one -1.085631  0.997345  0.282978 -1.506295
    two -0.578600  1.651437 -2.426679 -0.428913
baz one  1.265936 -0.866740 -0.678886 -0.094709
    two  1.491390 -0.638902 -0.443982 -0.434351
foo one  2.205930  2.186786  1.004054  0.386186
    two  0.737369  1.490732 -0.935834  1.175829
qux one -1.253881 -0.637752  0.907105 -1.428681
    two -0.140069 -0.861755 -0.255619 -2.798589
df1 = df[0].unstack(level=0)
print (df1)
          bar       baz       foo       qux
one -1.085631  1.265936  2.205930 -1.253881
two -0.578600  1.491390  0.737369 -0.140069
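To make the role of level=0 more concrete, here is a minimal sketch on a small hand-written Series. The values and the level names "first" and "second" are made up for illustration and are not part of the question. Unstacking level 0 moves the outer labels into the columns, while the default (the last level) would move one/two there instead.

```python
import pandas as pd

# Hypothetical toy data; the level names "first"/"second" are illustrative assumptions.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Move the outer level (position 0) into the columns.
wide = s.unstack(level=0)

# When levels are named, the name can be used instead of the position.
wide_by_name = s.unstack(level="first")

# The default (level=-1) moves the inner level instead, giving the transposed layout.
wide_default = s.unstack()

print(wide)
print(wide_default)
```

If the index levels have names, referring to them by name tends to be easier to read than positional numbers.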
Another solution is to unstack first, which gives a MultiIndex in the columns, and then select with DataFrame.xs:
df1 = df.unstack(level=0)
print (df1)
            0                                       1                      \
          bar       baz       foo       qux       bar       baz       foo
one -1.085631  1.265936  2.205930 -1.253881  0.997345 -0.866740  2.186786
two -0.578600  1.491390  0.737369 -0.140069  1.651437 -0.638902  1.490732

                      2                                       3            \
          qux       bar       baz       foo       qux       bar       baz
one -0.637752  0.282978 -0.678886  1.004054  0.907105 -1.506295 -0.094709
two -0.861755 -2.426679 -0.443982 -0.935834 -0.255619 -0.428913 -0.434351

          foo       qux
one  0.386186 -1.428681
two  1.175829 -2.798589
# more general solution
df2 = df1.xs(0, level=0, axis=1)
# if you only need to select by the first level
# df2 = df1[0]
print (df2)
          bar       baz       foo       qux
one -1.085631  1.265936  2.205930 -1.253881
two -0.578600  1.491390  0.737369 -0.140069
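As a quick sanity check, the two approaches can be compared on the same seeded data. This is only a sketch that reuses the seed and array construction from the answer; the names via_series and via_xs are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# Approach 1: select column 0 first, then unstack the outer index level.
via_series = df[0].unstack(level=0)

# Approach 2: unstack the whole frame, then take the cross-section for column 0.
via_xs = df.unstack(level=0).xs(0, level=0, axis=1)

# Compare labels and values of the two results.
same_labels = list(via_series.columns) == list(via_xs.columns)
same_values = np.allclose(via_series.values, via_xs.values)
print(same_labels, same_values)
```

Selecting the column before unstacking reshapes only one column, so it is likely the lighter option when just a single column is needed.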