Adding a new set of columns to a Pandas DataFrame with MultiIndex columns

Asked: 2015-11-17 17:21:04

Tags: python pandas

The following seems like it should work, but it doesn't:

import pandas as pd
import numpy as np

df = pd.DataFrame()
for l1 in ('a', 'b'):
    for l2 in ('one', 'two'):
        df[l1, l2] = np.random.random(size=5)
df.columns = pd.MultiIndex.from_tuples(df.columns, names=['L1', 'L2'])

df['difference'] = df['b']-df['a']

I get the following error:

ValueError: Wrong number of items passed 2, placement implies 1

I can work around it by assigning each sub-column separately:

difference = df['b']-df['a']
df['difference', 'one'] = difference['one']
df['difference', 'two'] = difference['two']

But this seems inefficient. Is there a better way to do it?

1 answer:

Answer 0 (score: 0)

You can do this in a single assignment by passing a list of column tuples as the key:

In [11]: df[[("difference", "one"), ("difference", "two")]] = df['b'] - df['a']

In [12]: df
Out[12]:
L1         a                   b           difference
L2       one       two       one       two        one       two
0   0.585409  0.563870  0.535770  0.868020  -0.049639  0.304150
1   0.404546  0.102884  0.254945  0.362751  -0.149601  0.259868
2   0.475362  0.601632  0.476761  0.665126   0.001400  0.063495
3   0.926288  0.615655  0.257977  0.668778  -0.668311  0.053123
4   0.509069  0.706685  0.355842  0.891862  -0.153227  0.185177

More generally, you can use a MultiIndex as the key, for example one generated with from_product:

In [21]: m = pd.MultiIndex.from_product([["difference"], ["one", "two"]])

In [22]: df[m] = df['b'] - df['a']

Here the second list (["one", "two"]) can be taken from the .columns of the result, i.e. of df['b'] - df['a'].
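
To make that last point concrete, here is a minimal sketch that builds the key from the result's own columns. The seeded generator and the names `rng`, `columns` and `result` are only there for illustration and are not part of the question. Note that the outer label is wrapped in a list, since from_product expects one iterable per level.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question
# (seeded generator is an assumption, used only for reproducibility).
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["a", "b"], ["one", "two"]], names=["L1", "L2"]
)
df = pd.DataFrame(rng.random((5, 4)), columns=columns)

# Compute the result first; its columns are the inner level ('one', 'two').
result = df["b"] - df["a"]

# Build the target keys from the result's columns instead of typing them out.
# The outer label goes in its own list: one iterable per level.
m = pd.MultiIndex.from_product([["difference"], result.columns])

# Assign all new sub-columns in one statement.
df[m] = result
```

Afterwards df['difference'] should return the two new sub-columns, and the same pattern should carry over to any number of inner labels without listing them by hand.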