Add a new column based on the hierarchical row index

Time: 2017-11-03 04:50:08

Tags: python pandas

I have a pandas DataFrame with a hierarchical row index:

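import numpy as np
import pandas as pd

def stack_example():
    # Row index: four dates
    i = pd.DatetimeIndex(['2011-04-04', '2011-04-06',
                          '2011-04-12', '2011-04-13'])
    # Column index: every food x month x measure combination (2 * 2 * 2 = 8 columns)
    cols = pd.MultiIndex.from_product([['milk', 'honey'], ['jan', 'feb'], ['PRICE', 'LITERS']])
    df = pd.DataFrame(np.random.randint(12, size=(len(i), 8)), index=i, columns=cols)

    df.columns.names = ['food', 'month', 'measure']
    df.index.names = ['when']

    # Move the 'food' and 'month' column levels into the row index;
    # the remaining columns are the 'measure' labels PRICE and LITERS.
    df = df.stack('food')
    df = df.stack('month')

    df['constant_col'] = "foo"
    df['liters_related_col'] = df['LITERS'] * 99

    return df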

I can add new columns to this DataFrame based on a constant or on calculations involving other columns.

I would like to add new columns based on calculations involving the index.

For example, simply repeating the food name twice:

df.index
MultiIndex(levels=[[2011-04-04 00:00:00, 2011-04-06 00:00:00, 2011-04-12 00:00:00, 2011-04-13 00:00:00], [u'honey', u'milk'], [u'feb', u'jan']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3], [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
           names=[u'when', u'food', u'month'])
df.index.values[4][1]*2
'honeyhoney'

But I can't figure out the syntax for creating something like this:

df['xcol'] = df.index.values[2]*2

    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "C:\Users\mds\Anaconda2\envs\bbg27\lib\site-packages\pandas\core\frame.py", line 2519, in __setitem__
        self._set_item(key, value)
      File "C:\Users\mds\Anaconda2\envs\bbg27\lib\site-packages\pandas\core\frame.py", line 2585, in _set_item
        value = self._sanitize_column(key, value)
      File "C:\Users\mds\Anaconda2\envs\bbg27\lib\site-packages\pandas\core\frame.py", line 2760, in _sanitize_column
        value = _sanitize_index(value, self.index, copy=False)
      File "C:\Users\mds\Anaconda2\envs\bbg27\lib\site-packages\pandas\core\series.py", line 3080, in _sanitize_index
        raise ValueError('Length of values does not match length of ' 'index')
    ValueError: Length of values does not match length of index

I also tried variations such as df['xcol'] = df.index.values[:][2]*2.

1 answer:

Answer 0 (score: 0)

For df.index.values[4][1] * 2, the result is a single string ('honeyhoney'), so it can be assigned to a column and is repeated in every row:

df['col1'] = df.index.values[4][1] * 2

df.col1

when        food   month
2011-04-04  honey  feb      honeyhoney
                   jan      honeyhoney
            milk   feb      honeyhoney
                   jan      honeyhoney
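That works because every entry of the MultiIndex is a (when, food, month) tuple, so df.index.values[4][1] picks one label from one row. A minimal sketch, assuming a smaller stand-in frame (two dates, built directly with pd.MultiIndex.from_product instead of by stacking; the PRICE values are placeholders); looking the level position up by name avoids hard-coding the 1:

```python
import pandas as pd

# Hypothetical stand-in for the stacked frame: same level names, only two dates.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(['2011-04-04', '2011-04-06']), ['honey', 'milk'], ['feb', 'jan']],
    names=['when', 'food', 'month'],
)
df = pd.DataFrame({'PRICE': range(len(idx))}, index=idx)

# A single MultiIndex entry is a plain (when, food, month) tuple.
row_key = df.index[4]
when, food, month = row_key
print(row_key)
print(food * 2)  # one string, taken from one row only

# Find the position of the 'food' level by name rather than hard-coding 1.
food_pos = df.index.names.index('food')
print(df.index[4][food_pos] * 2)
```

Note that this still yields only one value; to get one value per row you need the whole level, as in the update below.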

In your second example, the one that raises the error, you aren't actually operating on a single value:

df.index.values[2]*2

(Timestamp('2011-04-04 00:00:00'),
 'milk',
 'feb',
 Timestamp('2011-04-04 00:00:00'),
 'milk',
 'feb')

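You could collapse that tuple into one string and assign it, but every row would then get the same value:

df['col2'] = ''.join([str(x) for x in df.index.values[2]*2])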

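But the main problem is that df.index.values[2]*2 gives you a 6-element tuple (one row's three index labels, repeated), a structure that doesn't map onto the 16 rows of df. That length mismatch is what raises the ValueError.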

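A new column in df can either be a single value (in which case it is automatically repeated to fill every row of df), or it can have exactly as many entries as len(df).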
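A quick sketch of those two rules, plus the failing case from the question, on a hypothetical 8-row stand-in frame (the column names row_number and PRICE are illustrative only):

```python
import pandas as pd

# Hypothetical stand-in frame with the question's index level names.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(['2011-04-04', '2011-04-06']), ['honey', 'milk'], ['feb', 'jan']],
    names=['when', 'food', 'month'],
)
df = pd.DataFrame({'PRICE': range(len(idx))}, index=idx)

# 1. A scalar is broadcast to every row.
df['col1'] = 'honeyhoney'

# 2. A plain list with exactly len(df) entries is assigned in row order.
df['row_number'] = list(range(len(df)))

# 3. A list-like of any other length is rejected, like the tuple in the question.
six_items = df.index.values[2] * 2  # tuple repetition: 6 elements
try:
    df['col2'] = six_items
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(len(six_items), len(df), mismatch_error)
```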

Update (per comments):

IIUC, you can use get_level_values() to apply an operation to an entire level of the MultiIndex:

df.index.get_level_values(1).values*2

array(['honeyhoney', 'honeyhoney', 'milkmilk', 'milkmilk', 'honeyhoney',
       'honeyhoney', 'milkmilk', 'milkmilk', 'honeyhoney', 'honeyhoney',
       'milkmilk', 'milkmilk', 'honeyhoney', 'honeyhoney', 'milkmilk',
       'milkmilk'], dtype=object)
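Because get_level_values() returns one label per row, its result has len(df) entries and can be assigned straight to a new column. A minimal sketch, assuming a compact rebuild of the question's frame (from_product instead of stack, random LITERS/PRICE values); the weekday column is only an illustrative extra showing the same idea on the date level:

```python
import numpy as np
import pandas as pd

# Compact stand-in for stack_example(): same index levels, 16 rows.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(['2011-04-04', '2011-04-06', '2011-04-12', '2011-04-13']),
     ['honey', 'milk'], ['feb', 'jan']],
    names=['when', 'food', 'month'],
)
df = pd.DataFrame(np.random.randint(12, size=(len(idx), 2)),
                  index=idx, columns=['LITERS', 'PRICE'])

# One 'food' label per row, selected by level name instead of position.
food = df.index.get_level_values('food')
df['xcol'] = [name * 2 for name in food]

# The same pattern works for any per-row calculation on a level,
# e.g. on the DatetimeIndex level 'when'.
df['weekday'] = df.index.get_level_values('when').day_name()

print(df.head())
```

Selecting the level by name ('food') rather than by position (1) keeps the code working if the order of the index levels ever changes.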