Unstacking a Pandas DataFrame when a column has some NaN entries

Posted: 2013-09-17 15:33:58

Tags: pandas reshape

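I have a Pandas DataFrame on which I use the unstack() method to turn some row entries into columns (as described in this question). To do this, I call set_index() on the columns that identify each row, and then call unstack() to get the DataFrame I actually want.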

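However, if some elements of the index are NaN, I run into some annoying errors. Sometimes I'm told the index contains duplicate entries (which isn't true), and sometimes I'm told that NaN cannot be converted to an integer. Here is an example: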

import pandas
from numpy import nan

df = pandas.DataFrame(
    {'agent': {
                      17263: 'Hg',
                      17264: 'U',
                      17265: 'Pb',
                      17266: 'Sn',
                      17267: 'Ag',
                      17268: 'Hg'},
    'change': {
                      17263: nan,
                      17264: 0.0,
                      17265: 7.070e-06,
                      17266: 2.3614e-05,
                      17267: 0.0,
                      17268: -0.00015},
    'dosage': {
                      17263: nan,
                      17264: nan,
                      17265: nan,
                      17266: 0.0133,
                      17267: 0.0133,
                      17268: 0.0133},
    's_id': {
                      17263: 680585148,
                      17264: 680585148,
                      17265: 680585148,
                      17266: 680607017,
                      17267: 680607017,
                      17268: 680607017}}
            )
try:
    dupe = df.copy().set_index(['s_id','dosage','agent'])
    badDupe = dupe.unstack()
except Exception as e:
    print( 'Error with all data was: %s'%(e,) )
try:
    # .ix was removed in pandas 1.0; .loc performs the same label-based slice here
    getnan = df.loc[17264:].copy().set_index(['s_id','dosage','agent'])
    badNan = getnan.unstack()
except Exception as e:
    print( 'Error dropping first entry was: %s'%(e,) )
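# Set the first three dosages to 42 without chained assignment
# (df.dosage[:3]=42 may not write back to df on recent pandas)
df.iloc[:3, df.columns.get_loc('dosage')] = 42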
willWork = df.copy().set_index(['s_id','dosage','agent'])
u = willWork.unstack()
print(u)

The output is:

Error with all data was: Index contains duplicate entries, cannot reshape
Error dropping first entry was: cannot convert float NaN to integer

                   change                                 
agent                  Ag       Hg        Pb        Sn   U
s_id      dosage                                          
680585148 42.0000     NaN      NaN  0.000007       NaN   0
680607017 0.0133        0 -0.00015       NaN  0.000024 NaN

As you can see, if I set the dosage to anything other than NaN (42 here), the reshape works fine.
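The "duplicate entries" message is misleading for this data. Each (s_id, dosage, agent) combination occurs only once, even when NaN is counted as an ordinary value. A quick check follows. It is a sketch that assumes a recent pandas and rebuilds the question's data with a default integer index:

```python
import numpy as np
import pandas as pd

# The question's data, with a default 0..5 index instead of 17263..17268
df = pd.DataFrame({
    "s_id": [680585148] * 3 + [680607017] * 3,
    "dosage": [np.nan, np.nan, np.nan, 0.0133, 0.0133, 0.0133],
    "agent": ["Hg", "U", "Pb", "Sn", "Ag", "Hg"],
    "change": [np.nan, 0.0, 7.070e-06, 2.3614e-05, 0.0, -0.00015],
})

keys = ["s_id", "dosage", "agent"]
# duplicated() treats NaN as equal to NaN, so the NaN-dosage rows are compared too
dupes = df.duplicated(subset=keys)
print(dupes.any())  # False: every key combination is unique
```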

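What is the best way to get the reshaped DataFrame I'm after? Should I put a sentinel value into dosage and replace it afterwards? That seems... inelegant.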
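The sentinel workaround the question asks about can at least be kept contained in one place. This is a minimal sketch under a few assumptions. It targets a recent pandas. It uses a hypothetical sentinel of -1.0, which is assumed never to be a real dosage. It fills only the dosage column before reshaping and then turns the sentinel back into NaN in the row index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "s_id": [680585148] * 3 + [680607017] * 3,
    "dosage": [np.nan, np.nan, np.nan, 0.0133, 0.0133, 0.0133],
    "agent": ["Hg", "U", "Pb", "Sn", "Ag", "Hg"],
    "change": [np.nan, 0.0, 7.070e-06, 2.3614e-05, 0.0, -0.00015],
})

SENTINEL = -1.0  # hypothetical placeholder, assumed never to be a real dosage

# Fill NaN only in 'dosage' ('change' keeps its NaN), then reshape as usual
wide = (
    df.fillna({"dosage": SENTINEL})
      .set_index(["s_id", "dosage", "agent"])
      .unstack()
)

# Swap the sentinel back to NaN in the row index; from_tuples accepts NaN labels
wide.index = pd.MultiIndex.from_tuples(
    [(s, np.nan if d == SENTINEL else d) for s, d in wide.index],
    names=wide.index.names,
)
print(wide)
```

Rebuilding the index with MultiIndex.from_tuples keeps the sentinel handling to one step after the reshape, so the rest of the pipeline never sees the placeholder value.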

1 Answer:

Answer 0 (score: 0)

I ran your code under pandas 0.16 without the line that sets the first three dosages to 42 (df.dosage[:3]=42), and it works:

In [1405]: u
Out[1405]: 
                 change                                 
agent                Ag       Hg        Pb        Sn   U
s_id      dosage                                        
680585148 NaN       NaN      NaN  0.000007       NaN   0
680607017 0.0133      0 -0.00015       NaN  0.000024 NaN
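Here is the same point as a sketch for a current pandas release. This assumes pandas 1.x or 2.x and says nothing about the 2013-era behavior in the question. The original set_index/unstack call runs with the NaN dosages left in place and needs no sentinel. NaN stays as a label in the dosage level of the row index. The data is rebuilt with a default integer index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "s_id": [680585148] * 3 + [680607017] * 3,
    "dosage": [np.nan, np.nan, np.nan, 0.0133, 0.0133, 0.0133],
    "agent": ["Hg", "U", "Pb", "Sn", "Ag", "Hg"],
    "change": [np.nan, 0.0, 7.070e-06, 2.3614e-05, 0.0, -0.00015],
})

# NaN in a remaining index level ('dosage') is kept as a missing label
u = df.set_index(["s_id", "dosage", "agent"]).unstack()
print(u)
```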