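I have a pandas DataFrame, and I use the unstack() method to turn some row entries into columns (as described in this question). To do that, I call set_index with the appropriate columns and then call unstack() to get the DataFrame I actually want.
However, if some elements of the index are NaN, I run into some annoying errors. Sometimes I am told the index contains duplicate entries (which is not true), and sometimes I am told that NaN cannot be converted to an integer. Here is an example: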
import pandas
from numpy import nan
df = pandas.DataFrame(
    {'agent': {
        17263: 'Hg',
        17264: 'U',
        17265: 'Pb',
        17266: 'Sn',
        17267: 'Ag',
        17268: 'Hg'},
     'change': {
        17263: nan,
        17264: 0.0,
        17265: 7.070e-06,
        17266: 2.3614e-05,
        17267: 0.0,
        17268: -0.00015},
     'dosage': {
        17263: nan,
        17264: nan,
        17265: nan,
        17266: 0.0133,
        17267: 0.0133,
        17268: 0.0133},
     's_id': {
        17263: 680585148,
        17264: 680585148,
        17265: 680585148,
        17266: 680607017,
        17267: 680607017,
        17268: 680607017}}
)
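# Case 1: all rows, with NaN in the 'dosage' index level
try:
    dupe = df.copy().set_index(['s_id', 'dosage', 'agent'])
    badDupe = dupe.unstack()
except Exception as e:
    print('Error with all data was: %s' % (e,))

# Case 2: drop the first row (.loc replaces the removed .ix indexer)
try:
    getnan = df.loc[17264:].copy().set_index(['s_id', 'dosage', 'agent'])
    badNan = getnan.unstack()
except Exception as e:
    print('Error dropping first entry was: %s' % (e,))

# Case 3: overwrite the NaN dosages in the first three rows with 42
df.loc[df.index[:3], 'dosage'] = 42
willWork = df.copy().set_index(['s_id', 'dosage', 'agent'])
u = willWork.unstack()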
print(u)
The output is:
Error with all data was: Index contains duplicate entries, cannot reshape
Error dropping first entry was: cannot convert float NaN to integer
                    change
agent                   Ag       Hg        Pb        Sn   U
s_id      dosage
680585148 42.0000      NaN      NaN  0.000007       NaN   0
680607017 0.0133         0 -0.00015       NaN  0.000024 NaN
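As you can see, if I set the dosage to something other than NaN (42 here), the reshape works fine.
What is the best way to get the reshaped DataFrame I am after? Should I put a sentinel value into dosage and then replace it afterwards? That seems... inelegant.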
Answer 0 (score: 0)
I ran your code on pandas 0.16 without the line that sets the first three dosage values to 42, and it works:
In [1405]: u
Out[1405]:
                    change
agent                   Ag       Hg        Pb        Sn   U
s_id      dosage
680585148 NaN          NaN      NaN  0.000007       NaN   0
680607017 0.0133         0 -0.00015       NaN  0.000024 NaN
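If upgrading is not an option, the sentinel idea from the question can be written fairly compactly. The following is only a minimal sketch under some assumptions: SENTINEL is a hypothetical placeholder that must never occur as a real dosage, the sample data is rebuilt in list form, and only the 'change' column is unstacked, so the result has plain columns rather than the two-level columns shown above.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, rebuilt in list form.
df = pd.DataFrame(
    {
        "agent": ["Hg", "U", "Pb", "Sn", "Ag", "Hg"],
        "change": [np.nan, 0.0, 7.070e-06, 2.3614e-05, 0.0, -0.00015],
        "dosage": [np.nan, np.nan, np.nan, 0.0133, 0.0133, 0.0133],
        "s_id": [680585148] * 3 + [680607017] * 3,
    },
    index=range(17263, 17269),
)

# Hypothetical placeholder (an assumption): pick a value that cannot be a real dosage.
SENTINEL = -1.0

# Replace NaN dosages before they become part of the index.
filled = df.assign(dosage=df["dosage"].fillna(SENTINEL))

# Reshape: one row per (s_id, dosage), one column per agent.
wide = filled.set_index(["s_id", "dosage", "agent"])["change"].unstack("agent")

# Move the index levels back to columns and restore NaN where the sentinel was used.
result = wide.reset_index()
result["dosage"] = result["dosage"].replace(SENTINEL, np.nan)

print(result)
```

Restoring NaN after reset_index() keeps the missing dosage out of the index, which sidesteps the NaN-in-index problem at the cost of ending up with s_id and dosage as ordinary columns.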