Unexpected result from pandas apply() caused by using an integer as a column name

Time: 2014-07-12 01:46:10

Tags: python pandas

Consider this minimal example:

In [208]:
L={'A':[[1,2]],
   'B':[[3,4], [5,6]]}
df=pd.DataFrame.from_dict(dict(L), orient="index").stack().reset_index(level=0)
df['val']=None
print 'before apply. \n\n', df
f=lambda x: [x[0], x[1][0], x[1][1]]
print '\nafter apply. \n\n', df.apply(f, axis=1)

before apply. 

  level_0       0   val
0       A  [1, 2]  None
0       B  [3, 4]  None
1       B  [5, 6]  None

after apply. 

  level_0  0  val
0  [1, 2]  1    2
0  [3, 4]  3    4
1  [5, 6]  5    6

Strange! The lambda function should return a list for each row: for the first row the result should be ['A', 1, 2], so the expected behavior of apply() would be:

  level_0  0  val
0       A  1  2
0       B  3  4
1       B  5  6

Am I misunderstanding apply()?

2 answers:

Answer 0: (score: 2)

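Because you have a column named 0 (an integer), x[0] means "get the value from the column named 0", not "get the value from column number 0". There is no column named 1, however, so x[1] falls back to meaning "get the value from column number 1", which is again the column named 0. That is why every row yields [[1, 2], 1, 2] instead of ['A', 1, 2].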

Try using x['level_0'] instead:

f=lambda x: [x['level_0'], x[1][0], x[1][1]]

Or rename column 0 to the string "0".
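The renaming option could look roughly like the sketch below. It is only an illustration: the frame is rebuilt by hand (assumed to match the one in the question), `unpack` is a hypothetical helper name, and it returns a Series rather than a list so that apply() produces a DataFrame regardless of pandas version.

```python
import pandas as pd

# Hand-built copy of the question's frame (assumed equivalent to the stacked one).
df = pd.DataFrame(
    {"level_0": ["A", "B", "B"], 0: [[1, 2], [3, 4], [5, 6]], "val": [None] * 3},
    columns=["level_0", 0, "val"],
    index=[0, 0, 1],
)

# Rename the integer column 0 to the string "0" so every column name is a string.
df = df.rename(columns={0: "0"})

def unpack(row):
    # Hypothetical helper: all lookups are by label, so no integer key
    # can be mistaken for a position.
    first, second = row["0"]
    return pd.Series([row["level_0"], first, second], index=row.index)

result = df.apply(unpack, axis=1)
print(result)
```

With string-only column names, an integer key can no longer match a label by accident, so the lookup inside the function stays unambiguous.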

Answer 1: (score: 1)

See the inline comments:

>>> ts = df.iloc[0,]  # take the first row as an example
>>> ts
level_0         A
0          [1, 2]
val          None
Name: 0, dtype: object
>>> ts[0]  # `0` is in the index, so it resolves to item with `index` 0
[1, 2]
>>> ts[1]  # one is not in the index, so it resolves to ts.iloc[1]
[1, 2]
>>> ts[1][0] # (ts.iloc[1])[0]
1
>>> ts[1][1] # (ts.iloc[1])[1]
2

Moral of the story: don't use integer values as column names.
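If integer column names cannot be avoided, one way to sidestep the ambiguity is to state the intent explicitly: .loc for labels and .iloc for positions. The sketch below rebuilds the first row by hand (an assumption for illustration, not the output of the original code). Note also that, as far as I know, more recent pandas releases have deprecated the positional fallback of plain [] with integer keys, so relying on it is fragile either way.

```python
import pandas as pd

# Hand-built stand-in for the first row of the question's frame.
ts = pd.Series(["A", [1, 2], None], index=["level_0", 0, "val"], name=0)

by_label = ts.loc[0]      # label 0: the list stored under the column named 0
by_position = ts.iloc[0]  # position 0: the value of 'level_0'
second = ts.iloc[1]       # position 1: the same list as ts.loc[0]

# .loc never falls back to positions: there is no label 1, so this raises.
try:
    ts.loc[1]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(by_label, by_position, second, missing_label_raises)
```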