pandas: assigning a Series view to a Series view doesn't work?

Asked: 2013-06-03 00:51:45

Tags: python pandas dataframe slice series

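I'm trying to take a sliced view of a Series (logically indexed by a condition), process it, and then assign the result back to the same logically indexed slice. The LHS and RHS of the assignment are both Series with matching indices, yet for some unknown reason the assignment ends up being a scalar one (see bottom). How can I get the desired assignment? (I checked SO and the pandas 0.11.0 docs for anything relevant.)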

import numpy as np
import pandas as pd

# A dataframe with sample data and some boolean conditional
df = pd.DataFrame(data={'x': range(1,20)})
df['cond'] = df.x.apply(lambda xx: ((xx%3)==1) )

# Create a new col and selectively assign to it... elsewhere being NaN...
df['newcol'] = np.nan
# This attempted assign to a view of the df doesn't work (in reality the RHS expression would actually be a return value from somefunc)
df.ix[df.cond, df.columns.get_loc('newcol')] = 2* df.ix[df.cond, df.columns.get_loc('x')]
# yet a scalar assign does...
df.ix[df.cond, df.columns.get_loc('newcol')] = 99.
# Likewise bad trying to use -df.cond as the logical index:
df.ix[-df.cond, df.columns.get_loc('newcol')] = 2* df.ix[-df.cond, df.columns.get_loc('x')]

At the moment I just get a useless scalar assignment:

>>> df.ix[-df.cond, df.columns.get_loc('newcol')] = 2* df.ix[-df.cond, df.columns.get_loc('x')]
>>> df
     x   cond  newcol
0    1   True     NaN
1    2  False       4
2    3  False       4
3    4   True     NaN
4    5  False       4
5    6  False       4
6    7   True     NaN
7    8  False       4
8    9  False       4
9   10   True     NaN
10  11  False       4
11  12  False       4
12  13   True     NaN
13  14  False       4
14  15  False       4
15  16   True     NaN
16  17  False       4
17  18  False       4
18  19   True     NaN

2 Answers:

Answer 0: (score: 1)

In [21]: df = pd.DataFrame(data={'x': range(1,20)})

In [22]: df['cond'] = df.x.apply(lambda xx: ((xx%3)==1) )

In [23]: df
Out[23]: 
     x   cond
0    1   True
1    2  False
2    3  False
3    4   True
4    5  False
5    6  False
6    7   True
7    8  False
8    9  False
9   10   True
10  11  False
11  12  False
12  13   True
13  14  False
14  15  False
15  16   True
16  17  False
17  18  False
18  19   True

In [24]: df['newcol'] = 2*df.loc[df.cond, 'x']

In [25]: df
Out[25]: 
     x   cond  newcol
0    1   True       2
1    2  False     NaN
2    3  False     NaN
3    4   True       8
4    5  False     NaN
5    6  False     NaN
6    7   True      14
7    8  False     NaN
8    9  False     NaN
9   10   True      20
10  11  False     NaN
11  12  False     NaN
12  13   True      26
13  14  False     NaN
14  15  False     NaN
15  16   True      32
16  17  False     NaN
17  18  False     NaN
18  19   True      38


In [10]: def myfunc(df_):
   ....:     return 2 * df_
   ....: 

In [26]: df['newcol'] = myfunc(df.ix[df.cond, df.columns.get_loc('newcol')])

In [27]: df
Out[27]: 
     x   cond  newcol
0    1   True       4
1    2  False     NaN
2    3  False     NaN
3    4   True      16
4    5  False     NaN
5    6  False     NaN
6    7   True      28
7    8  False     NaN
8    9  False     NaN
9   10   True      40
10  11  False     NaN
11  12  False     NaN
12  13   True      52
13  14  False     NaN
14  15  False     NaN
15  16   True      64
16  17  False     NaN
17  18  False     NaN
18  19   True      76
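For readers on a recent pandas, where `.ix` no longer exists (it was removed in pandas 1.0), the masked assignment from the question can be sketched with `.loc` and the same boolean mask on both sides. This is a minimal sketch, not the answerer's code; the `3 *` factor for the complementary rows is an arbitrary choice made here for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'x': range(1, 20)})
df['cond'] = (df['x'] % 3) == 1

# Start with an all-NaN float column, then fill it selectively
df['newcol'] = np.nan

# Same mask on both sides: the right-hand Series carries exactly
# the labels of the rows being assigned
df.loc[df['cond'], 'newcol'] = 2 * df.loc[df['cond'], 'x']

# Complementary rows, using ~ for boolean negation
# (the factor 3 is an illustrative assumption)
df.loc[~df['cond'], 'newcol'] = 3 * df.loc[~df['cond'], 'x']
```

Selecting the column by its label ('newcol') rather than by position avoids the `df.columns.get_loc(...)` round trip used in the question.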

Answer 1: (score: 0)

I found this workaround:

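# Build an all-NaN temporary Series, fill only the rows where cond holds,
# then assign the whole Series as the new column
tmp = pd.Series(np.repeat(np.nan, len(df)))
tmp[df.cond] = 2* df.loc[df.cond, 'x']
df['newcol'] = tmp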
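A possible alternative to building the temporary Series by hand is `Series.where`, which keeps values where the condition holds and puts NaN elsewhere. This is only a sketch, and it assumes the transformation is cheap enough to compute for every row before masking.

```python
import pandas as pd

df = pd.DataFrame({'x': range(1, 20)})
df['cond'] = (df['x'] % 3) == 1

# Compute the transformation for all rows, then keep it only where
# cond is True; all other rows become NaN
df['newcol'] = (2 * df['x']).where(df['cond'])
```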

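Strangely, the following sometimes works (assigning the slice to the whole Series), but fails for a more complex RHS with AssertionError: Length of values does not match length of index

(According to the pandas docs, the index of the RHS Series should be aligned to the LHS, at least when the LHS is a DataFrame. Does that not hold when the LHS is a Series? Is this a bug?)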

>>> df['newcol'] = 2* df.loc[df.cond, 'x']
>>> df
     x   cond  newcol
0    1   True       2
1    2  False     NaN
2    3  False     NaN
3    4   True       8
4    5  False     NaN
5    6  False     NaN
6    7   True      14
7    8  False     NaN
8    9  False     NaN
9   10   True      20
10  11  False     NaN
11  12  False     NaN
12  13   True      26
13  14  False     NaN
14  15  False     NaN
15  16   True      32
16  17  False     NaN
17  18  False     NaN
18  19   True      38
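The output above is consistent with label alignment: when a Series is assigned as a DataFrame column, its values are matched to the frame's index by label, and labels missing from the Series become NaN. A small sketch with made-up data (the values and the deliberately shuffled index are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]})

# The right-hand side covers only labels 3 and 1, deliberately out of order
rhs = pd.Series([400.0, 100.0], index=[3, 1])

# Values are placed by label, not by position; labels 0 and 2 get NaN
df['newcol'] = rhs
```

Because the match is by label, the order and length of the right-hand Series do not matter, as long as it is a Series with an index.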

Jeff, what's strange is that we can assign to df['newcol'] (which should be a copy rather than a view, right?) when we do this:

df['newcol'] = 2* df.loc[df.cond, 'x']

but not when we do the same thing with a RHS that comes from a function:

def myfunc(df_):
    """Some func transforming and returning said Series slice"""
    return 2* df_

df['newcol'] = myfunc( df.ix[df.cond, df.columns.get_loc('newcol')] )
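One way a function-produced RHS can trigger a length-mismatch error is if the function returns a plain NumPy array, which has no index to align on. That is only a guess at the cause, since the failing function is not shown in the thread. The sketch below uses a hypothetical `myfunc_array` to illustrate the difference between whole-column and masked assignment in that case.

```python
import numpy as np
import pandas as pd

def myfunc_array(s):
    """Hypothetical function that drops the index by returning an ndarray."""
    return 2 * s.to_numpy()

df = pd.DataFrame({'x': range(1, 20)})
df['cond'] = (df['x'] % 3) == 1
df['newcol'] = np.nan

mask = df['cond']
values = myfunc_array(df.loc[mask, 'x'])  # 7 values, no index

# Whole-column assignment cannot align an index-less array, so the
# lengths must match exactly; 7 values for 19 rows is rejected
failed = False
try:
    df['newcol'] = values
except (ValueError, AssertionError):
    failed = True

# Masked assignment works: 7 values are written positionally into the
# 7 selected rows
df.loc[mask, 'newcol'] = values
```

If the function returns a Series that keeps the original index, either form of assignment should align by label instead.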