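I am trying to take a slice view of a Series (logically indexed by a condition), process it, and then assign the result back to that logically indexed slice. The LHS and RHS of the assignment are Series with matching indices, but for some unknown reason the assignment ends up being a scalar (see bottom). How do I get the desired assignment? (I checked SO and the pandas 0.11.0 docs for anything related.)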
import numpy as np
import pandas as pd
# A dataframe with sample data and some boolean conditional
df = pd.DataFrame(data={'x': range(1,20)})
df['cond'] = df.x.apply(lambda xx: ((xx%3)==1) )
# Create a new col and selectively assign to it... elsewhere being NaN...
df['newcol'] = np.nan
# This attempted assign to a view of the df doesn't work (in reality the RHS expression would actually be a return value from somefunc)
df.ix[df.cond, df.columns.get_loc('newcol')] = 2* df.ix[df.cond, df.columns.get_loc('x')]
# yet a scalar assign does...
df.ix[df.cond, df.columns.get_loc('newcol')] = 99.
# Likewise bad trying to use -df.cond as the logical index:
df.ix[-df.cond, df.columns.get_loc('newcol')] = 2* df.ix[-df.cond, df.columns.get_loc('x')]
At the moment I just get a silly scalar assignment:
>>> df.ix[-df.cond, df.columns.get_loc('newcol')] = 2* df.ix[-df.cond, df.columns.get_loc('x')]
>>> df
x cond newcol
0 1 True NaN
1 2 False 4
2 3 False 4
3 4 True NaN
4 5 False 4
5 6 False 4
6 7 True NaN
7 8 False 4
8 9 False 4
9 10 True NaN
10 11 False 4
11 12 False 4
12 13 True NaN
13 14 False 4
14 15 False 4
15 16 True NaN
16 17 False 4
17 18 False 4
18 19 True NaN
Answer 0 (score: 1)
In [21]: df = pd.DataFrame(data={'x': range(1,20)})
In [22]: df['cond'] = df.x.apply(lambda xx: ((xx%3)==1) )
In [23]: df
Out[23]:
x cond
0 1 True
1 2 False
2 3 False
3 4 True
4 5 False
5 6 False
6 7 True
7 8 False
8 9 False
9 10 True
10 11 False
11 12 False
12 13 True
13 14 False
14 15 False
15 16 True
16 17 False
17 18 False
18 19 True
In [24]: df['newcol'] = 2*df.loc[df.cond, 'x']
In [25]: df
Out[25]:
x cond newcol
0 1 True 2
1 2 False NaN
2 3 False NaN
3 4 True 8
4 5 False NaN
5 6 False NaN
6 7 True 14
7 8 False NaN
8 9 False NaN
9 10 True 20
10 11 False NaN
11 12 False NaN
12 13 True 26
13 14 False NaN
14 15 False NaN
15 16 True 32
16 17 False NaN
17 18 False NaN
18 19 True 38
In [10]: def myfunc(df_):
....: return 2 * df_
....:
In [26]: df['newcol'] = myfunc(df.ix[df.cond, df.columns.get_loc('newcol')])
In [27]: df
Out[27]:
x cond newcol
0 1 True 4
1 2 False NaN
2 3 False NaN
3 4 True 16
4 5 False NaN
5 6 False NaN
6 7 True 28
7 8 False NaN
8 9 False NaN
9 10 True 40
10 11 False NaN
11 12 False NaN
12 13 True 52
13 14 False NaN
14 15 False NaN
15 16 True 64
16 17 False NaN
17 18 False NaN
18 19 True 76
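For readers on a current pandas release, where `.ix` is no longer available (it was removed in pandas 1.0), the masked assignment from the question can be sketched with `.loc` and column labels. This is a minimal sketch rather than the code from the session above; using `~` to invert the mask, and the `3 *` expression for the remaining rows, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(1, 20)})
df['cond'] = df['x'] % 3 == 1
df['newcol'] = np.nan

# Series on both sides: pandas aligns the right-hand side on the index labels
df.loc[df['cond'], 'newcol'] = 2 * df.loc[df['cond'], 'x']

# `~` inverts a boolean mask; fill the remaining rows with a different expression
df.loc[~df['cond'], 'newcol'] = 3 * df.loc[~df['cond'], 'x']
```

Selecting the column by label (`'newcol'`) instead of `df.columns.get_loc('newcol')` avoids mixing boolean row selection with positional column selection.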
Answer 1 (score: 0)
I found this workaround:
tmp = pd.Series(np.repeat(np.nan, len(df)))
tmp[df.cond] = 2* df.loc[df.cond, 'x']
df['newcol'] = tmp
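The temporary Series can probably be avoided. One possible alternative, sketched here under the assumption that the goal is "2*x where cond holds, NaN elsewhere", is `Series.where`, which keeps values where the condition is True and fills the rest with NaN:

```python
import pandas as pd

df = pd.DataFrame({'x': range(1, 20)})
df['cond'] = df['x'] % 3 == 1

# Keep 2*x where cond is True; rows where it is False become NaN
df['newcol'] = (2 * df['x']).where(df['cond'])
```

This computes the expression for every row and then masks it, so it fits cheap transforms better than expensive ones.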
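Strangely, the following sometimes works (assigning the slice to the entire Series),
but at other times it fails with AssertionError: Length of values does not match length of index.
(According to the pandas docs, the index of an RHS Series should be aligned to the LHS, at least when the LHS is a DataFrame, but apparently not when it is a Series? Is this a bug?)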
>>> df['newcol'] = 2* df.loc[df.cond, 'x']
>>> df
x cond newcol
0 1 True 2
1 2 False NaN
2 3 False NaN
3 4 True 8
4 5 False NaN
5 6 False NaN
6 7 True 14
7 8 False NaN
8 9 False NaN
9 10 True 20
10 11 False NaN
11 12 False NaN
12 13 True 26
13 14 False NaN
14 15 False NaN
15 16 True 32
16 17 False NaN
17 18 False NaN
18 19 True 38
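The difference between the working case and the length error can be illustrated with a small sketch. It assumes a recent pandas release, where the mismatch is raised as a `ValueError` rather than the `AssertionError` quoted above; the column name `broken` and the variable `length_error` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'x': range(1, 20)})
df['cond'] = df['x'] % 3 == 1

rhs = 2 * df.loc[df['cond'], 'x']   # 7 rows, index labels 0, 3, 6, ...

# A Series is aligned on its index; labels missing from rhs become NaN
df['newcol'] = rhs

# The same values without an index cannot be aligned, so the length must match
try:
    df['broken'] = rhs.to_numpy()
    length_error = None
except ValueError as exc:
    length_error = exc
```

So whether the assignment succeeds seems to depend on whether the right-hand side still carries its index, not on whether it came from a slice.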
Jeff, strangely we can assign to df['newcol'] (which should be a copy rather than a view, right?) when we do this:
df['newcol'] = 2* df.loc[df.cond, 'x']
but not when we do the same thing with an RHS that comes from a function:
def myfunc(df_):
"""Some func transforming and returning said Series slice"""
return 2* df_
df['newcol'] = myfunc( df.ix[df.cond, df.columns.get_loc('newcol')] )
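Note that the call above passes the `newcol` slice to `myfunc`, so if that column still holds NaN the result is NaN as well. Assuming the intent was to transform `x`, a hedged sketch with `.loc` shows that a function which returns a Series with the same index as its input behaves like the inline expression. Here `myfunc` is a stand-in and `newcol2` is a hypothetical column added only for comparison.

```python
import numpy as np
import pandas as pd

def myfunc(s):
    """Stand-in transform; returns a Series with the same index as its input."""
    return 2 * s

df = pd.DataFrame({'x': range(1, 20)})
df['cond'] = df['x'] % 3 == 1
mask = df['cond']

# Whole-column assignment: the returned slice is aligned, other rows get NaN
df['newcol'] = myfunc(df.loc[mask, 'x'])

# Masked assignment into an existing column gives the same result here
df['newcol2'] = np.nan
df.loc[mask, 'newcol2'] = myfunc(df.loc[mask, 'x'])
```

If the real function returns a NumPy array or a list instead of a Series, the index is lost and only the masked form, with a matching number of rows, would be expected to work.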