Question

Suppose I have a DataFrame with 6 columns:

                      close     high     low    open     volume  change
ts
2017-08-24 13:00:00  921.28  930.840  915.50  928.66  1270306.0   -7.38
2017-08-25 13:00:00  915.89  925.555  915.50  923.49  1053376.0    -7.6
2017-08-28 13:00:00  913.81  919.245  911.87  916.00  1086484.0   -2.19
2017-08-29 13:00:00  921.29  923.330  905.00  905.10  1185564.0   16.19
2017-08-30 13:00:00  929.57  930.819  919.65  920.05  1301225.0    9.52
2017-08-31 13:00:00  939.33  941.980  931.76  931.76  1560033.0    7.51

How can I add a column that shows 1 for each row where change > 0.0, and 0 otherwise?

Answer 1

Option 1

Use a boolean comparison and cast the result to int:

df['newCol'] = (df.change > 0).astype(int)
df['newCol'] 

ts
2017-08-24 13:00:00    0
2017-08-25 13:00:00    0
2017-08-28 13:00:00    0
2017-08-29 13:00:00    1
2017-08-30 13:00:00    1
2017-08-31 13:00:00    1
Name: newCol, dtype: int64

Option 2

Use np.where:

df['newCol'] = np.where(df.change > 0.0, 1, 0)
df['newCol']

ts
2017-08-24 13:00:00    0
2017-08-25 13:00:00    0
2017-08-28 13:00:00    0
2017-08-29 13:00:00    1
2017-08-30 13:00:00    1
2017-08-31 13:00:00    1
Name: newCol, dtype: int64

Option 3

Use gt (called here on the change column):

df['newCol'] = df.change.gt(0).astype(int)  
df['newCol']  

ts
2017-08-24 13:00:00    0
2017-08-25 13:00:00    0
2017-08-28 13:00:00    0
2017-08-29 13:00:00    1
2017-08-30 13:00:00    1
2017-08-31 13:00:00    1
Name: newCol, dtype: int64
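
To try the three options outside the original session, here is a minimal self-contained sketch. It rebuilds only the index and the change column from the question's sample (dropping the other five columns is a simplification made for this sketch) and checks that all three options give the same values.

```python
import numpy as np
import pandas as pd

# Rebuild the index and the `change` column from the question's sample.
# The other columns are left out; they do not affect the result.
idx = pd.DatetimeIndex(
    [
        "2017-08-24 13:00:00",
        "2017-08-25 13:00:00",
        "2017-08-28 13:00:00",
        "2017-08-29 13:00:00",
        "2017-08-30 13:00:00",
        "2017-08-31 13:00:00",
    ],
    name="ts",
)
df = pd.DataFrame({"change": [-7.38, -7.6, -2.19, 16.19, 9.52, 7.51]}, index=idx)

# Option 1: boolean comparison, then cast True/False to 1/0.
opt1 = (df["change"] > 0).astype(int)

# Option 2: np.where returns a NumPy array rather than a Series.
opt2 = np.where(df["change"] > 0.0, 1, 0)

# Option 3: the method form of the same comparison.
opt3 = df["change"].gt(0).astype(int)

df["newCol"] = opt1
print(df)
```

Note that Options 1 and 3 return a Series that keeps the ts index, while Option 2 returns a plain array; assigning either to df['newCol'] works here because the lengths match.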

Performance

Small DataFrame

%timeit (df.change > 0).astype(int)
1000 loops, best of 3: 276 µs per loop

%timeit np.where(df.change > 0.0, 1, 0)
10000 loops, best of 3: 209 µs per loop

%timeit df.change.gt(0).astype(int) 
1000 loops, best of 3: 351 µs per loop

Large DataFrame

df_test = pd.concat([df] * 10000, axis=0) # Setup

%timeit (df_test.change > 0).astype(int)
1000 loops, best of 3: 377 µs per loop

%timeit np.where(df_test.change > 0.0, 1, 0)
1000 loops, best of 3: 328 µs per loop

%timeit  df_test.change.gt(0).astype(int) 
1000 loops, best of 3: 425 µs per loop

And, for comparison, a row-by-row apply with a lambda:

%timeit df_test.change.apply(lambda x: 1 if x > 0 else 0)
10 loops, best of 3: 24.5 ms per loop
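
The %timeit lines above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain script with the standard timeit module. The data here is a synthetic stand-in (60,000 random values, matching the row count of df_test but not its contents), and the labels in the dictionary are made up for this example. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for df_test: 60,000 random values (hypothetical data).
rng = np.random.default_rng(0)
df_test = pd.DataFrame({"change": rng.normal(size=60_000)})

# The four approaches from the answer, wrapped so timeit can call them.
candidates = {
    "compare + astype": lambda: (df_test["change"] > 0).astype(int),
    "np.where": lambda: np.where(df_test["change"] > 0.0, 1, 0),
    "gt + astype": lambda: df_test["change"].gt(0).astype(int),
    "apply + lambda": lambda: df_test["change"].apply(lambda x: 1 if x > 0 else 0),
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 10 calls per repeat, reported per call in microseconds.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    timings[name] = best * 1e6
    print(f"{name:>18}: {timings[name]:.1f} µs per call")
```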

Answer 2

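def value_return(change_variable):
    # Return 1 for a positive change, otherwise 0.
    if change_variable > 0:
        m = 1
    else:
        m = 0
    return m

# Apply the helper row by row; it must be defined before this line runs.
df['new_column'] = df.apply(lambda row: value_return(row['change']), axis=1)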
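
Since value_return only needs the change value, one lighter variant is to call apply on that column directly instead of on every row. The sketch below uses a small stand-in frame holding just the change values from the question; it is one possible simplification, not the answerer's own code.

```python
import pandas as pd


def value_return(change_variable):
    """Return 1 for a positive change, otherwise 0."""
    return 1 if change_variable > 0 else 0


# Stand-in frame with only the `change` values from the question.
df = pd.DataFrame({"change": [-7.38, -7.6, -2.19, 16.19, 9.52, 7.51]})

# Series.apply hands each scalar to the function, so no row object
# has to be built for every call as with df.apply(..., axis=1).
df["new_column"] = df["change"].apply(value_return)

print(df)
```

This is still a Python-level loop, so on large frames the vectorized options from Answer 1 are likely to be faster.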
