I have a time-series pandas.DataFrame, 'ES_Summary_Index1', that looks like this:
Ticker_x Date Close_x 15M_Long 1H_Long Net_Long
0 ES H7 2016-10-18 13:44:59 2128.00 N NaN
1 ES H7 2016-10-18 13:59:59 2128.75 N NaN
2 ES H7 2016-10-18 14:14:59 2125.75 N NaN
3 ES H7 2016-10-18 14:29:59 2126.50 N N
4 ES H7 2016-10-18 14:44:59 2126.50 N NaN
5 ES H7 2016-10-18 16:14:59 2126.00 N NaN
6 ES H7 2016-10-18 16:44:59 2126.25 N NaN
7 ES H7 2016-10-18 17:59:59 2126.50 N NaN
8 ES H7 2016-10-18 18:14:59 2127.00 N NaN
9 ES H7 2016-10-18 19:14:59 2127.75 N NaN
10 ES H7 2016-10-18 19:44:59 2127.75 N NaN
11 ES H7 2016-10-18 19:59:59 2127.75 N NaN
12 ES H7 2016-10-18 20:44:59 2129.00 N NaN
13 ES H7 2016-10-18 21:29:59 2128.75 N N
14 ES H7 2016-10-18 21:44:59 2129.00 N NaN
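Looking at the 15M_Long and 1H_Long columns: if both say 'Y', I want the Net_Long column to say 'Long'. If only one of them, or neither, says 'Y', I want the Net_Long column to stay blank or say 'N'.
First, I set the Net_Long column to blank: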
ES_Summary_Index1['Net_Long'] = ''
Next, I wrote a for loop to populate the Net_Long column:
for index, row in ES_Summary_Index1.iterrows():
    if ES_Summary_Index1.loc[index, '15M_Long'] is 'Y' & ES_Summary_Index1.loc[index, '1H_Long'] is 'Y':
        ES_Summary_Index1.loc['Net_Long'] = 'Long'
    else:
        ES_Summary_Index1.loc['Net_Long'] = 'N'
Unfortunately, I get the following error:
TypeError: unsupported operand type(s) for &: 'str' and 'float'
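...which refers to the if statement above (if ES_Summary_Index1 ...). I tried changing & to and, but that does not populate the Net_Long column the way I want. I also tried == instead of is, and that did not work either. Can anyone help?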
Answer 0 (score: 3)
You need a boolean mask combined with numpy.where, which is fast and vectorized:
mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')
df['Net_Long'] = np.where(mask, 'Long', 'N')
print (df)
Ticker_x Date Close_x 15M_Long 1H_Long Net_Long
0 ES_H7 2016-10-18T13:44:59 2128.00 N NaN N
1 ES_H7 2016-10-18T13:59:59 2128.75 N NaN N
2 ES_H7 2016-10-18T19:59:59 2127.75 Y NaN N
3 ES_H7 2016-10-18T20:44:59 2129.00 N Y N
4 ES_H7 2016-10-18T21:29:59 2128.75 Y Y Long
5 ES_H7 2016-10-18T21:44:59 2129.00 N NaN N
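The snippet above relies on an existing df. As a minimal self-contained sketch of the same idea, assuming a small made-up frame (the name sample and its values are hypothetical, not the asker's data), the following uses real NaN values to show that a missing 1H_Long entry simply compares as not equal to 'Y':

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; real NaN values stand in for missing 1H_Long entries.
sample = pd.DataFrame({
    "15M_Long": ["N", "Y", "N", "Y"],
    "1H_Long": [np.nan, np.nan, "Y", "Y"],
})

# Each comparison yields a boolean Series; a NaN cell compares as False.
# The parentheses matter because & binds more tightly than ==.
mask = (sample["15M_Long"] == "Y") & (sample["1H_Long"] == "Y")

# np.where picks 'Long' where the mask is True and 'N' everywhere else.
sample["Net_Long"] = np.where(mask, "Long", "N")
print(sample)
```

If the column should stay blank instead of saying 'N', passing '' as the third argument to np.where should do that.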
Timings:
#length of df is 600 rows
In [183]: %timeit (iterate(df))
10 loops, best of 3: 67.1 ms per loop
In [184]: %timeit (vectorize(df1))
1000 loops, best of 3: 1.49 ms per loop
#length of df is 6000 rows
In [177]: %timeit (iterate(df))
1 loop, best of 3: 681 ms per loop
In [178]: %timeit (vectorize(df1))
100 loops, best of 3: 3.23 ms per loop
#length of df is 60000 rows
In [180]: %timeit (iterate(df))
1 loop, best of 3: 6.87 s per loop
In [181]: %timeit (vectorize(df1))
10 loops, best of 3: 20.8 ms per loop
Code for the timings:
import numpy as np
import pandas as pd

data = [x.strip().split() for x in """
Ticker_x Date Close_x 15M_Long 1H_Long
ES_H7 2016-10-18T13:44:59 2128.00 N NaN
ES_H7 2016-10-18T13:59:59 2128.75 N NaN
ES_H7 2016-10-18T19:59:59 2127.75 Y NaN
ES_H7 2016-10-18T20:44:59 2129.00 N Y
ES_H7 2016-10-18T21:29:59 2128.75 Y Y
ES_H7 2016-10-18T21:44:59 2129.00 N NaN
""".split('\n')[1:-1]]
df = pd.DataFrame(data=data[1:], columns=data[0])
# repeat the 6 sample rows: *100 gives 600 rows, *1000 gives 6000 rows, *10000 gives 60k rows
df = pd.concat([df]*1000).reset_index(drop=True)
print(df)
df1 = df.copy()

def vectorize(df):
    # build one boolean mask for the whole frame, then fill the column in a single call
    mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')
    df['Net_Long'] = np.where(mask, 'Long', 'N')
    return df

def iterate(df):
    df['Net_Long'] = ''
    for idx, row in df.iterrows():
        # compare with ==, not is; write through df.at because the row
        # yielded by iterrows() may be a copy of the underlying data
        if row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y':
            df.at[idx, 'Net_Long'] = 'Long'
        else:
            df.at[idx, 'Net_Long'] = 'N'
    return df

print(iterate(df))
print(vectorize(df1))
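The %timeit lines above are IPython magics. To reproduce the comparison in a plain Python script, one option is the standard-library timeit module. The sketch below assumes a small synthetic frame and hypothetical helper names (frame, vectorized, looped); absolute numbers will differ by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic 600-row frame (hypothetical data, built only for this timing sketch).
frame = pd.DataFrame({
    "15M_Long": ["N", "Y", "N", "Y"] * 150,
    "1H_Long": ["NaN", "NaN", "Y", "Y"] * 150,
})

def vectorized(df):
    # One boolean mask for the whole frame, one np.where call.
    mask = (df["15M_Long"] == "Y") & (df["1H_Long"] == "Y")
    return np.where(mask, "Long", "N")

def looped(df):
    # Row-by-row evaluation of the same condition.
    out = []
    for _, row in df.iterrows():
        both = row["15M_Long"] == "Y" and row["1H_Long"] == "Y"
        out.append("Long" if both else "N")
    return out

# Total seconds for 10 calls of each approach.
t_vec = timeit.timeit(lambda: vectorized(frame), number=10)
t_loop = timeit.timeit(lambda: looped(frame), number=10)
print(f"vectorized: {t_vec:.4f} s, looped: {t_loop:.4f} s")
```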
Answer 1 (score: 2)
Replace the following line:
if ES_Summary_Index1.loc[index, '15M_Long'] is 'Y' & ES_Summary_Index1.loc[index, '1H_Long'] is 'Y':
with
if ES_Summary_Index1.loc[index, '15M_Long']=='Y' and ES_Summary_Index1.loc[index, '1H_Long']=='Y':
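Why the original line raises the TypeError comes down to operator precedence: in Python, & binds more tightly than comparison operators such as is and ==, so the expression tries to evaluate 'Y' & <the 1H_Long value> first, and that value is a float NaN. Separately, is tests object identity rather than equality, so it is not a reliable way to compare strings read from data. A minimal sketch with two stand-in variables (a and b are hypothetical, not part of the question's code):

```python
a = "N"            # stands in for the 15M_Long cell
b = float("nan")   # stands in for a NaN 1H_Long cell

try:
    # Parsed as a == ("Y" & b) == "Y", so "Y" & b is evaluated first.
    result = a == "Y" & b == "Y"
except TypeError as exc:
    message = str(exc)
    print(message)

# `and` has lower precedence than ==, so each comparison is evaluated on its own.
both_yes = a == "Y" and b == "Y"
print(both_yes)
```

Wrapping each comparison in parentheses and keeping & is the other option, and it is the one required when the operands are whole Series rather than single values.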
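Answer 2 (score: 1)
Besides getting the logical test right, you should also work with the current row directly. Your current code assigns to .loc['Net_Long'] on every pass through the loop, which targets a row labelled 'Net_Long' rather than the Net_Long cell of the current row:
Code: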
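df['Net_Long'] = ''
for idx, row in df.iterrows():
    # compare with ==, not is, which tests identity rather than equality
    if row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y':
        # write through df.at: the row from iterrows() may be a copy
        df.at[idx, 'Net_Long'] = 'Long'
    else:
        df.at[idx, 'Net_Long'] = 'N'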
Test data:
import pandas as pd
data = [x.strip().split() for x in """
Ticker_x Date Close_x 15M_Long 1H_Long
ES_H7 2016-10-18T13:44:59 2128.00 N NaN
ES_H7 2016-10-18T13:59:59 2128.75 N NaN
ES_H7 2016-10-18T19:59:59 2127.75 Y NaN
ES_H7 2016-10-18T20:44:59 2129.00 N Y
ES_H7 2016-10-18T21:29:59 2128.75 Y Y
ES_H7 2016-10-18T21:44:59 2129.00 N NaN
""".split('\n')[1:-1]]
df = pd.DataFrame(data=data[1:], columns=data[0])
Produces:
Ticker_x Date Close_x 15M_Long 1H_Long Net_Long
0 ES_H7 2016-10-18T13:44:59 2128.00 N NaN N
1 ES_H7 2016-10-18T13:59:59 2128.75 N NaN N
2 ES_H7 2016-10-18T19:59:59 2127.75 Y NaN N
3 ES_H7 2016-10-18T20:44:59 2129.00 N Y N
4 ES_H7 2016-10-18T21:29:59 2128.75 Y Y Long
5 ES_H7 2016-10-18T21:44:59 2129.00 N NaN N
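To illustrate the difference between assigning to .loc with a single label and assigning to one cell, here is a minimal sketch on a hypothetical two-row frame (the names demo and enlarged are made up for this example). It assumes the default integer index:

```python
import pandas as pd

demo = pd.DataFrame({"15M_Long": ["Y", "N"], "1H_Long": ["Y", "Y"]})
demo["Net_Long"] = ""

# A single label passed to .loc selects a ROW, so this adds a new row
# labelled 'Net_Long' instead of filling the Net_Long column.
enlarged = demo.copy()
enlarged.loc["Net_Long"] = "Long"
print(enlarged)

# Giving both the row label and the column name addresses exactly one cell.
for idx in demo.index:
    if demo.at[idx, "15M_Long"] == "Y" and demo.at[idx, "1H_Long"] == "Y":
        demo.at[idx, "Net_Long"] = "Long"
    else:
        demo.at[idx, "Net_Long"] = "N"
print(demo)
```

For a frame of any real size the vectorized numpy.where approach is likely the better choice. The loop is shown only to make the indexing behaviour visible.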