Select rows from a DataReader based on a value and transfer them to a DataFrame

Time: 2019-11-02 17:38:16

Tags: python pandas dataframe

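I'm working on a project that reads the historical prices of a given stock. I then want to filter the days on which the price moved by +5% or -5% into another data frame.

But I'm struggling with transferring the rows.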

import pandas_datareader as web
import pandas as pd
import datetime

start = datetime.datetime(2015, 9, 1)
end = datetime.datetime(2019, 11, 2)

df1 = pd.DataFrame()
df = web.DataReader("amd", 'yahoo', start, end)

df['Close'] = df['Close'].astype(float)
df['Open'] = df['Open'].astype(float)

for row in df:
    df['perchange'] = ((df['Close']-df['Open'])/df['Open'])*100
    df['perchange'] = df['perchange'].astype(float)

    if df['perchange'] >= 5.0:
        df1 += df

    if ['perchange'] <= -5.0:
        df1 += df

df.to_csv('amd_volume_price_history.csv')
df1.to_csv('amd_5_to_5.csv')

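2 Answers:

Answer 0 (score: 1)

You can create a new data frame containing only the rows where the absolute value of the percent change is greater than 5%, as follows. As you can see, Series.between is used to perform boolean indexing: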

not_significant=((df['Close']-df['Open'])/df['Open']).between(-0.05,0.05)
df_filtered=df[~not_significant]
print(df_filtered)

Output

                 High        Low       Open      Close     Volume  Adj Close
Date                                                                        
2015-09-11   2.140000   1.810000   1.880000   2.010000   31010300   2.010000
2015-09-14   2.000000   1.810000   2.000000   1.820000   16458500   1.820000
2015-10-19   2.010000   1.910000   1.910000   2.010000   10670800   2.010000
2015-10-23   2.210000   2.100000   2.100000   2.210000    9564200   2.210000
2015-11-03   2.290000   2.160000   2.160000   2.280000    8705800   2.280000
...               ...        ...        ...        ...        ...        ...
2019-06-06  31.980000  29.840000  29.870001  31.820000  131267800  31.820000
2019-07-31  32.299999  30.299999  32.080002  30.450001  119190000  30.450001
2019-08-08  34.270000  31.480000  31.530001  33.919998  167278800  33.919998
2019-08-12  34.650002  32.080002  34.160000  32.430000  106936000  32.430000
2019-08-23  31.830000  29.400000  31.299999  29.540001   83681100  29.540001

[123 rows x 6 columns]
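
The same filter can be tried without a network call. The following is a minimal sketch on made-up prices (the `prices` frame and its values are assumptions for illustration, not real AMD quotes). It shows that `between` is inclusive on both ends by default, so a move of exactly 5% is treated as not significant:

```python
import pandas as pd

# Hypothetical prices, made up for illustration (not real quotes).
prices = pd.DataFrame(
    {"Open": [100.0, 100.0, 100.0, 100.0],
     "Close": [106.0, 103.0, 94.0, 105.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
)

# Fractional intraday change: (Close - Open) / Open
change = (prices["Close"] - prices["Open"]) / prices["Open"]

# between() includes both bounds by default, so exactly +5% is "not significant"
not_significant = change.between(-0.05, 0.05)

# ~ inverts the boolean mask; indexing with it keeps the rows where it is True
significant_days = prices[~not_significant]
print(significant_days)
```

Only the +6% and -6% days survive here. If days at exactly 5% should be kept as well, a strict comparison on the other side (for example `change.abs() >= 0.05`) would be one way to express that.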

If you really need the perchange column, you can change the code to create it:

df['Perchange']=(df['Close']-df['Open'])/df['Open']*100
not_significant=(df['Perchange']).between(-5,5)
df_filtered=df[~not_significant]
print(df_filtered)

You can also use DataFrame.pct_change to compute the same column:

df['Perchange']=df[['Open','Close']].pct_change(axis=1).Close*100

Output

                 High        Low       Open      Close     Volume  Adj Close  \
Date                                                                           
2015-09-11   2.140000   1.810000   1.880000   2.010000   31010300   2.010000   
2015-09-14   2.000000   1.810000   2.000000   1.820000   16458500   1.820000   
2015-10-19   2.010000   1.910000   1.910000   2.010000   10670800   2.010000   
2015-10-23   2.210000   2.100000   2.100000   2.210000    9564200   2.210000   
2015-11-03   2.290000   2.160000   2.160000   2.280000    8705800   2.280000   
...               ...        ...        ...        ...        ...        ...   
2019-06-06  31.980000  29.840000  29.870001  31.820000  131267800  31.820000   
2019-07-31  32.299999  30.299999  32.080002  30.450001  119190000  30.450001   
2019-08-08  34.270000  31.480000  31.530001  33.919998  167278800  33.919998   
2019-08-12  34.650002  32.080002  34.160000  32.430000  106936000  32.430000   
2019-08-23  31.830000  29.400000  31.299999  29.540001   83681100  29.540001   

            Perchange  
Date                   
2015-09-11   6.914893  
2015-09-14  -8.999997  
2015-10-19   5.235603  
2015-10-23   5.238102  
2015-11-03   5.555550  
...               ...  
2019-06-06   6.528285  
2019-07-31  -5.081050  
2019-08-08   7.580074  
2019-08-12  -5.064401  
2019-08-23  -5.622998  

[123 rows x 7 columns]
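
To see why the `pct_change(axis=1)` variant gives the same column as the manual formula, here is a small sketch on made-up numbers (the `frame` values are assumptions for illustration). With `axis=1` each column is compared with the column to its left, so `Close` becomes `Close / Open - 1`:

```python
import pandas as pd

# Hypothetical Open/Close values, made up for illustration.
frame = pd.DataFrame({"Open": [2.0, 4.0, 10.0], "Close": [2.5, 3.0, 10.0]})

# Manual formula: (Close - Open) / Open, expressed in percent
manual = (frame["Close"] - frame["Open"]) / frame["Open"] * 100

# Row-wise pct_change: the Close column holds Close / Open - 1
via_pct_change = frame[["Open", "Close"]].pct_change(axis=1)["Close"] * 100

print(pd.DataFrame({"manual": manual, "pct_change": via_pct_change}))
```

The column order matters: `Open` has to come first in the selection, otherwise the change would be measured in the opposite direction.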

  

Your code would then look like this:

#Libraries
import pandas_datareader as web
import pandas as pd
import datetime

#Getting data
start = datetime.datetime(2015, 9, 1)
end = datetime.datetime(2019, 11, 2)
df = web.DataReader("amd", 'yahoo', start, end)

#Converting to float for calculation and filtering
df['Close'] = df['Close'].astype(float)
df['Open'] = df['Open'].astype(float)

#Creating Perchange column.
df['Perchange']=(df['Close']-df['Open'])/df['Open']*100
#df['Perchange']=df[['Open','Close']].pct_change(axis=1).Close*100

#Filtering
not_significant=(df['Perchange']).between(-5,5)
df_filtered=df[~not_significant]

#Saving data.
df.to_csv('amd_volume_price_history.csv')
df_filtered.to_csv('amd_5_to_5.csv')

Edit

df['Perchange']=(df['Close']-df['Open'])/df['Open']*100
significant=~(df['Perchange']).between(-5,5)
group_by_jump=significant.cumsum()
jump_and_4=group_by_jump.groupby(group_by_jump,sort=False).cumcount().le(4)&group_by_jump.ne(0)
df_filtered=df[jump_and_4]
print(df_filtered.head(50))

            High   Low  Open  Close    Volume  Adj Close  Perchange
Date                                                               
2015-09-11  2.14  1.81  1.88   2.01  31010300       2.01   6.914893
2015-09-14  2.00  1.81  2.00   1.82  16458500       1.82  -8.999997
2015-09-15  1.87  1.81  1.84   1.86   6524400       1.86   1.086955
2015-09-16  1.90  1.85  1.87   1.89   4928300       1.89   1.069518
2015-09-17  1.94  1.87  1.90   1.89   5831600       1.89  -0.526315
2015-09-18  1.92  1.85  1.87   1.87  11814000       1.87   0.000000
2015-10-19  2.01  1.91  1.91   2.01  10670800       2.01   5.235603
2015-10-20  2.03  1.97  2.00   2.02   5584200       2.02   0.999999
2015-10-21  2.12  2.01  2.02   2.10  14944100       2.10   3.960392
2015-10-22  2.16  2.09  2.10   2.14   8208400       2.14   1.904772
2015-10-23  2.21  2.10  2.10   2.21   9564200       2.21   5.238102
2015-10-26  2.21  2.12  2.21   2.15   6313500       2.15  -2.714929
2015-10-27  2.16  2.10  2.12   2.15   5755600       2.15   1.415104
2015-10-28  2.20  2.12  2.14   2.18   6950600       2.18   1.869157
2015-10-29  2.18  2.11  2.15   2.13   4500400       2.13  -0.930232
2015-11-03  2.29  2.16  2.16   2.28   8705800       2.28   5.555550
2015-11-04  2.30  2.18  2.27   2.20   8205300       2.20  -3.083698
2015-11-05  2.24  2.17  2.21   2.20   4302200       2.20  -0.452488
2015-11-06  2.21  2.13  2.19   2.15   8997100       2.15  -1.826482
2015-11-09  2.18  2.10  2.15   2.11   6231200       2.11  -1.860474
2015-11-18  2.15  1.98  1.99   2.12   9384700       2.12   6.532657
2015-11-19  2.16  2.09  2.10   2.14   4704300       2.14   1.904772
2015-11-20  2.25  2.13  2.14   2.22  10727100       2.22   3.738314
2015-11-23  2.24  2.18  2.22   2.22   4863200       2.22   0.000000
2015-11-24  2.40  2.17  2.20   2.34  15859700       2.34   6.363630
2015-11-25  2.40  2.31  2.36   2.38   6914800       2.38   0.847467
2015-11-27  2.38  2.32  2.37   2.33   2606600       2.33  -1.687762
2015-11-30  2.37  2.25  2.34   2.36   9924400       2.36   0.854700
2015-12-01  2.37  2.31  2.36   2.34   5646400       2.34  -0.847457
2015-12-16  2.55  2.37  2.39   2.54  19543600       2.54   6.276144
2015-12-17  2.60  2.52  2.52   2.56  11374100       2.56   1.587300
2015-12-18  2.55  2.42  2.51   2.45  17988100       2.45  -2.390436
2015-12-21  2.53  2.43  2.47   2.53   6876600       2.53   2.429147
2015-12-22  2.78  2.54  2.55   2.77  24893200       2.77   8.627452
2015-12-23  2.94  2.75  2.76   2.83  30365300       2.83   2.536229
2015-12-24  3.00  2.86  2.88   2.92  11890900       2.92   1.388888
2015-12-28  3.02  2.86  2.91   3.00  16050500       3.00   3.092780
2015-12-29  3.06  2.97  3.04   3.00  15300900       3.00  -1.315788
2016-01-06  2.71  2.47  2.66   2.51  23759400       2.51  -5.639101
2016-01-07  2.48  2.26  2.43   2.28  22203500       2.28  -6.172843
2016-01-08  2.42  2.10  2.36   2.14  31822400       2.14  -9.322025
2016-01-11  2.36  2.12  2.16   2.34  19629300       2.34   8.333325
2016-01-12  2.46  2.28  2.40   2.39  17986100       2.39  -0.416666
2016-01-13  2.45  2.21  2.40   2.25  12749700       2.25  -6.250004
2016-01-14  2.35  2.21  2.29   2.21  15666600       2.21  -3.493447
2016-01-15  2.13  1.99  2.10   2.03  21199300       2.03  -3.333330
2016-01-19  2.11  1.90  2.08   1.95  18978900       1.95  -6.249994
2016-01-20  1.95  1.75  1.81   1.80  29243600       1.80  -0.552486
2016-01-21  2.18  1.81  1.82   2.09  26387900       2.09  14.835157
2016-01-22  2.17  1.98  2.11   2.02  16245500       2.02  -4.265399

Answer 1 (score: 0)

Try integrating the following modifications into your code:

1) You probably don't need any loop to compute the new column:

df['perchange'] = ((df['Close']-df['Open'])/df['Open'])*100
df['perchange'] = df['perchange'].astype(float)

2) Define an empty df

df1=pd.DataFrame([])

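3) Use the loc method to filter the old df (its notation is very useful) and append the result to the empty data frame; this transfers the rows that satisfy the condition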

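# The upper bound must be +5.0; comparing with >= -5.0 would select every row.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
df1=pd.concat([df1, df.loc[(df['perchange'] <= -5.0) | (df['perchange'] >= 5.0)]])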
print(df1)
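
A two-sided threshold filter of the kind step 3 aims for can be checked on a few made-up values. This is only a sketch (the `frame` data is an assumption for illustration). It also shows that the same rows can be selected with a single comparison on the absolute value:

```python
import pandas as pd

# Hypothetical percent changes, made up for illustration.
frame = pd.DataFrame({"perchange": [6.0, -2.0, -5.0, 4.9, -8.5]})

# Each comparison needs its own parentheses because | binds tighter than <= and >=
either_side = frame.loc[(frame["perchange"] <= -5.0) | (frame["perchange"] >= 5.0)]

# Equivalent single condition on the absolute value
by_abs = frame.loc[frame["perchange"].abs() >= 5.0]

print(either_side)
```

Since `loc` with a boolean mask already returns a new data frame, assigning its result directly is usually enough; starting from an empty frame and appending to it is not required.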

Hope this helps.