Make a function return a count based on a condition

Time: 2020-05-03 15:19:21

Tags: python pandas pandas-groupby

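I have two DataFrames. The first contains stock tickers together with a max/min price range and some other columns.

The second DataFrame has dates as its index and is grouped by ticker, with different metrics (Open, Close, High, Low, etc.). From this DataFrame I want to count the number of days on which a given stock's closing price was above the minimum price.

This is where I am stuck: for example, I want to find the number of days on which AMZN traded below the period's maximum price.

In other words, I want to count days in the second DataFrame based on values in the first DataFrame, i.e. the number of days on which the closing price was below the period maximum or above the period minimum.

I have added the code to reproduce the DataFrames.

Please see the screenshots: First DataFrame, Second DataFrame.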

import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
import yfinance as yf

start=datetime.datetime.today()-relativedelta(years=2)
end=datetime.datetime.today()

us_stock_list='FB AMZN BABA'
data_metric = yf.download(us_stock_list, start=start, end=end,group_by='column',auto_adjust=True)
data_ticker= yf.download(us_stock_list, start=start, end=end,group_by='ticker',auto_adjust=True)

stock_list=[stock for stock in data_ticker.stack()]

# max_price
max_values=pd.DataFrame(data_ticker.max().unstack()['High'])
# min_price
min_values=pd.DataFrame(data_ticker.min().unstack()['Low'])


# latest_price
latest_day=pd.DataFrame(data_ticker.tail(1).unstack())
latest_day=latest_day.unstack().unstack().unstack().reset_index()

# latest_day=latest_day.unstack().reset_index()
latest_day=latest_day.drop(columns=['level_0','Date'])
latest_day.set_index('level_3',inplace=True)

latest_day.rename(columns={0:'Values'},inplace=True)

latest_day=latest_day.groupby(by=['level_3','level_2']).max().unstack()

latest_day.columns=[ '_'.join(x) for x in latest_day.columns ]

latest_day=latest_day.join(max_values,how='inner')

latest_day=latest_day.join(min_values,how='inner')

latest_day.rename(columns={'High':'Period_High_Max','Low':'Period_Low_Min'},inplace=True)

close_price_data=pd.DataFrame(data_metric['Close'].unstack().reset_index())
close_price_data= close_price_data.rename(columns={'level_0':'Stock',0:'Close_price'})
close_price_data.set_index('Stock',inplace=True)

Use this to reproduce:

{"Values_Close":{"AMZN":2286.0400390625,"BABA":194.4799957275,"FB":202.2700042725},"Values_High":{"AMZN":2362.4399414062,"BABA":197.3800048828,"FB":207.2799987793},"Values_Low":{"AMZN":2258.1899414062,"BABA":192.8600006104,"FB":199.0500030518},"Values_Open":{"AMZN":2336.8000488281,"BABA":195.75,"FB":201.6000061035},"Values_Volume":{"AMZN":9754900.0,"BABA":22268800.0,"FB":30399600.0},"Period_High_Max":{"AMZN":2475.0,"BABA":231.1399993896,"FB":224.1999969482},"Period_Low_Min":{"AMZN":1307.0,"BABA":129.7700042725,"FB":123.0199966431},"%_Position":{"AMZN":0.8382192115,"BABA":0.6383544892,"FB":0.7832576338}}


{"Stock":{
  "0":"AMZN",
  "1":"AMZN",
  "2":"AMZN",
  "3":"AMZN",
  "4":"AMZN",
  "5":"AMZN",
  "6":"AMZN",
  "7":"AMZN",
  "8":"AMZN",
  "9":"AMZN",
  "10":"AMZN",
  "11":"AMZN",
  "12":"AMZN",
  "13":"AMZN",
  "14":"AMZN",
  "15":"AMZN",
  "16":"AMZN",
  "17":"AMZN",
  "18":"AMZN",
  "19":"AMZN"},
"Date":{
  "0":1525305600000,
  "1":1525392000000,
  "2":1525651200000,
  "3":1525737600000,
  "4":1525824000000,
  "5":1525910400000,
  "6":1525996800000,
  "7":1526256000000,
  "8":1526342400000,
  "9":1526428800000,
  "10":1526515200000,
  "11":1526601600000,
  "12":1526860800000,
  "13":1526947200000,
  "14":1527033600000,
  "15":1527120000000,
  "16":1527206400000,
  "17":1527552000000,
  "18":1527638400000,
  "19":1527724800000 },
"Close_price":{
  "0":1572.0799560547,
  "1":1580.9499511719,
  "2":1600.1400146484,
  "3":1592.3900146484,
  "4":1608.0,
  "5":1609.0799560547,
  "6":1602.9100341797,
  "7":1601.5400390625,
  "8":1576.1199951172,
  "9":1587.2800292969,
  "10":1581.7600097656,
  "11":1574.3699951172,
  "12":1585.4599609375,
  "13":1581.4000244141,
  "14":1601.8599853516,
  "15":1603.0699462891,
  "16":1610.1500244141,
  "17":1612.8699951172,
  "18":1624.8900146484,
  "19":1629.6199951172}}
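For anyone who wants to load the dump above without downloading data, the following is a minimal sketch. It assumes the JSON is available as a string (the variable name `raw` is hypothetical, and only the first three rows are reproduced to keep it short) and that the `Date` values are epoch milliseconds, which is what the numbers suggest.

```python
import json

import pandas as pd

# First three rows of the dump above, shortened for illustration.
raw = """{"Stock": {"0": "AMZN", "1": "AMZN", "2": "AMZN"},
"Date": {"0": 1525305600000, "1": 1525392000000, "2": 1525651200000},
"Close_price": {"0": 1572.0799560547, "1": 1580.9499511719, "2": 1600.1400146484}}"""

# Build the frame from the parsed dict of columns.
close_price_data = pd.DataFrame(json.loads(raw))

# Assumption: the integers are milliseconds since the Unix epoch.
close_price_data["Date"] = pd.to_datetime(close_price_data["Date"], unit="ms")

# Index by ticker, as in the original code.
close_price_data = close_price_data.set_index("Stock")
print(close_price_data)
```

The first dump (the per-ticker summary) can be loaded the same way with `pd.DataFrame(json.loads(...))`, since its outer keys are already the column names.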

1 answer:

Answer 0 (score: 0)

Merge the two DataFrames on the company (index level=0), then use groupby with apply and a custom function:

df_merge = close_price_data.merge(
    latest_day[['Period_High_Max', 'Period_Low_Min']],
    left_index=True,
    right_index=True)

def fun(df):
    d = {}
    d['days_above_min'] = (df.Close_price > df.Period_Low_Min).sum()
    d['days_below_max'] = (df.Close_price < df.Period_High_Max).sum()

    return pd.Series(d)

df_merge.groupby(level=0).apply(fun)

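Period_Low_Min and Period_High_Max are the period minimum and maximum respectively, so every closing price will fall within that range. Let me know if this is not what you are trying to achieve.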
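The same counting can probably also be done without `apply`, by comparing the columns first and then summing the boolean flags per ticker. The sketch below is self-contained and uses hypothetical toy data (tickers `AAA` and `BBB`, and a frame named `bounds` standing in for the two columns of `latest_day`), so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for close_price_data (indexed by ticker).
close_price_data = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"] * 2),
        "Close_price": [10.0, 12.0, 15.0, 5.0, 7.0, 6.0],
    },
    index=pd.Index(["AAA"] * 3 + ["BBB"] * 3, name="Stock"),
)

# Hypothetical per-ticker bounds standing in for
# latest_day[['Period_High_Max', 'Period_Low_Min']].
bounds = pd.DataFrame(
    {"Period_High_Max": [15.0, 8.0], "Period_Low_Min": [10.0, 4.0]},
    index=pd.Index(["AAA", "BBB"], name="Stock"),
)

# Attach each ticker's bounds to every daily row (join on the index).
merged = close_price_data.join(bounds)

# One boolean flag per day and condition; strict comparisons as in the answer.
flags = pd.DataFrame(
    {
        "days_above_min": merged["Close_price"] > merged["Period_Low_Min"],
        "days_below_max": merged["Close_price"] < merged["Period_High_Max"],
    }
)

# Summing booleans counts the True values for each ticker.
counts = flags.groupby(level=0).sum()
print(counts)
```

Because the comparisons are strict, a day whose close equals the bound itself is not counted. To count against a different threshold, replace the bound columns with the values of interest.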
