I have a month of tick data to analyze. It looks like this:
Time (UTC),Ask,Bid,AskVolume,BidVolume
2007.04.01 21:00:47.593,95.203,95.159,19.1,8.8
2007.04.01 21:00:47.968,95.174,95.124,23.9,9.2
2007.04.01 21:01:02.695,95.132,95.092,4,4
2007.04.01 21:01:05.934,95.154,95.104,11.2,4
2007.04.01 21:01:18.430,95.171,95.131,12,5.2
2007.04.01 21:01:19.957,95.188,95.153,8,9.2
2007.04.01 21:01:56.308,95.208,95.148,22.3,4
2007.04.01 21:01:57.233,95.192,95.152,7.2,9.2
2007.04.01 21:01:57.443,95.188,95.143,7.2,9.2
2007.04.01 21:01:59.691,95.184,95.139,7.2,9.2
2007.04.01 21:01:59.934,95.181,95.141,8,3.9
2007.04.01 21:02:10.569,95.171,95.136,11.9,4
2007.04.01 21:02:20.708,95.166,95.126,11.2,8.8
2007.04.01 21:02:35.211,95.17,95.135,21.5,4
2007.04.01 21:02:39.946,95.196,95.156,7.2,8.8
2007.04.01 21:02:40.206,95.224,95.164,0.8,0.8
2007.04.01 21:02:43.600,95.222,95.177,8,9.2
2007.04.01 21:02:54.578,95.216,95.186,25.5,5.2
2007.04.01 21:03:04.811,95.23,95.18,7.9,7.9
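and so on until the last day of the month.
I need to know every day on which the ask price's percentage change, (max - min) / max, is greater than 0.05. My approach is to split the data by day, compute the percentage change for each day to see whether the price moved by more than 5%, and return the day if it did. I'm new to pandas, and so far I have: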
import pandas as pd
df = pd.read_csv('AUDJPY_Ticks_2007.04.01_2007.04.30.csv')
percentChange = ((df['Ask'].max() - df['Ask'].min()) / df['Ask'].max()) >= 0.05
print(percentChange)
This only gives me the percentage change for the whole month, not for each day.
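Answer 0 (score: 0)
Use resample and transform. I modified your sample data so that the test case spans multiple days and at least one day has a change greater than 0.05 (that is, 5%). It also gives the next reader a reproducible example that can be copied and pasted.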
import pandas as pd
from io import StringIO
test_data = StringIO("""Time (UTC),Ask,Bid,AskVolume,BidVolume
2007.04.01 21:00:47.593,95.203,95.159,19.1,8.8
2007.04.01 21:00:47.968,95.174,95.124,23.9,9.2
2007.04.01 21:01:02.695,95.132,95.092,4,4
2007.04.01 21:01:05.934,95.154,95.104,11.2,4
2007.04.02 21:01:18.430,95.171,95.131,12,5.2
2007.04.02 21:01:19.957,95.188,95.153,8,9.2
2007.04.02 21:01:56.308,95.208,95.148,22.3,4
2007.04.02 21:01:57.233,95.192,95.152,7.2,9.2
2007.04.03 21:01:57.443,91.188,95.143,7.2,9.2
2007.04.03 21:01:59.691,97.684,95.139,7.2,9.2
2007.04.03 21:01:59.934,95.181,95.141,8,3.9
2007.04.03 21:02:10.569,95.171,95.136,11.9,4
2007.04.04 21:02:20.708,95.166,95.126,11.2,8.8
2007.04.04 21:02:35.211,95.17,95.135,21.5,4
2007.04.04 21:02:39.946,95.196,95.156,7.2,8.8
2007.04.04 21:02:40.206,95.224,95.164,0.8,0.8
2007.04.05 21:02:43.600,95.222,95.177,8,9.2
2007.04.05 21:02:54.578,95.216,95.186,25.5,5.2
2007.04.05 21:03:04.811,95.23,95.18,7.9,7.9""")
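# Parse the timestamps and use them as the index so the rows can be resampled by day.
df = pd.read_csv(test_data, parse_dates=["Time (UTC)"])
df.set_index("Time (UTC)", inplace=True)
# Group the ask prices into calendar-day bins.
daily_ask = df.resample("D")["Ask"]
# transform broadcasts each day's min/max back onto every tick of that day.
df["daily_ask_min"] = daily_ask.transform("min")
df["daily_ask_max"] = daily_ask.transform("max")
df["daily_ask_change"] = (df["daily_ask_max"] - df["daily_ask_min"]) / df["daily_ask_max"]
# Ticks that fall on a day whose ask range exceeds 5% of the daily high.
df.loc[df["daily_ask_change"] > 0.05, "daily_ask_change"]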
# Time (UTC)
# 2007-04-03 21:01:57.443 0.0665
# 2007-04-03 21:01:59.691 0.0665
# 2007-04-03 21:01:59.934 0.0665
# 2007-04-03 21:02:10.569 0.0665
# Name: daily_ask_change, dtype: float64
# Collapse to one row per day; days between matches that have no qualifying ticks show up as NaN.
df[df.daily_ask_change > 0.05]["daily_ask_change"].resample("D").mean()
# Time (UTC)
# 2007-04-03 0.0665
# Freq: D, Name: daily_ask_change, dtype: float64
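If only the qualifying days are needed, rather than every tick on those days, one alternative is to aggregate per calendar day with groupby. The following is a minimal sketch, not part of the original answer: it reuses four rows from the test data above, and the names `sample`, `ticks`, `daily`, and `big_move_days` are chosen here for illustration. The explicit `format` string is an assumption based on the timestamp layout shown in the question.

```python
import pandas as pd
from io import StringIO

# Four rows taken from the test data above: two ticks on each of two days.
sample = StringIO("""Time (UTC),Ask,Bid,AskVolume,BidVolume
2007.04.02 21:01:18.430,95.171,95.131,12,5.2
2007.04.02 21:01:19.957,95.188,95.153,8,9.2
2007.04.03 21:01:57.443,91.188,95.143,7.2,9.2
2007.04.03 21:01:59.691,97.684,95.139,7.2,9.2
""")

ticks = pd.read_csv(sample)
# Assumed timestamp layout: year.month.day hour:minute:second.fraction
ticks["Time (UTC)"] = pd.to_datetime(
    ticks["Time (UTC)"], format="%Y.%m.%d %H:%M:%S.%f"
)

# One row per calendar day holding that day's lowest and highest ask.
daily = ticks.groupby(ticks["Time (UTC)"].dt.date)["Ask"].agg(["min", "max"])
daily["change"] = (daily["max"] - daily["min"]) / daily["max"]

# Keep only the days whose ask range exceeds 5% of the daily high.
big_move_days = daily[daily["change"] > 0.05].index.tolist()
print(big_move_days)
```

With this sample, only 2007-04-03 should be listed, since its ask range is about 6.65% of the daily high. Because the aggregation produces one row per day that actually has ticks, days without data do not appear as empty rows.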