How can I check the daily tick data inside a CSV of monthly tick data using pandas?

Time: 2018-05-12 18:36:02

Tags: python pandas csv

I have a month of tick data to analyze, and it looks like this:

Time (UTC),Ask,Bid,AskVolume,BidVolume
2007.04.01 21:00:47.593,95.203,95.159,19.1,8.8
2007.04.01 21:00:47.968,95.174,95.124,23.9,9.2
2007.04.01 21:01:02.695,95.132,95.092,4,4
2007.04.01 21:01:05.934,95.154,95.104,11.2,4
2007.04.01 21:01:18.430,95.171,95.131,12,5.2
2007.04.01 21:01:19.957,95.188,95.153,8,9.2
2007.04.01 21:01:56.308,95.208,95.148,22.3,4
2007.04.01 21:01:57.233,95.192,95.152,7.2,9.2
2007.04.01 21:01:57.443,95.188,95.143,7.2,9.2
2007.04.01 21:01:59.691,95.184,95.139,7.2,9.2
2007.04.01 21:01:59.934,95.181,95.141,8,3.9
2007.04.01 21:02:10.569,95.171,95.136,11.9,4
2007.04.01 21:02:20.708,95.166,95.126,11.2,8.8
2007.04.01 21:02:35.211,95.17,95.135,21.5,4
2007.04.01 21:02:39.946,95.196,95.156,7.2,8.8
2007.04.01 21:02:40.206,95.224,95.164,0.8,0.8
2007.04.01 21:02:43.600,95.222,95.177,8,9.2
2007.04.01 21:02:54.578,95.216,95.186,25.5,5.2
2007.04.01 21:03:04.811,95.23,95.18,7.9,7.9

...and so on, up to the last day of the month.

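Whenever the percentage change of the day's Ask price, (max - min) / max, is greater than 0.05, I need to know which day it was. My approach is to split the data day by day and calculate the percentage change, to see whether the price dropped by more than 5% that day, and if it did, return that day. I'm new to pandas, and so far I have: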

import pandas as pd

df = pd.read_csv('AUDJPY_Ticks_2007.04.01_2007.04.30.csv')
percentChange = ((df['Ask'].max() - df['Ask'].min()) / df['Ask'].max()) >= 0.05
print(percentChange)

This only gives me the percentage change for the whole month, not for each day.

1 Answer:

Answer 0 (score: 0)

A solution using resample and transform

Data:

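I modified your sample data so that the test case spans several days and at least one day has a change greater than 0.05 (that is, 5%, not 0.05%). It also gives the next person who reads this a reproducible example they can copy and paste.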

import pandas as pd
from io import StringIO

test_data = StringIO("""Time (UTC),Ask,Bid,AskVolume,BidVolume
2007.04.01 21:00:47.593,95.203,95.159,19.1,8.8
2007.04.01 21:00:47.968,95.174,95.124,23.9,9.2
2007.04.01 21:01:02.695,95.132,95.092,4,4
2007.04.01 21:01:05.934,95.154,95.104,11.2,4
2007.04.02 21:01:18.430,95.171,95.131,12,5.2
2007.04.02 21:01:19.957,95.188,95.153,8,9.2
2007.04.02 21:01:56.308,95.208,95.148,22.3,4
2007.04.02 21:01:57.233,95.192,95.152,7.2,9.2
2007.04.03 21:01:57.443,91.188,95.143,7.2,9.2
2007.04.03 21:01:59.691,97.684,95.139,7.2,9.2
2007.04.03 21:01:59.934,95.181,95.141,8,3.9
2007.04.03 21:02:10.569,95.171,95.136,11.9,4
2007.04.04 21:02:20.708,95.166,95.126,11.2,8.8
2007.04.04 21:02:35.211,95.17,95.135,21.5,4
2007.04.04 21:02:39.946,95.196,95.156,7.2,8.8
2007.04.04 21:02:40.206,95.224,95.164,0.8,0.8
2007.04.05 21:02:43.600,95.222,95.177,8,9.2
2007.04.05 21:02:54.578,95.216,95.186,25.5,5.2
2007.04.05 21:03:04.811,95.23,95.18,7.9,7.9""")

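df = pd.read_csv(test_data)
# Parse the timestamps with an explicit format matching e.g. "2007.04.01 21:00:47.593"
df["Time (UTC)"] = pd.to_datetime(df["Time (UTC)"], format="%Y.%m.%d %H:%M:%S.%f")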


Set the datetime column as the index:

df.set_index("Time (UTC)", drop=True, inplace=True)
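resample needs datetime-like keys: either a DatetimeIndex, which is what set_index provides here, or a datetime column named through its on= parameter. A minimal sketch of the on= form, using a small made-up sample rather than the full test data:

```python
import pandas as pd

# Small hypothetical sample: timestamps stay in a regular column
ticks = pd.DataFrame({
    "Time (UTC)": pd.to_datetime([
        "2007-04-01 21:00:47.593",
        "2007-04-03 21:01:57.443",
        "2007-04-03 21:01:59.691",
    ]),
    "Ask": [95.203, 91.188, 97.684],
})

# on= tells resample which column holds the timestamps, so no set_index is needed
daily_max = ticks.resample("D", on="Time (UTC)")["Ask"].max()
print(daily_max)  # one row per calendar day; 2007-04-02 has no ticks, so it is NaN
```

Both forms produce the same daily bins; set_index is more convenient when, as in this answer, several later steps rely on the DatetimeIndex.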


Resample and transform:

daily_ask = df.resample("D")["Ask"]
df["daily_ask_min"] = daily_ask.transform("min")
df["daily_ask_max"] = daily_ask.transform("max")
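The difference between aggregating and transforming a resampler is what makes the step above work: agg returns one row per day, while transform broadcasts each day's value back onto every tick of that day, so it can be stored as a new column. A minimal sketch on a made-up four-tick sample (the names ticks and per_day are illustrative only):

```python
import pandas as pd

# Hypothetical tick sample spanning 2007-04-01 and 2007-04-03
idx = pd.to_datetime([
    "2007-04-01 21:00:47.593",
    "2007-04-01 21:01:02.695",
    "2007-04-03 21:01:57.443",
    "2007-04-03 21:01:59.691",
])
ticks = pd.DataFrame({"Ask": [95.203, 95.132, 91.188, 97.684]}, index=idx)

daily = ticks.resample("D")["Ask"]

# agg: one row per calendar day; the empty day 2007-04-02 shows up as NaN
per_day = daily.agg(["min", "max"])
print(per_day)

# transform: same length as ticks, each tick gets its own day's max
ticks["daily_ask_max"] = daily.transform("max")
print(ticks)
```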


Compute the daily percentage change:

df["daily_ask_change"] = (df["daily_ask_max"] - df["daily_ask_min"]) / df["daily_ask_max"]
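As a worked check of this formula on the modified sample data: on 2007-04-03 the highest Ask is 97.684 and the lowest is 91.188, while on 2007-04-01 they are 95.203 and 95.132:

```latex
\text{change}_{\text{2007-04-03}} = \frac{97.684 - 91.188}{97.684} = \frac{6.496}{97.684} \approx 0.0665 > 0.05
\qquad
\text{change}_{\text{2007-04-01}} = \frac{95.203 - 95.132}{95.203} = \frac{0.071}{95.203} \approx 0.00075 < 0.05
```

So only 2007-04-03 should pass the 0.05 threshold, which matches the output below.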


Find the changes greater than 0.05 (5%):

df[df.daily_ask_change > 0.05]["daily_ask_change"]

# Time (UTC)
# 2007-04-03 21:01:57.443    0.0665
# 2007-04-03 21:01:59.691    0.0665
# 2007-04-03 21:01:59.934    0.0665
# 2007-04-03 21:02:10.569    0.0665
# Name: daily_ask_change, dtype: float64


df[df.daily_ask_change > 0.05]["daily_ask_change"].resample("D").mean()

# Time (UTC)
# 2007-04-03    0.0665
# Freq: D, Name: daily_ask_change, dtype: float64
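If all you need is the list of days, as the question asks, you can also skip the per-tick columns and aggregate once per day. A minimal sketch, assuming a DataFrame with a DatetimeIndex as produced by set_index above; days_with_large_range is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

def days_with_large_range(df, column="Ask", threshold=0.05):
    """Hypothetical helper: return the calendar days whose intraday
    (max - min) / max of `column` exceeds `threshold`.

    Assumes `df` has a DatetimeIndex.
    """
    # normalize() drops the time of day, so each tick is keyed by its calendar day
    daily = df[column].groupby(df.index.normalize()).agg(["min", "max"])
    change = (daily["max"] - daily["min"]) / daily["max"]
    # Return plain datetime.date objects for the days above the threshold
    return change[change > threshold].index.date.tolist()


# Usage on a small made-up sample taken from the test data above
idx = pd.to_datetime([
    "2007-04-01 21:00:47.593",
    "2007-04-01 21:01:02.695",
    "2007-04-03 21:01:57.443",
    "2007-04-03 21:01:59.691",
    "2007-04-05 21:02:43.600",
    "2007-04-05 21:03:04.811",
])
ticks = pd.DataFrame({"Ask": [95.203, 95.132, 91.188, 97.684, 95.222, 95.23]}, index=idx)
print(days_with_large_range(ticks))  # [datetime.date(2007, 4, 3)]
```

Unlike resample("D"), grouping on the normalized index only produces days that actually contain ticks, so days without data (weekends, for example) never appear as NaN rows.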