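I run calculations on a tick data file that I receive each day, and I plan to append the results to a table daily. The final DataFrame should show the ticker, the average spread, the maximum spread, and the date. Everything works except the date column, which comes out empty.
The tick data itself has a column named 'timestamp' that shows the time in a format such as 2016-06-03T14:27:16.548084-4:00. I only need the date (2016-06-03), and it should be the same on every row for each file I run this script on, because each file covers a single day. Only the time differs.
The final result should look like this: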
a | 0.22 | 1.84 | 2016-06-03
aa | 0.01 | 0.10 | 2016-06-03
aaap | 2.07 | 2.17 | 2016-06-03
aal | 0.15 | 0.5 | 2016-06-03
I have tried using dtype str, and also df2['date'] = df['timestamp'].head(1) * len(df2.index)
with the same result: an empty date column. Where am I going wrong?
import pandas as pd
import numpy as np
from datetime import datetime
df = pd.read_csv('C:\\Users\\tickdata.csv',
dtype={'ticker': str, 'timestamp': datetime, 'bidPrice': np.float32, 'askPrice': np.float32, 'afterHours': str},
usecols=['ticker', 'timestamp', 'bidPrice', 'askPrice', 'afterHours']
)
#afterhours and single sided quotes need to be filtered out
#create the spread column to analyze
df = df[df.afterHours == "False"]
df = df[df.bidPrice != 0]
df = df[df.askPrice != 0]
df['spread'] = (df.askPrice - df.bidPrice)
#compute the average and max to a separate DataFrame
#grab the date from the first row
df2 = pd.DataFrame()
df2['avg_spread'] = df.groupby(['ticker'])['spread'].mean()
df2['maximum'] = df.groupby(['ticker'])['spread'].max()
df2['date'] = df['timestamp'].head(1)
Update:
import pandas as pd
import numpy as np
import psycopg2 as pg
import datetime as dt
df = pd.read_csv('C:\\Users\\tickdata.csv',
dtype={'ticker': str, 'timestamp': str, 'bidPrice': np.float32, 'askPrice': np.float32, 'afterHours': str},
usecols=['ticker', 'timestamp', 'bidPrice', 'askPrice', 'afterHours'],
)
#afterhours and single sided quotes need to be filtered out
#create the spread column to analyze
df = df[df.afterHours == "False"]
df = df[df.bidPrice != 0]
df = df[df.askPrice != 0]
df['spread'] = (df.askPrice - df.bidPrice)
#convert timestamp to date
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['date'] = df.timestamp.dt.date
#compute the average and max to a separate DataFrame
#grab the date from the first row
df2 = pd.DataFrame()
df2['avg_spread'] = df.groupby(['ticker'])['spread'].mean()
df2['maximum'] = df.groupby(['ticker'])['spread'].max()
df2['date'] = df.groupby(['ticker'])['date']
Now I am trying to figure out how to get the date to show up in df2. I tried df2['date'] = df.groupby(['ticker'])['date']
and
df2['date'] = df['date']
Update 2 [Solved]
I needed to use
df2['date'] = df.groupby(['ticker'])['date'].first()
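For anyone hitting the same empty column, the likely cause is index alignment: df2 is indexed by ticker, while df['date'] still carries the original row numbers, so no labels match when the column is assigned. Here is a small sketch of that behaviour. The sample rows are made up for illustration, and the column names date_misaligned and date_scalar are hypothetical and not part of my script.

```python
import datetime
import pandas as pd

# Made-up sample data standing in for the filtered tick file (one trading day).
day = datetime.date(2016, 6, 3)
df = pd.DataFrame({
    "ticker": ["a", "a", "aa", "aa"],
    "spread": [0.2, 0.4, 0.01, 0.03],
    "date": [day] * 4,
})

grouped = df.groupby("ticker")

# Same construction as in the question: df2 ends up indexed by ticker.
df2 = pd.DataFrame()
df2["avg_spread"] = grouped["spread"].mean()
df2["maximum"] = grouped["spread"].max()

# df["date"] is indexed 0..3, so nothing lines up with the ticker index
# and every value in this column comes out missing.
df2["date_misaligned"] = df["date"]

# first() returns one date per ticker, indexed by ticker, so it aligns.
df2["date"] = grouped["date"].first()

# Because each file covers a single day, a scalar also works:
# pandas broadcasts it to every row.
df2["date_scalar"] = df["date"].iloc[0]

print(df2)
```

The scalar form relies on the file really containing a single date; groupby(...).first() is the safer choice if that could ever be violated.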
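Answer 0 (score: 0)
Use to_datetime to convert the column to datetime, then use dt.date to get the date. Note that dt.date yields datetime.date objects rather than strings.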
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['date'] = df.timestamp.dt.date
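As a self-contained sketch of those two lines, the example below uses two made-up rows. It assumes the UTC offset is written in the zero-padded form -04:00; whether the -4:00 form from the question parses the same way is not something I have checked. The date_str column is a hypothetical addition showing how to get actual strings if the table needs them.

```python
import datetime
import pandas as pd

# Made-up rows; the offset is written as -04:00 (an assumption, see above).
df = pd.DataFrame({
    "ticker": ["a", "aa"],
    "timestamp": [
        "2016-06-03T14:27:16.548084-04:00",
        "2016-06-03T15:02:41.120000-04:00",
    ],
})

# Parse the strings into timezone-aware datetimes.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# dt.date gives datetime.date objects (the column has object dtype).
df["date"] = df["timestamp"].dt.date

# dt.strftime gives plain strings instead.
df["date_str"] = df["timestamp"].dt.strftime("%Y-%m-%d")

print(df[["ticker", "date", "date_str"]])
```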