Question

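I compute statistics on the tick data file I receive each day, and I plan to append the results to a table daily. The final DataFrame should show the ticker, average spread, maximum spread, and date. Everything works except the date column, which comes out empty.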

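The tick data itself has a column named 'timestamp' that shows the time in a format such as 2016-06-03T14:27:16.548084-4:00. I only need the date (2016-06-03), which should be the same on every row for each file I run this script on, since each file covers a single day. Only the time differs.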

The final result should look like this:

a    | 0.22 | 1.84 | 2016-06-03
aa   | 0.01 | 0.10 | 2016-06-03
aaap | 2.07 | 2.17 | 2016-06-03
aal  | 0.15 | 0.5  | 2016-06-03

I have tried dtype str and df2['date'] = df['timestamp'].head(1) * len(df2.index), both with the same result: an empty date column. Where am I going wrong?

import pandas as pd
import numpy as np
from datetime import datetime


df = pd.read_csv('C:\\Users\\tickdata.csv',
                 dtype={'ticker': str, 'timestamp': datetime, 'bidPrice': np.float32, 'askPrice': np.float32, 'afterHours': str},
                 usecols=['ticker', 'timestamp', 'bidPrice', 'askPrice', 'afterHours']
                 )

#afterhours and single sided quotes need to be filtered out
#create the spread column to analyze
df = df[df.afterHours == "False"]
df = df[df.bidPrice != 0]
df = df[df.askPrice != 0]
df['spread'] = (df.askPrice - df.bidPrice)

#compute the average and max to a separate DataFrame
#grab the date from the first row
df2 = pd.DataFrame()
df2['avg_spread'] = df.groupby(['ticker'])['spread'].mean()
df2['maximum'] = df.groupby(['ticker'])['spread'].max()
df2['date'] = df['timestamp'].head(1)
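
The empty column in the script above most likely comes from index alignment: df2 is indexed by ticker, while df['timestamp'].head(1) is a one-row Series with the original integer row label, so no labels match when pandas aligns them. Here is a minimal sketch of that behavior, using a small made-up frame (the rows and the '-04:00' offset are assumptions for illustration, not the real file):

```python
import pandas as pd

# Hypothetical miniature of the tick data; values are invented for illustration.
df = pd.DataFrame({
    "ticker": ["a", "a", "aa"],
    "timestamp": [
        "2016-06-03T14:27:16.548084-04:00",
        "2016-06-03T14:27:17.100000-04:00",
        "2016-06-03T14:28:01.000000-04:00",
    ],
    "spread": [0.20, 0.24, 0.01],
})

df2 = pd.DataFrame()
df2["avg_spread"] = df.groupby("ticker")["spread"].mean()  # index is now 'a', 'aa'

# head(1) is a Series whose only index label is 0. Column assignment aligns
# on index labels, and neither 'a' nor 'aa' matches 0, so every row is NaN.
df2["date"] = df["timestamp"].head(1)
date_is_empty = df2["date"].isna().all()

# A scalar is not aligned; it is broadcast to every row instead.
# Slicing the first 10 characters keeps just the YYYY-MM-DD part of the string.
df2["date"] = df["timestamp"].iloc[0][:10]
print(df2)
```

Multiplying the one-row Series by len(df2.index) does not change this, since the result is still a single row labelled 0.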

Update:

import pandas as pd
import numpy as np
import psycopg2 as pg
import datetime as dt


df = pd.read_csv('C:\\Users\\tickdata.csv',
                 dtype={'ticker': str, 'timestamp': str, 'bidPrice': np.float32, 'askPrice': np.float32, 'afterHours': str},
                 usecols=['ticker', 'timestamp', 'bidPrice', 'askPrice', 'afterHours'],
                 )
#afterhours and single sided quotes need to be filtered out
#create the spread column to analyze
df = df[df.afterHours == "False"]
df = df[df.bidPrice != 0]
df = df[df.askPrice != 0]
df['spread'] = (df.askPrice - df.bidPrice)

#convert timestamp to date
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['date'] = df.timestamp.dt.date

#compute the average and max to a separate DataFrame
#grab the date from the first row
df2 = pd.DataFrame()
df2['avg_spread'] = df.groupby(['ticker'])['spread'].mean()
df2['maximum'] = df.groupby(['ticker'])['spread'].max()
df2['date'] = df.groupby(['ticker'])['date']

Now I'm trying to figure out how to get the date to show up in df2. I tried df2['date'] = df.groupby(['ticker'])['date'] and

df2['date'] = df['date']

Update 2 [solved]: I needed to use df2['date'] = df.groupby(['ticker'])['date'].first()
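
For anyone landing here later, the three per-ticker columns can also be built in a single groupby call with named aggregation, which avoids assigning columns into an empty DataFrame one at a time. This is only a sketch on made-up rows (the prices and the '-04:00' offset are assumptions), not the asker's actual data:

```python
import pandas as pd

# Hypothetical rows standing in for the filtered tick data.
df = pd.DataFrame({
    "ticker": ["a", "a", "aa", "aa"],
    "timestamp": [
        "2016-06-03T14:27:16.548084-04:00",
        "2016-06-03T14:27:17.100000-04:00",
        "2016-06-03T14:28:01.000000-04:00",
        "2016-06-03T14:28:02.500000-04:00",
    ],
    "bidPrice": [10.0, 10.0, 5.0, 5.0],
    "askPrice": [10.5, 10.25, 5.125, 5.375],
})

df["spread"] = df["askPrice"] - df["bidPrice"]
df["date"] = pd.to_datetime(df["timestamp"]).dt.date

# One pass over the groups: output column name = (input column, aggregation).
summary = df.groupby("ticker").agg(
    avg_spread=("spread", "mean"),
    maximum=("spread", "max"),
    date=("date", "first"),
)
print(summary)
```

Because every column comes out of the same groupby, they all share the ticker index, so there is nothing left to align.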

Answer 1

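Use to_datetime to convert the column to datetime, then dt.date to extract the date. Note that dt.date gives datetime.date objects rather than strings.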

df['timestamp'] = pd.to_datetime(df['timestamp'])
df['date'] = df.timestamp.dt.date
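
A small sketch of what those two lines produce when the timestamps carry a UTC offset, as in the question. The sample values (including the late-evening one and the '-04:00' offset) are made up for illustration:

```python
import pandas as pd

# Hypothetical timestamps; the second one is late in the evening local time.
ts = pd.Series([
    "2016-06-03T14:27:16.548084-04:00",
    "2016-06-03T23:30:00.000000-04:00",
])

parsed = pd.to_datetime(ts)

# dt.date yields Python datetime.date objects (object dtype), taken from the
# local wall-clock time of each timestamp, so the offset does not shift the day.
dates = parsed.dt.date

# If plain strings are wanted instead, strftime formats them directly.
date_strings = parsed.dt.strftime("%Y-%m-%d")

# Converting to UTC first moves the late-evening tick to the next calendar day.
utc_dates = pd.to_datetime(ts, utc=True).dt.date

print(dates.tolist())
print(date_strings.tolist())
print(utc_dates.tolist())
```

For a file that holds one trading day, parsing without utc=True keeps every row on the same local date, which seems to be what the question is after.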

Why is my date column showing up empty?
