Question

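I'm new to all of this, so please go easy on me. I'm using pandas for data analysis, and I have a CSV file with times in it. Honestly, I've never seen time represented this way before. Here is what the file looks like: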

I'm trying to calculate the average time across all of these rows. Any kind of feedback would be appreciated.

import pandas as pd
import numpy as np
from datetime import datetime


flyer = pd.read_csv("./myfile.csv",parse_dates = ['timestamp'])

flyer.dropna(axis=0, how='any', thresh=None, subset=None, inplace=True)

pd.set_option('display.max_rows', 20)

flyer['timestamp'] = pd.to_datetime(flyer['timestamp'], infer_datetime_format=True)
p = flyer.loc[:,'timestamp'].mean()


print(flyer['timestamp'].mean())
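Because the CSV itself isn't shown, here is a minimal, self-contained sketch of the same steps on a couple of made-up rows. The values are hypothetical and only assume the ISO 8601 layout (date, a T, time, then a -04:00 offset) that the answers describe; io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for myfile.csv; the real file is not shown in the question.
csv_text = (
    "timestamp\n"
    "2019-07-02T14:30:00-04:00\n"
    "2019-07-02T16:30:00-04:00\n"
)

flyer = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])
flyer = flyer.dropna()  # drop incomplete rows, as the question does

# The mean of a datetime column is a single Timestamp, with its offset kept.
avg = flyer['timestamp'].mean()
print(avg)  # 2019-07-02 15:30:00-04:00
```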

Answer 1

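When you read the CSV with pandas, add parse_dates=['timestamp'] to the pd.read_csv() call and that column will be parsed as dates. The T in the timestamp field is the standard (ISO 8601) way of separating the date from the time.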
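A quick way to see what parse_dates changes is to read the same text twice and compare the column types. The two rows below are hypothetical sample data, not the asker's file.

```python
import io

import pandas as pd

# Hypothetical sample rows in the same style as the question's file.
csv_text = "timestamp\n2019-07-02T14:30:00-04:00\n2019-07-02T16:30:00-04:00\n"

# Without parse_dates the column is read as plain text.
raw = pd.read_csv(io.StringIO(csv_text))
# With parse_dates the same column becomes a timezone-aware datetime column.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])

print(raw['timestamp'].dtype)
print(parsed['timestamp'].dtype)
```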

The -4:00 is time zone information; in this case it means the time is 4 hours behind UTC.
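To make the offset concrete, here is a small sketch with one made-up timestamp: pandas keeps the offset as part of the value, and tz_convert('UTC') shows the same instant on the UTC clock.

```python
import pandas as pd

# A hypothetical timestamp: date, "T", time, then the UTC offset.
local = pd.Timestamp('2019-07-02T14:30:00-04:00')

# The offset is stored with the value: this clock is 4 hours behind UTC.
print(local.utcoffset().total_seconds() / 3600)  # -4.0

# Same instant, expressed in UTC.
utc = local.tz_convert('UTC')
print(utc)  # 2019-07-02 18:30:00+00:00
```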

Calculating the average time can be a little tricky, but here is one solution, once the CSV has been imported:

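import pandas as pd

# parse as UTC so rows with different offsets (e.g. -04:00 and -05:00) line up
ts = pd.to_datetime(df['timestamp'], utc=True)
secs = (ts - pd.Timestamp('1970-01-01', tz='UTC')).dt.total_seconds()  # seconds since the epoch
print(pd.to_datetime(secs.mean(), unit='s', utc=True))  # mean seconds back to a timestamp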

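This converts the column to datetimes, expresses each timestamp as seconds since the Unix epoch, averages those seconds, and converts the result back into a pandas timestamp.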
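The epoch-seconds idea can be checked on a tiny hypothetical column. This sketch assumes the column is parsed with utc=True; it walks through the three steps (seconds since 1970-01-01 UTC, average, convert back) and compares the result with calling .mean() directly on the datetime column, which pandas also supports.

```python
import pandas as pd

# Hypothetical column with two timestamps, parsed onto the UTC clock.
ts = pd.to_datetime(pd.Series(['2019-07-02T14:30:00-04:00',
                               '2019-07-03T10:30:00-04:00']), utc=True)

# Step 1: seconds since the Unix epoch (1970-01-01 00:00 UTC) for each row.
secs = (ts - pd.Timestamp('1970-01-01', tz='UTC')).dt.total_seconds()
# Step 2: average those plain numbers.
mean_secs = secs.mean()
# Step 3: turn the average back into a timestamp.
avg = pd.to_datetime(mean_secs, unit='s', utc=True)

print(avg)        # 2019-07-03 04:30:00+00:00
print(ts.mean())  # pandas computes the same instant directly
```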

Answer 2

The above is correct, but if you're new to this, it may not be clear what 0x is giving you.

import pandas as pd

# turn your csv into a pandas dataframe
df = pd.read_csv('your/file/location.csv')

The timestamp column is probably being read as a bunch of strings, and you can't do math on strings.
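Here is a small illustration of that point, using two hypothetical strings in the same format: the raw column is just text, and only after pd.to_datetime does it support date arithmetic such as subtracting one time from another.

```python
import pandas as pd

# Hypothetical raw column as it might come out of read_csv: text, not times.
raw = pd.Series(['2019-07-02T14:30:00-04:00', '2019-07-02T16:30:00-04:00'])
print(pd.api.types.is_datetime64_any_dtype(raw))    # False

# pd.to_datetime turns the text into real timestamps that support arithmetic.
times = pd.to_datetime(raw)
print(pd.api.types.is_datetime64_any_dtype(times))  # True
print(times.max() - times.min())                     # 0 days 02:00:00
```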

# this converts the column's strings into timestamp values
# (infer_datetime_format was deprecated in pandas 2.0; strict format inference is now the default)
df['timestamp'] = pd.to_datetime(df['timestamp'])

# now for your answer, get the average of the timestamp column
print(df['timestamp'].mean())
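One caveat, assuming the file might mix offsets (for example -05:00 in winter and -04:00 in summer; the question's file isn't shown, so this is hypothetical): without utc=True, pd.to_datetime cannot give such a column a single datetime dtype, and depending on the pandas version it returns a column of plain objects, warns, or raises an error. A sketch with utc=True:

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical rows whose offsets differ (-05:00 vs -04:00), e.g. across a DST change.
raw = pd.Series(['2019-03-09T12:00:00-05:00', '2019-03-11T12:00:00-04:00'])

# utc=True converts every row to UTC, giving the column a single datetime dtype.
times = pd.to_datetime(raw, utc=True)
avg = times.mean()
print(avg)  # 2019-03-10 16:30:00+00:00

# Optionally show the average at a fixed -04:00 offset for readability.
print(avg.tz_convert(timezone(timedelta(hours=-4))))  # 2019-03-10 12:30:00-04:00
```

Averaging on the UTC clock and converting only for display avoids mixing clocks; the -04:00 display offset here is an arbitrary choice.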

How to calculate the average time in Python pandas
