Given the sample table df shown below, how can I compute the average date of TIME1, TIME2, and TIME3 for each row? I tried:
df['AVG_TIME'] = df[['TIME1', 'TIME2', 'TIME3']].mean(axis=1)
but this returns NaN values.
ID TIME1 TIME2 TIME3
0 2018-07-11 2018-07-09 2018-07-12
1 2018-07-12 2018-06-12 2018-07-15
2 2018-07-13 2018-06-13 2018-08-03
3 2019-09-11 2019-08-11 2019-09-01
4 2019-09-12 2019-08-12 2019-09-15
Answer 0 (score: 0)
This can be done as follows:
import time
import datetime
import pandas as pd
# build the df
c = ['TIME1' , 'TIME2' , 'TIME3']
d = [['2018-07-11', '2018-07-09', '2018-07-12'],
['2018-07-12', '2018-06-12', '2018-07-15'],
['2018-07-13', '2018-06-13', '2018-08-03'],
['2019-09-11', '2019-08-11', '2019-09-01'],
['2019-09-12', '2019-08-12', '2019-09-15']]
df = pd.DataFrame(d, columns=c)
# convert a date string to seconds since the epoch (Unix time), interpreted as UTC
def to_unix(s):
    dt = datetime.datetime.strptime(s, "%Y-%m-%d")
    return dt.replace(tzinfo=datetime.timezone.utc).timestamp()

# for each row: average the Unix times and convert back to a readable date
averages = []
for index, row in df.iterrows():
    unix = [to_unix(i) for i in row]
    average = sum(unix) / len(unix)
    # convert back in UTC as well, so both directions use the same time zone
    averages.append(datetime.datetime.fromtimestamp(average, tz=datetime.timezone.utc).strftime('%Y-%m-%d'))
df['averages'] = averages
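If the row-by-row loop is too slow on a larger table, the same idea can probably be expressed with column-wise pandas operations. The following is only a sketch, not the approach from the answer above: the helper name row_mean_date is made up for illustration, and it assumes the columns hold 'YYYY-MM-DD' strings (or anything else pd.to_datetime can parse). It converts each date to whole days since 1970-01-01, averages those numbers across the row, and converts the result back to a date string, dropping any fractional day just as strftime does in the loop version.

```python
import pandas as pd

# Same sample data as in the question.
cols = ["TIME1", "TIME2", "TIME3"]
df = pd.DataFrame(
    [
        ["2018-07-11", "2018-07-09", "2018-07-12"],
        ["2018-07-12", "2018-06-12", "2018-07-15"],
        ["2018-07-13", "2018-06-13", "2018-08-03"],
        ["2019-09-11", "2019-08-11", "2019-09-01"],
        ["2019-09-12", "2019-08-12", "2019-09-15"],
    ],
    columns=cols,
)


def row_mean_date(frame, cols):
    """Hypothetical helper: row-wise mean date, returned as 'YYYY-MM-DD' strings."""
    epoch = pd.Timestamp("1970-01-01")
    one_day = pd.Timedelta(days=1)
    # Parse every column into datetimes.
    dates = frame[cols].apply(pd.to_datetime)
    # Express each date as an integer number of days since the epoch.
    days = dates.apply(lambda s: (s - epoch) // one_day)
    # Plain numeric mean across the columns of each row.
    mean_days = days.mean(axis=1)
    # Turn the (possibly fractional) day count back into a timestamp.
    mean_ts = pd.to_timedelta(mean_days, unit="D") + epoch
    # Formatting as a date drops the time-of-day part.
    return mean_ts.dt.strftime("%Y-%m-%d")


df["AVG_TIME"] = row_mean_date(df, cols)
print(df)
```

Working in whole days keeps the arithmetic independent of time zones and of the internal resolution pandas uses for datetimes, at the cost of assuming the inputs carry no time-of-day component.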