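I am currently collecting timestamps with an Android app, and for some users the stored time zone is a string such as "GMT+03:00". Searching online, I found that this is not a proper time zone name, so I tried to build the datetime objects in Python like this: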
from datetime import datetime
from dateutil import tz
import pandas as pd

filename = "data.csv"
data = pd.read_csv(filename)

[datetime.fromtimestamp(data['timestamp'].iloc[i],
                        tz=tz.gettz(data['timezone'].iloc[i]))
 for i in range(data.shape[0])]
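This does not work well. For example, using the resulting datetime objects as the index of a pandas DataFrame in order to use rolling-window functions fails. Any idea how to convert "GMT+03:00" into a proper time zone, or some other way to incorporate that information so the datetime objects are built correctly?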
Update:
Here is a sample of data['timestamp']:
[1520719558.0, 1520719558.0, 1520719558.0, 1520719558.0, 1520719561.0, 1520719561.0, 1520719561.0, 1520719561.0, 1520719562.0, 1520719562.0]
And a sample of data['timezone']:
['GMT+03:00', 'GMT+03:00', 'GMT+03:00', 'GMT+03:00', 'GMT+03:00', 'GMT+03:00', 'GMT+03:00', 'GMT+03:00', 'GMT+03:00', 'GMT+03:00']
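Answer 0: (score: 2)
For the purpose of a fixed offset, GMT and UTC are the same. You can do it manually: write a function that extracts the offset from the string and returns a datetime.timezone.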
import datetime, re

def get_tz(s):
    '''Returns a datetime.timezone object.

    Uses regular expression to extract the UTC offset from s.
    Assumes s is in the form of "GMT+03:00" or "GMT-03:00".
    Does NOT have exception handling.
    '''
    pattern = r'GMT([+-])(\d{1,2}):(\d{1,2})'
    match = re.match(pattern, s)
    sign, hh, mm = match.groups()
    hh, mm = map(int, (hh, mm))
    t_delta = datetime.timedelta(hours=hh, minutes=mm)
    t_delta = t_delta * (1 if sign == '+' else -1)
    return datetime.timezone(t_delta)
Usage:
>>> timestamp = 1520719558.0
>>> timezone = 'GMT+03:00'
>>> dt = datetime.datetime.fromtimestamp(timestamp, get_tz(timezone))
>>> dt.isoformat()
'2018-03-11T01:05:58+03:00'
>>> timezone = 'GMT-03:00'
>>> dt = datetime.datetime.fromtimestamp(timestamp, get_tz(timezone))
>>> dt.isoformat()
'2018-03-10T19:05:58-03:00'
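Since the function above has no exception handling, a stricter variant might look like the following. This is only a sketch: the name parse_gmt_offset is hypothetical, and accepting a "UTC" prefix as well as "GMT" is an assumption, not something the question's data requires.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed input shape: "GMT+03:00" / "GMT-03:00" (a "UTC" prefix is also accepted here).
_OFFSET_RE = re.compile(r'(?:GMT|UTC)([+-])(\d{1,2}):(\d{2})')

def parse_gmt_offset(s):
    """Return a datetime.timezone for strings such as 'GMT+03:00'.

    Raises ValueError on malformed input instead of failing with an
    AttributeError on a missing regex match.
    """
    match = _OFFSET_RE.fullmatch(s.strip())
    if match is None:
        raise ValueError(f"unrecognised offset string: {s!r}")
    sign, hh, mm = match.groups()
    delta = timedelta(hours=int(hh), minutes=int(mm))
    if sign == '-':
        delta = -delta
    # datetime.timezone raises ValueError itself if the offset is 24 hours or more
    return timezone(delta)

dt = datetime.fromtimestamp(1520719558.0, parse_gmt_offset('GMT+03:00'))
print(dt.isoformat())
```

Using fullmatch rather than match means trailing garbage such as "GMT+03:00xyz" is rejected rather than silently accepted.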
Answer 1: (score: 1)
#!/usr/bin/python3.5
import pandas as pd
import re
import datetime as dt

# From wwii's solution
def get_tz(s):
    '''Returns a datetime.timezone object.

    Uses regular expression to extract the UTC offset from s.
    Assumes s is in the form of "GMT+03:00" or "GMT-03:00".
    Does NOT have exception handling.
    '''
    pattern = r'GMT([+-])(\d{1,2}):(\d{1,2})'
    match = re.match(pattern, s)
    sign, hh, mm = match.groups()
    hh, mm = map(int, (hh, mm))
    t_delta = dt.timedelta(hours=hh, minutes=mm)
    t_delta = t_delta * (1 if sign == '+' else -1)
    return dt.timezone(t_delta)

timestamps = [1520719558.0, 1520719558.0, 1520719558.0, 1520719558.0,
              1520719561.0, 1520719561.0, 1520719561.0, 1520719561.0,
              1520719562.0, 1520719562.0]
timezones = ['GMT+03:00', 'GMT+03:00', 'GMT+03:00', 'GMT+03:00',
             'GMT+03:00', 'GMT+03:00', 'GMT+03:00', 'GMT+03:00',
             'GMT+03:00', 'GMT+03:00']

data = zip(timestamps, timezones)
data_df = pd.DataFrame(list(data), columns=['timestamp', 'timezone'])

# Convert the timezone strings to datetime.timezone objects
data_df['timezone'] = data_df['timezone'].apply(get_tz)

# Add a new column holding the timezone-aware datetime for each row
data_df['date_time'] = [dt.datetime.fromtimestamp(row['timestamp'], row['timezone'])
                        for (_, row) in data_df[['timestamp', 'timezone']].iterrows()]
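For the rolling-window use case mentioned in the question, one possible approach is to index the frame by the UTC instant and keep the offset string as an ordinary column, since epoch seconds already identify an unambiguous moment. The sketch below makes several assumptions: the value column and the time_utc index name are hypothetical, and only the three distinct timestamps from the sample are used.

```python
import pandas as pd

# Distinct timestamps taken from the sample data; 'value' is a hypothetical
# measurement column added only so there is something to aggregate.
df = pd.DataFrame({
    'timestamp': [1520719558.0, 1520719561.0, 1520719562.0],
    'timezone': ['GMT+03:00', 'GMT+03:00', 'GMT+03:00'],
    'value': [1.0, 2.0, 3.0],
})

# Interpret the epoch seconds as UTC instants and use them as the index.
utc_times = pd.to_datetime(df['timestamp'], unit='s', utc=True)
df.index = pd.DatetimeIndex(utc_times, name='time_utc')

# Time-based rolling windows need a monotonic datetime-like index.
rolling_sum = df['value'].rolling('2s').sum()
print(rolling_sum)
```

If every row shares the same offset, the index could afterwards be shifted for display with tz_convert. With mixed offsets, keeping the index in UTC avoids ending up with an object-dtype column of datetimes.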