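I have a data frame with five columns: a unique ID, a list of events in YEAR-WEEK_nb format, the length of that list, min_date and max_date.
What I want to do is compute the distance, in weeks, between min_date and max_date. For example, for the first row we would have 0000-8, i.e. a duration of 8 weeks.
I created two functions: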
def min_date(date):
    return min(date)

def max_date(date):
    return max(date)

# the list column is named 'year_week' in the sample dictionary below
week["date_min"] = week["year_week"].apply(min_date)
week["date_max"] = week["year_week"].apply(max_date)
I think what I should do is:
def max_minus_min_date(date):
    return max(date) - min(date)
But I don't know how to convert my strings to a timestamp type.
You can find the dataset below as a dictionary:
dico= {'ID': {0: '3014-4298-ae43-14-4298-4298-a',
1: '002445cc-a38d-5c25bb06e4c47406c',
2: '4d1e-a5559-000b0601f33b1d',
3: '002445cc-00305e69-7c76a-77b32aba5dec'},
'year_week': {0: ['2018-15', '2018-16', '2018-23'],
1: ['2019-39', '2019-40', '2019-41', '2019-42', '2019-43'],
2: ['2018-1',
'2018-12',
'2018-2',
'2018-23',
'2018-24',
'2018-25',
'2018-26',
'2018-27',
'2018-3',
'2018-36',
'2018-38',
'2018-39',
'2018-4',
'2018-40',
'2018-41',
'2018-42',
'2018-45',
'2018-47',
'2018-48',
'2018-49',
'2018-50',
'2018-51',
'2018-6',
'2018-7',
'2018-8',
'2019-12',
'2019-13',
'2019-15',
'2019-16',
'2019-17',
'2019-18',
'2019-20',
'2019-21',
'2019-22',
'2019-23',
'2019-24',
'2019-25',
'2019-26',
'2019-27',
'2019-28',
'2019-29',
'2019-3',
'2019-30',
'2019-31',
'2019-32',
'2019-33',
'2019-34',
'2019-35',
'2019-36',
'2019-37',
'2019-38',
'2019-4',
'2019-5',
'2019-6',
'2019-7',
'2019-8'],
3: ['2018-36',
'2018-38',
'2018-39',
'2018-40',
'2018-41',
'2018-42',
'2018-43',
'2018-44',
'2018-45',
'2018-46',
'2018-47',
'2018-48',
'2018-49',
'2018-52',
'2019-10',
'2019-11',
'2019-2',
'2019-3',
'2019-4',
'2019-5',
'2019-6',
'2019-7',
'2019-8',
'2019-9']},
'length_list': {0: 3, 1: 5, 2: 56, 3: 24},
'date_min': {0: '2018-15', 1: '2019-39', 2: '2018-1', 3: '2018-36'},
'date_max': {0: '2018-23', 1: '2019-43', 2: '2019-8', 3: '2019-9'}}
Answer 0 (score: 4)
IIUC, you need to convert the year and week number to a datetime.
Let's do that first, then compute the difference in weeks.
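import numpy as np
import pandas as pd

df = pd.DataFrame(dico)  # the sample dictionary from the question

# %W = week of the year (Monday as first day), %w = weekday (0 = Sunday)
date_format = '%Y-%W-%w'
s1 = pd.to_datetime(df['date_min'] + '-0', format=date_format)
s2 = pd.to_datetime(df['date_max'] + '-0', format=date_format)
week_diff = (s2 - s1) / np.timedelta64(1, 'W')
df['week_diff'] = week_diff
print(df[['date_min', 'date_max', 'week_diff']])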
  date_min date_max  week_diff
0  2018-15  2018-23        8.0
1  2019-39  2019-43        4.0
2   2018-1   2019-8       60.0
3  2018-36   2019-9       26.0
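One caveat about the inputs: the date_max values in the sample data look like the result of max() applied to the raw strings, which compares text character by character rather than chronologically. Row 3 has date_max '2019-9' although its list also contains '2019-10' and '2019-11', and row 2 has '2019-8' although its list runs up to '2019-38'. The snippet below is a minimal sketch of picking chronological extremes instead; parse_year_week is a hypothetical helper introduced for illustration, and the list is a shortened stand-in for row 3.

```python
def parse_year_week(label):
    """Turn a 'YYYY-W' label into an (int year, int week) tuple."""
    year, week = label.split("-")
    return int(year), int(week)

# Shortened stand-in for row 3 of the sample data.
weeks = ["2018-36", "2018-52", "2019-10", "2019-11", "2019-2", "2019-9"]

print(max(weeks))                       # 2019-9   (text comparison)
print(max(weeks, key=parse_year_week))  # 2019-11  (chronological comparison)
print(min(weeks, key=parse_year_week))  # 2018-36
```

With chronological extremes, the same %W conversion would give 28 weeks for row 3 (2018-36 to 2019-11) and 90 weeks for row 2 (2018-1 to 2019-38), rather than 26 and 60.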
Answer 1 (score: 2)
A fast way to extract the year and week from the string is the following:
import re

# captures both 2019-1 and 2019-42 (one and two digit week number)
re_week = re.compile(r"(?P<year>\d{4})-(?P<week>\d{1,2})")

def extract_year_week(datestr):
    mtch = re_week.match(datestr)
    return int(mtch.group('year')), int(mtch.group('week'))
Once you have that, you can use the conversion from "Datetime from year and week number" to adapt your min/max functions:
from datetime import datetime

def max_minus_min_date(date):
    dt_conv = [extract_year_week(dt) for dt in date]
    # sorted() returns a new list (list.sort() returns None); tuples
    # compare by year first, then week, which is chronological order
    sdt = sorted(dt_conv)
    maxd = datetime.strptime('{0} {1} 0'.format(*sdt[-1]), "%Y %W %w")
    mind = datetime.strptime('{0} {1} 0'.format(*sdt[0]), "%Y %W %w")
    return (maxd - mind).days / 7
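When ordering parsed (year, week) pairs, plain tuple comparison is usually enough, because Python compares the year first and only then the week. Collapsing the pair into a single number by addition does not preserve that order. The sketch below illustrates the difference and then computes a span; pairs and to_sunday are illustrative names, not part of the answer.

```python
from datetime import datetime

def to_sunday(year, week):
    """Sunday of the given %W week (weeks start on Monday)."""
    return datetime.strptime("{0} {1} 0".format(year, week), "%Y %W %w")

pairs = [(2019, 2), (2018, 52), (2018, 36)]

# Tuples compare element by element: year first, then week.
ordered = sorted(pairs)
print(ordered)  # [(2018, 36), (2018, 52), (2019, 2)]

# Adding the fields loses the ordering: 2018 + 52 = 2070 ranks
# above 2019 + 2 = 2021, so the latest week ends up first.
print(sorted(pairs, key=lambda yw: yw[0] + yw[1]))
# [(2019, 2), (2018, 36), (2018, 52)]

# Span in weeks between the earliest and latest pair.
span_weeks = (to_sunday(*ordered[-1]) - to_sunday(*ordered[0])).days / 7
print(span_weeks)  # 19.0
```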
Answer 2 (score: 0)
Try this:
First, parse the dates:
import datetime
mx = '2019-5'
mn = '2019-3'
mxDate = datetime.datetime.strptime(mx + '-1', "%Y-%W-%w")
mnDate = datetime.datetime.strptime(mn + '-1', "%Y-%W-%w")
Then compute the difference in weeks:
weeks = (mxDate - mnDate).days / 7
print(weeks)
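A note on week numbering: %W counts weeks from the first Monday of the year and puts any earlier days in week 0, which is not the same as ISO 8601 week numbering. The question does not say which scheme produced the labels, so the following is only a sketch under the assumption that they are ISO weeks. It uses date.fromisocalendar from the standard library (Python 3.8+); iso_week_diff is a hypothetical name.

```python
from datetime import date

def iso_week_diff(start, end):
    """Whole weeks between two 'YYYY-W' labels, read as ISO 8601 weeks."""
    def monday(label):
        year, week = (int(part) for part in label.split("-"))
        return date.fromisocalendar(year, week, 1)  # 1 = Monday
    return (monday(end) - monday(start)).days // 7

print(iso_week_diff("2019-3", "2019-5"))   # 2
print(iso_week_diff("2018-52", "2019-2"))  # 2
```

Within a single year the two schemes agree on the difference, but across a year boundary they can diverge: with the %W-based code, '2018-52' to '2019-2' comes out as 3 weeks, because %W week 1 of 2019 starts on 7 January while ISO week 1 starts on 31 December 2018.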