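I am concatenating pairs of related fields across a large dataset. I think I have most of what I need, but I can't get the fields to join correctly.
DataFrame: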
id| date1foo| time1bar| date2foo| time2bar| date3foo | time3bar
--|---------|---------|---------|---------|----------|--------
2 |1/4/2017 |01:03:45 |1/4/2017 |01:03:45 |1/4/2019 |12:44:45
3 |2/4/2017 |03:12:32 |2/4/2017 |03:16:23 |3/4/2019 |22:32:55
4 |2/5/2017 |04:11:54 |7/5/2017 |06:23:31 |2/19/2019 |19:03:11
5 |2/6/2017 |02:15:34 |9/15/2017|01:12:32 |3/15/2019 |11:11:11
6 |3/17/2017|04:44:12 |10/3/2017|07:19:52 |4/4/2019 |07:03:14
I need to replace these fields with new, merged fields, like so:
id| datetime1 | datetime2 | datetime3
--|------------------|------------------|------------------|
2 |1/4/2017 01:03:45 |1/4/2017 01:03:45 |1/4/2019 12:44:45
3 |2/4/2017 03:12:32 |2/4/2017 03:16:23 |3/4/2019 22:32:55
4 |2/5/2017 04:11:54 |7/5/2017 06:23:31 |2/19/2019 19:03:11
5 |2/6/2017 02:15:34 |9/15/2017 01:12:32|3/15/2019 11:11:11
6 |3/17/2017 04:44:12|10/3/2017 07:19:52|4/4/2019 07:03:14
I feel like I'm getting close with the following.
Code:
pattern_date = re.compile("date[0-9]{1,2}foo")
pattern_time = re.compile("time[0-9]{1,2}bar")
cols_date = [pattern_date.match(x).group() for x in df.columns if
pattern_date.match(x) is not None]
cols_time = [pattern_time.match(x).group() for x in df.columns if
pattern_time.match(x) is not None]
df[cols_time] = df[cols_date].applymap(lambda x: str(x) + [i for i in df[cols_date]])
# renaming fields code would go here
What am I missing here? Is there a better way? Any help would be greatly appreciated.
Thanks!
Answer 0 (score: 1)
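We can use DataFrame.filter to select the date columns and the time columns, then zip the two sets of column names together so that each date column is paired with its matching time column: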
df_new = pd.DataFrame({'id':df.id.values})
for index, cols in enumerate(zip(df.filter(regex='^date').columns, df.filter(regex='^time').columns)):
    df_new[f'datetime{index+1}'] = df[cols[0]] + ' ' + df[cols[1]]
print(df_new)
id datetime1 datetime2 datetime3
0 2 1/4/2017 01:03:45 1/4/2017 01:03:45 1/4/2019 12:44:45
1 3 2/4/2017 03:12:32 2/4/2017 03:16:23 3/4/2019 22:32:55
2 4 2/5/2017 04:11:54 7/5/2017 06:23:31 2/19/2019 19:03:11
3 5 2/6/2017 02:15:34 9/15/2017 01:12:32 3/15/2019 11:11:11
4 6 3/17/2017 04:44:12 10/3/2017 07:19:52 4/4/2019 07:03:14
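For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question (first three rows) and that every date and time column holds strings. The helper name combine_date_time is hypothetical and not part of pandas. Unlike the loop above, it pairs columns by the number embedded in the column name rather than by position, so it does not depend on the date and time columns appearing in the same order.

```python
import re

import pandas as pd

# Sample data copied from the question (first three rows only).
df = pd.DataFrame({
    "id": [2, 3, 4],
    "date1foo": ["1/4/2017", "2/4/2017", "2/5/2017"],
    "time1bar": ["01:03:45", "03:12:32", "04:11:54"],
    "date2foo": ["1/4/2017", "2/4/2017", "7/5/2017"],
    "time2bar": ["01:03:45", "03:16:23", "06:23:31"],
    "date3foo": ["1/4/2019", "3/4/2019", "2/19/2019"],
    "time3bar": ["12:44:45", "22:32:55", "19:03:11"],
})


def combine_date_time(frame):
    """Hypothetical helper: join dateNfoo and timeNbar into datetimeN."""
    out = frame[["id"]].copy()
    for col in frame.columns:
        # Column naming (dateNfoo / timeNbar) is taken from the question.
        m = re.fullmatch(r"date(\d{1,2})foo", col)
        if m is None:
            continue
        n = m.group(1)
        time_col = f"time{n}bar"
        if time_col not in frame.columns:
            continue  # no matching time column for this date column
        # Plain string concatenation, same as in the loop above.
        out[f"datetime{n}"] = frame[col] + " " + frame[time_col]
    return out


df_new = combine_date_time(df)
print(df_new)
```

The result still holds strings. If real datetime values are needed, each combined column could then be passed to pd.to_datetime, for example with format='%m/%d/%Y %H:%M:%S', assuming the dates are month-first as they appear to be in the sample.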
What exactly does DataFrame.filter do? It returns the columns whose names match the regular expression:
print(df.filter(regex='^date'))
date1foo date2foo date3foo
0 1/4/2017 1/4/2017 1/4/2019
1 2/4/2017 2/4/2017 3/4/2019
2 2/5/2017 7/5/2017 2/19/2019
3 2/6/2017 9/15/2017 3/15/2019
4 3/17/2017 10/3/2017 4/4/2019
print(df.filter(regex='^time'))
time1bar time2bar time3bar
0 01:03:45 01:03:45 12:44:45
1 03:12:32 03:16:23 22:32:55
2 04:11:54 06:23:31 19:03:11
3 02:15:34 01:12:32 11:11:11
4 04:44:12 07:19:52 07:03:14
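As a small sketch of why the ^ anchor matters: DataFrame.filter keeps the labels for which re.search finds a match, so an unanchored pattern matches anywhere in the column name. The column updated_date below is made up for illustration and is not in the question's data; the strict pattern is adapted from the question's own regex.

```python
import pandas as pd

# One-row frame using the question's column names plus one made-up
# column ("updated_date") to show the effect of anchoring.
df = pd.DataFrame({
    "id": [2],
    "date1foo": ["1/4/2017"],
    "time1bar": ["01:03:45"],
    "updated_date": ["1/5/2017"],  # hypothetical extra column
})

# Unanchored: matches "date" anywhere in the label.
loose = list(df.filter(regex="date").columns)

# Anchored at the start: only labels that begin with "date".
anchored = list(df.filter(regex="^date").columns)

# Stricter pattern adapted from the question: dateNfoo with 1-2 digits.
strict = list(df.filter(regex=r"^date[0-9]{1,2}foo$").columns)

print(loose)
print(anchored)
print(strict)
```

If the real dataset has other columns that start with date or time, the stricter pattern is the safer choice, since both filter calls must return the same number of columns in the same order for zip to pair them correctly.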
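Note: I used f-strings, which are only supported in Python 3.6 and later. If you are on an older Python version, use str.format for the column name instead:
df_new['datetime{}'.format(index+1)] = df[cols[0]] + ' ' + df[cols[1]]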