I have the following two DataFrames, df1 and df2.
df1 is:
import pandas as pd

# -- create df1
list_columns = ['time', 'code', 'age']
list_data = [
    ['2019-11-18 10:33:53', 'a1', 10],
    ['2019-11-18 11:33:56', 'a2', 15],
    ['2019-11-18 12:33:58', 'a4', 6],
    ['2019-11-18 13:45:04', 'a5', 3]
]
df1 = pd.DataFrame(columns=list_columns, data=list_data)
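I don't know whether this is possible in pandas, but if time in df1 falls between start_time and end_time in df2, I would like to add the name and country columns from df2 after age. It is a bit like a join.
With plain dates I would simply use a boolean mask. df2 is: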
# -- create df2
list_columns = ['start_time', 'end_time', 'name', 'country']
list_data = [
    ['2019-11-18 10:31:53', '2019-11-18 10:35:53', 'nick', 'germany'],
    ['2019-11-18 11:32:53', '2019-11-18 11:35:53', 'joe', 'usa'],
    ['2019-11-18 12:33:58', '2019-11-18 12:35:58', 'smith', 'california'],
    ['2019-11-18 13:42:04', '2019-11-18 13:47:04', 'sam', 'france']
]
df2 = pd.DataFrame(columns=list_columns, data=list_data)
df2.head()
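However, I am dealing with full timestamps rather than plain dates, so the comparison has to take hh:mm into account. Could you give me some hints on how to achieve this?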
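For illustration, a mask of this kind could look roughly like the sketch below once the strings are parsed into real timestamps. The variable names times, start and end are made up for this example, and the values are copied from the first rows of the sample data; it only checks a single interval, not the full join.

```python
import pandas as pd

# Hypothetical sample: two of the df1 times, parsed into datetime64 values.
times = pd.to_datetime(pd.Series(['2019-11-18 10:33:53', '2019-11-18 11:33:56']))

# One interval taken from the first row of df2.
start = pd.Timestamp('2019-11-18 10:31:53')
end = pd.Timestamp('2019-11-18 10:35:53')

# Element-wise comparison covers the date and the hh:mm:ss part.
mask = (times >= start) & (times <= end)
print(mask.tolist())
```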
Answer 0 (score: 1)
Your start_time and end_time intervals look like they may overlap. Do you expect multiple matches? You can do a cross merge like this:
(df1.assign(tmp=1)
    .merge(df2.assign(tmp=1), on='tmp', how='left')
    .query('start_time <= time <= end_time')
    .drop(['start_time', 'end_time'], axis=1)
)
Output:
                   time code  age  tmp   name     country
0   2019-11-18 10:33:53   a1   10    1   nick     germany
5   2019-11-18 11:33:56   a2   15    1    joe         usa
10  2019-11-18 12:33:58   a4    6    1  smith  california
15  2019-11-18 13:45:04   a5    3    1    sam      france
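The same cross-merge idea can be sketched as a self-contained script. This variant is an assumption on my part rather than part of the answer above: it parses the string columns with pd.to_datetime so that the comparison is done on real timestamps, filters with Series.between instead of query, and also drops the helper column tmp. The names merged, inside and result are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    columns=['time', 'code', 'age'],
    data=[
        ['2019-11-18 10:33:53', 'a1', 10],
        ['2019-11-18 11:33:56', 'a2', 15],
        ['2019-11-18 12:33:58', 'a4', 6],
        ['2019-11-18 13:45:04', 'a5', 3],
    ],
)
df2 = pd.DataFrame(
    columns=['start_time', 'end_time', 'name', 'country'],
    data=[
        ['2019-11-18 10:31:53', '2019-11-18 10:35:53', 'nick', 'germany'],
        ['2019-11-18 11:32:53', '2019-11-18 11:35:53', 'joe', 'usa'],
        ['2019-11-18 12:33:58', '2019-11-18 12:35:58', 'smith', 'california'],
        ['2019-11-18 13:42:04', '2019-11-18 13:47:04', 'sam', 'france'],
    ],
)

# Parse the strings so comparisons use full timestamps (date plus hh:mm:ss).
df1['time'] = pd.to_datetime(df1['time'])
for col in ['start_time', 'end_time']:
    df2[col] = pd.to_datetime(df2[col])

# Cross merge through a constant helper key: every df1 row meets every df2 row.
merged = df1.assign(tmp=1).merge(df2.assign(tmp=1), on='tmp')

# Keep only the pairs where time lies inside [start_time, end_time], bounds included.
inside = merged['time'].between(merged['start_time'], merged['end_time'])
result = (
    merged[inside]
    .drop(columns=['tmp', 'start_time', 'end_time'])
    .reset_index(drop=True)
)
print(result)
```

Note that the cross merge builds len(df1) * len(df2) rows before filtering, so it is best suited to small frames, and df1 rows that fall inside no interval disappear from result.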