Question

I have the following 2 dfs:

import pandas as pd

# -- create df1
list_columns = ['time', 'code', 'age']
list_data = [
    ['2019-11-18 10:33:53', 'a1', 10],
    ['2019-11-18 11:33:56', 'a2', 15],
    ['2019-11-18 12:33:58', 'a4', 6],
    ['2019-11-18 13:45:04', 'a5', 3]
    ]
df1 = pd.DataFrame(columns=list_columns, data=list_data)

and

# -- create df2
list_columns = ['start_time', 'end_time', 'name', 'country']
list_data = [
    ['2019-11-18 10:31:53', '2019-11-18 10:35:53', 'nick', 'germany'],
    ['2019-11-18 11:32:53', '2019-11-18 11:35:53', 'joe', 'usa'],
    ['2019-11-18 12:33:58', '2019-11-18 12:35:58', 'smith', 'california'],
    ['2019-11-18 13:42:04', '2019-11-18 13:47:04', 'sam', 'france']
    ]
df2 = pd.DataFrame(columns=list_columns, data=list_data)
df2.head()

I don't know whether this is even possible in pandas, but I would like to add the name and country columns from df2 after age whenever time from df1 falls between start_time and end_time from df2. Kind of like a join.

For dates I would use a boolean mask, but here I'm dealing with timestamps, so I need to take hh:mm into account. Can you give me some hints on how to achieve this?
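
For reference, a date-style mask also works on full timestamps once the strings are parsed with pd.to_datetime, because the comparison then covers the time of day as well as the date. The following is only a minimal sketch: the bounds `start` and `end` are hypothetical values chosen for illustration, and the frame is rebuilt from the sample data so the snippet runs on its own.

```python
import pandas as pd

# Sample rows mirroring df1 from the question.
df1 = pd.DataFrame({
    'time': ['2019-11-18 10:33:53', '2019-11-18 11:33:56',
             '2019-11-18 12:33:58', '2019-11-18 13:45:04'],
    'code': ['a1', 'a2', 'a4', 'a5'],
    'age': [10, 15, 6, 3],
})

# Parse the strings so comparisons use real timestamps (date plus hh:mm:ss).
df1['time'] = pd.to_datetime(df1['time'])

# Hypothetical bounds for a single interval (assumption, for illustration only).
start = pd.Timestamp('2019-11-18 11:32:53')
end = pd.Timestamp('2019-11-18 11:35:53')

# Boolean mask: True where the timestamp lies inside [start, end].
mask = (df1['time'] >= start) & (df1['time'] <= end)
in_window = df1.loc[mask]
print(in_window)
```

A mask like this handles one interval at a time, so it does not by itself pair every row of df1 with the matching row of df2.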

Answer 1

Your start_time and end_time intervals look like they could overlap. Do you expect multiple matches? You can do a cross merge like this:

(df1.assign(tmp=1)
    .merge(df2.assign(tmp=1), on='tmp', how='left')
    .query('start_time <= time <= end_time')
    .drop(['start_time', 'end_time'], axis=1)
)

Output:

                  time code  age  tmp   name     country
0  2019-11-18 10:33:53   a1   10    1   nick     germany
5  2019-11-18 11:33:56   a2   15    1    joe         usa
10 2019-11-18 12:33:58   a4    6    1  smith  california
15 2019-11-18 13:45:04   a5    3    1    sam      france
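
As a variation on the same idea, the sketch below assumes pandas 1.2 or later, where merge accepts how='cross', so the helper tmp column is not needed. It also converts the columns with pd.to_datetime first, so the comparison is made on timestamps and not on strings. The variable names `merged`, `inside` and `result` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(
    columns=['time', 'code', 'age'],
    data=[
        ['2019-11-18 10:33:53', 'a1', 10],
        ['2019-11-18 11:33:56', 'a2', 15],
        ['2019-11-18 12:33:58', 'a4', 6],
        ['2019-11-18 13:45:04', 'a5', 3],
    ],
)
df2 = pd.DataFrame(
    columns=['start_time', 'end_time', 'name', 'country'],
    data=[
        ['2019-11-18 10:31:53', '2019-11-18 10:35:53', 'nick', 'germany'],
        ['2019-11-18 11:32:53', '2019-11-18 11:35:53', 'joe', 'usa'],
        ['2019-11-18 12:33:58', '2019-11-18 12:35:58', 'smith', 'california'],
        ['2019-11-18 13:42:04', '2019-11-18 13:47:04', 'sam', 'france'],
    ],
)

# Convert the string columns to datetime64 before comparing.
df1['time'] = pd.to_datetime(df1['time'])
df2['start_time'] = pd.to_datetime(df2['start_time'])
df2['end_time'] = pd.to_datetime(df2['end_time'])

# Cross merge: every row of df1 paired with every row of df2 (4 x 4 = 16 rows).
merged = df1.merge(df2, how='cross')

# Keep only the pairs where time lies inside [start_time, end_time].
inside = (merged['start_time'] <= merged['time']) & (merged['time'] <= merged['end_time'])
result = (
    merged[inside]
    .drop(columns=['start_time', 'end_time'])
    .reset_index(drop=True)
)
print(result)
```

Note that a cross merge builds len(df1) * len(df2) intermediate rows, so this approach is best suited to small or medium-sized frames. If intervals overlap, a row of df1 will appear once per matching interval.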

Timestamp from one df between 2 timestamps from another df
