Question

I have a DataFrame:

entity  response    date
p   a1  1-Feb-14
p   a2  2-Feb-14
p   a3  3-Feb-14
p   a4  4-Feb-14
p   a5  5-Feb-14
p   a6  6-Feb-14
p   a7  7-Feb-14
p   a8  8-Feb-14
p   a9  9-Feb-14
p   a10 10-Feb-14
p   a11 11-Feb-14
p   a12 12-Feb-14
p   a13 13-Feb-14
p   a14 14-Feb-14
p   a15 15-Feb-14

and another DataFrame:

entity  start_date  end_date
p   2-Feb-14    4-Feb-14
p   6-Feb-14    7-Feb-14
p   9-Feb-14    12-Feb-14
q   1-Feb-14    7-Feb-14

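Based on the second DataFrame, I need to create a True/False column in the first DataFrame. For entity p, the value should be True if the date falls within one of the start_date/end_date windows, and False otherwise.

What would be the fastest and shortest way to do this? I tried iterating over the whole DataFrame, but that is slow and also makes the code long.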

Answer 1

Maybe I'm overthinking this, but:

def f(s):
    # s holds the dates of one entity; s.name is that entity's key
    f2 = lambda d, n: ((d >= df2[df2.entity == n].start_date) & (d <= df2[df2.entity == n].end_date)).any()
    return s.transform(f2, n=s.name)

df.groupby('entity').date.transform(f)

0     False
1      True
2      True
3      True
4     False
5      True
6      True
7     False
8      True
9      True
10     True
11     True
12    False
13    False
14    False
15    False
Name: date, dtype

You can also do some preprocessing first to speed things up:

# one pd.Interval per row of df2, then one list of intervals per entity
df2['j'] = df2.agg(lambda k: pd.Interval(k.start_date, k.end_date), axis=1)
dic = df2.groupby('entity').agg(lambda k: list(k)).to_dict()['j']
# row-wise apply returns one boolean per row
df[['entity', 'date']].apply(lambda x: any(x['date'] in z for z in dic[x['entity']]), axis=1)

Note that by default pd.Interval is closed only on the right, so a date equal to start_date is not counted, but this approach should be about 20 times faster than the chained transform.
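The closed-side behaviour of pd.Interval can be checked on the sample data from the question. The following is a minimal sketch, not the answerer's exact code: it assumes the date columns are parsed as datetimes, passes closed="both" so that start_date and end_date both count as inside the window, and uses the illustrative names intervals and in_window, which do not appear in the original.

```python
import pandas as pd

# Sample data from the question, with dates parsed as datetimes.
df = pd.DataFrame({
    "entity": ["p"] * 15,
    "response": [f"a{i}" for i in range(1, 16)],
    "date": pd.date_range("2014-02-01", periods=15),
})
df2 = pd.DataFrame({
    "entity": ["p", "p", "p", "q"],
    "start_date": pd.to_datetime(["2014-02-02", "2014-02-06", "2014-02-09", "2014-02-01"]),
    "end_date": pd.to_datetime(["2014-02-04", "2014-02-07", "2014-02-12", "2014-02-07"]),
})

# One list of intervals per entity. closed="both" is an assumption about the
# desired semantics; the pd.Interval default is closed="right".
intervals = {
    entity: [
        pd.Interval(start, end, closed="both")
        for start, end in zip(group["start_date"], group["end_date"])
    ]
    for entity, group in df2.groupby("entity")
}

# .get(..., []) gives False for an entity that has no windows at all.
df["in_window"] = [
    any(date in interval for interval in intervals.get(entity, []))
    for entity, date in zip(df["entity"], df["date"])
]

print(df[["date", "in_window"]])
```

With the default closed="right", 2-Feb-14 would fall outside the first window, which is the caveat mentioned above.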

Answer 2

IMHO, depending on your data, it can sometimes pay to expand the date ranges first:

df2 = pd.concat([
    pd.DataFrame(pd.date_range(start_date, end_date), columns=['date']).assign(entity=entity)
    for _, (entity, start_date, end_date) in df2.iterrows()
]).drop_duplicates()
df.merge(df2, on=['entity', 'date'], how='left', indicator=True)['_merge'] == 'both'
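To try the expansion idea end to end, here is a self-contained sketch on the sample data from the question. It assumes the date columns are parsed as datetimes, swaps iterrows for itertuples, and uses the illustrative names expanded, merged and in_window, none of which come from the original answer.

```python
import pandas as pd

# Sample data from the question, with dates parsed as datetimes.
df = pd.DataFrame({
    "entity": ["p"] * 15,
    "response": [f"a{i}" for i in range(1, 16)],
    "date": pd.date_range("2014-02-01", periods=15),
})
df2 = pd.DataFrame({
    "entity": ["p", "p", "p", "q"],
    "start_date": pd.to_datetime(["2014-02-02", "2014-02-06", "2014-02-09", "2014-02-01"]),
    "end_date": pd.to_datetime(["2014-02-04", "2014-02-07", "2014-02-12", "2014-02-07"]),
})

# Expand every window into one row per calendar day (both endpoints included).
expanded = pd.concat(
    [
        pd.DataFrame({
            "entity": row.entity,
            "date": pd.date_range(row.start_date, row.end_date),
        })
        for row in df2.itertuples(index=False)
    ],
    ignore_index=True,
).drop_duplicates()  # unique keys keep the left merge from adding rows

# A left merge with indicator=True marks rows found in both frames as "both".
merged = df.merge(expanded, on=["entity", "date"], how="left", indicator=True)
df["in_window"] = (merged["_merge"] == "both").to_numpy()

print(df[["date", "in_window"]])
```

The trade-off is memory: long windows expand into many rows, so this tends to suit data where the windows are short relative to the number of dates being checked.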

How to create a new boolean column in a DataFrame based on multiple conditions from another DataFrame in pandas
