Question

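I have a start date and an end date for each ID (df_with_start_end), and I am trying to find out whether another data frame (df_dates) contains other dates for the same ID. The result should go into a new column.

My idea is to iterate over df_with_start_end by unique ID and, for each ID, check whether df_dates has any dates that fall between the start and end dates from df_with_start_end.

This is my implementation, but it does not work.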

for k in df_with_start_end['ID']:
    df_with_start_end[k]['FREE_PERIOD'] = df_with_start_end[k]['START_DATE'] <= df_dates[k]['DATE'] < df_with_start_end[k]['END_DATE']

I get this error:

Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 3685509

Here are samples of the data frames:

df_with_start_end
ID  START_DATE  END_DATE    FREE_PERIOD
1   2015-02-13  2016-02-13  False
2   2014-08-27  2015-08-27  True

df_dates
ID  DATE
1   2014-04-23
1   2015-08-02
1   2015-09-15
2   2014-06-19
2   2017-01-07

I have heard that loops in Python are slow. Is there a way to avoid them in my case?

Answer 1

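You want to iterate over rows, but you are actually indexing columns.

for k in df_with_start_end['ID']: means that k takes each ID value in turn.

But df_with_start_end[k] looks up the column whose label is k. Since your columns are only ID, START_DATE, END_DATE and FREE_PERIOD, the KeyError tells you that no column with that label exists.

The fix is to select the column first and then the row, by swapping the order of the lookups: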

df_with_start_end['FREE_PERIOD'][k]

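A better way is to use the loc indexer. Note that both forms look k up in the row index, so they only match the ID if ID is the index of the data frame (for example after set_index('ID')):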

df_with_start_end.loc[k, 'FREE_PERIOD']
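
Here is a minimal sketch of the loop with loc, rebuilt from the sample data in the question. It assumes ID is moved into the index with set_index, and that FREE_PERIOD should be True when no date falls inside the half-open period [START_DATE, END_DATE), which is what the sample rows suggest. The chained comparison a <= s < b from the question does not work on a Series, so the two conditions are combined with & instead.

```python
import pandas as pd

# Sample data copied from the question; ID becomes the row index (assumption).
df_with_start_end = pd.DataFrame({
    "ID": [1, 2],
    "START_DATE": pd.to_datetime(["2015-02-13", "2014-08-27"]),
    "END_DATE": pd.to_datetime(["2016-02-13", "2015-08-27"]),
}).set_index("ID")

df_dates = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "DATE": pd.to_datetime(
        ["2014-04-23", "2015-08-02", "2015-09-15", "2014-06-19", "2017-01-07"]
    ),
})

# Create the column up front so it keeps a boolean dtype.
df_with_start_end["FREE_PERIOD"] = False

for k in df_with_start_end.index:
    start = df_with_start_end.loc[k, "START_DATE"]
    end = df_with_start_end.loc[k, "END_DATE"]
    # All dates recorded for this ID.
    dates = df_dates.loc[df_dates["ID"] == k, "DATE"]
    # Element-wise test; combine the two comparisons with & rather than chaining.
    inside = (start <= dates) & (dates < end)
    # Assumed meaning: the period is "free" when no date falls inside it.
    df_with_start_end.loc[k, "FREE_PERIOD"] = not inside.any()

print(df_with_start_end)
```

This still loops in Python, so it only addresses the KeyError, not the speed concern.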

Answer 2

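For me, the simplest way was to join the two data frames. I used merge() for the join; after that, comparing the dates is much easier. I had been trying to avoid joining them, but sometimes it seems to be the better way.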
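
The answer does not show its code, so the following is only one possible sketch of the merge approach, using the sample data from the question. The left join, the half-open interval, and the reading of FREE_PERIOD as "no date inside the period" are assumptions inferred from the sample rows, not something the answer states.

```python
import pandas as pd

# Sample data copied from the question.
df_with_start_end = pd.DataFrame({
    "ID": [1, 2],
    "START_DATE": pd.to_datetime(["2015-02-13", "2014-08-27"]),
    "END_DATE": pd.to_datetime(["2016-02-13", "2015-08-27"]),
})

df_dates = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "DATE": pd.to_datetime(
        ["2014-04-23", "2015-08-02", "2015-09-15", "2014-06-19", "2017-01-07"]
    ),
})

# One row per (period, date) pair; a left join keeps IDs that have no dates,
# whose DATE is then NaT and compares as False below.
merged = df_with_start_end.merge(df_dates, on="ID", how="left")

# Vectorized test: START_DATE <= DATE < END_DATE.
in_period = (merged["START_DATE"] <= merged["DATE"]) & (
    merged["DATE"] < merged["END_DATE"]
)

# For each ID, does at least one date fall inside its period?
has_date = in_period.groupby(merged["ID"]).any()

# Assumed meaning: the period is "free" when no date falls inside it.
df_with_start_end["FREE_PERIOD"] = ~df_with_start_end["ID"].map(has_date)

print(df_with_start_end)
```

No Python-level loop is involved, which also speaks to the question about avoiding loops.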

python pandas - analyzing the period between two dates for other dates
