Inner join pandas DataFrames with date range columns

Time: 2020-01-13 13:16:42

Tags: python pandas outer-join

I have 2 dataframes with date ranges:

>>> import pandas as pd
>>> df1 = pd.DataFrame({'ID': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
          'from_dt':[pd.to_datetime('2004-01-01 00:00:00'),
                     pd.to_datetime('2005-01-01 00:00:00'),
                     pd.to_datetime('2007-01-01 00:00:00'),
                     pd.to_datetime('2011-01-01 00:00:00'),
                     pd.to_datetime('2004-01-01 00:00:00'),
                     pd.to_datetime('2012-01-01 00:00:00'),
                     pd.to_datetime('2019-01-01 00:00:00'),
                     ],
         'thru_dt':[pd.to_datetime('2004-12-31 23:59:59'),
                    pd.to_datetime('2006-12-31 23:59:59'),
                    pd.to_datetime('2010-12-31 23:59:59'),
                    pd.to_datetime('2075-12-31 23:59:59'),
                    pd.to_datetime('2011-12-31 23:59:59'),
                    pd.to_datetime('2018-12-31 23:59:59'),
                    pd.to_datetime('2075-12-31 23:59:59'),
                    ],
        'val':[1,2,3,4,5,6,7]})
>>> df1
  ID    from_dt             thru_dt  val
0  a 2004-01-01 2004-12-31 23:59:59    1
1  a 2005-01-01 2006-12-31 23:59:59    2
2  a 2007-01-01 2010-12-31 23:59:59    3
3  a 2011-01-01 2075-12-31 23:59:59    4
4  b 2004-01-01 2011-12-31 23:59:59    5
5  b 2012-01-01 2018-12-31 23:59:59    6
6  b 2019-01-01 2075-12-31 23:59:59    7

The first dataframe contains the mapping of ID to value.

>>> df2 = pd.DataFrame({'ID':['a', 'a', 'b'], 'ID2':['A1', 'A2', 'B1'],
                'from_dt':[pd.to_datetime('2003-01-01 00:00:00'),
                        pd.to_datetime('2010-01-01 00:00:00'),
                        pd.to_datetime('2005-01-01 00:00:00'),
                        ],
                'thru_dt':[pd.to_datetime('2009-12-31 23:59:59'),
                           pd.to_datetime('2075-12-31 23:59:59'),
                           pd.to_datetime('2075-12-31 23:59:59'),
                           ]
                })
>>> df2
  ID ID2    from_dt             thru_dt
0  a  A1 2003-01-01 2009-12-31 23:59:59
1  a  A2 2010-01-01 2075-12-31 23:59:59
2  b  B1 2005-01-01 2075-12-31 23:59:59

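The second dataframe maps each ID to another ID (ID2).

I want to join them with some inner-join logic so that I end up with one flat dataframe whose output looks like this: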

>>> df3 = some_function(df1, df2, end_date=pd.to_datetime('2020-12-31'))

>>> df3
  ID2       Date  val
0  A1 2004-01-01    1
1  A1 2004-01-02    1
.
.
x  A1 2004-12-31    1
x  A1 2005-01-01    2
x  A1 2005-01-02    2
.
.
x  A1 2009-12-31    3
x  A2 2010-01-01    3
.
.
x  A2 2020-12-31    4
x  B1 2005-01-01    5
.
.
x  B1 2020-12-31    7

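I'm sure I could loop and do this in some inefficient way. I'd love to learn about any existing tools and libraries that can handle this kind of task.

Thanks!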

1 answer:

Answer 0 (score: 2)

I think you can use a list comprehension to flatten the date ranges, and then filter by the end datetime.
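One way that idea could look is sketched below. This is not necessarily the exact code the answer had in mind. The function name `join_date_ranges` and the `_val` / `_map` suffixes are made up for illustration. It merges the two frames on ID, takes the overlap of each pair of ranges, caps it at `end_date`, and flattens the overlap into one row per day with a list comprehension. It assumes whole-day granularity, so timestamps are normalized to midnight.

```python
import pandas as pd

# Sample frames from the question, written compactly.
df1 = pd.DataFrame({
    'ID': list('aaaabbb'),
    'from_dt': pd.to_datetime(['2004-01-01', '2005-01-01', '2007-01-01', '2011-01-01',
                               '2004-01-01', '2012-01-01', '2019-01-01']),
    'thru_dt': pd.to_datetime(['2004-12-31 23:59:59', '2006-12-31 23:59:59',
                               '2010-12-31 23:59:59', '2075-12-31 23:59:59',
                               '2011-12-31 23:59:59', '2018-12-31 23:59:59',
                               '2075-12-31 23:59:59']),
    'val': [1, 2, 3, 4, 5, 6, 7],
})
df2 = pd.DataFrame({
    'ID': ['a', 'a', 'b'],
    'ID2': ['A1', 'A2', 'B1'],
    'from_dt': pd.to_datetime(['2003-01-01', '2010-01-01', '2005-01-01']),
    'thru_dt': pd.to_datetime(['2009-12-31 23:59:59', '2075-12-31 23:59:59',
                               '2075-12-31 23:59:59']),
})


def join_date_ranges(df1, df2, end_date):
    """Hypothetical helper: one row per (ID2, day) with the value valid that day."""
    # Pair every value range with every mapping range that shares the same ID.
    merged = df1.merge(df2, on='ID', suffixes=('_val', '_map'))

    rows = []
    for r in merged.itertuples(index=False):
        # Overlap of the two ranges, capped at end_date and truncated to whole days.
        start = max(r.from_dt_val, r.from_dt_map).normalize()
        end = min(r.thru_dt_val, r.thru_dt_map, end_date).normalize()
        # date_range returns an empty index when start > end,
        # so pairs that do not overlap contribute no rows.
        rows.extend((r.ID2, day, r.val) for day in pd.date_range(start, end, freq='D'))

    out = pd.DataFrame(rows, columns=['ID2', 'Date', 'val'])
    return out.sort_values(['ID2', 'Date']).reset_index(drop=True)


df3 = join_date_ranges(df1, df2, end_date=pd.Timestamp('2020-12-31'))
print(df3.head())
print(df3.tail())
```

Capping each range at `end_date` before expanding it avoids generating daily rows out to 2075 only to throw them away. For large inputs the Python-level loop may become slow, in which case a vectorized or interval-based approach would be worth exploring.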
