I have two DataFrames with date ranges:
>>>>import pandas as pd
>>>>df1 = pd.DataFrame({'ID': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
                       'from_dt': [pd.to_datetime('2004-01-01 00:00:00'),
                                   pd.to_datetime('2005-01-01 00:00:00'),
                                   pd.to_datetime('2007-01-01 00:00:00'),
                                   pd.to_datetime('2011-01-01 00:00:00'),
                                   pd.to_datetime('2004-01-01 00:00:00'),
                                   pd.to_datetime('2012-01-01 00:00:00'),
                                   pd.to_datetime('2019-01-01 00:00:00')],
                       'thru_dt': [pd.to_datetime('2004-12-31 23:59:59'),
                                   pd.to_datetime('2006-12-31 23:59:59'),
                                   pd.to_datetime('2010-12-31 23:59:59'),
                                   pd.to_datetime('2075-12-31 23:59:59'),
                                   pd.to_datetime('2011-12-31 23:59:59'),
                                   pd.to_datetime('2018-12-31 23:59:59'),
                                   pd.to_datetime('2075-12-31 23:59:59')],
                       'val': [1, 2, 3, 4, 5, 6, 7]})
>>>>df1
  ID    from_dt             thru_dt  val
0  a 2004-01-01 2004-12-31 23:59:59    1
1  a 2005-01-01 2006-12-31 23:59:59    2
2  a 2007-01-01 2010-12-31 23:59:59    3
3  a 2011-01-01 2075-12-31 23:59:59    4
4  b 2004-01-01 2011-12-31 23:59:59    5
5  b 2012-01-01 2018-12-31 23:59:59    6
6  b 2019-01-01 2075-12-31 23:59:59    7
The first one maps each ID to a value (val) over date ranges.
>>>>df2 = pd.DataFrame({'ID': ['a', 'a', 'b'], 'ID2': ['A1', 'A2', 'B1'],
                       'from_dt': [pd.to_datetime('2003-01-01 00:00:00'),
                                   pd.to_datetime('2010-01-01 00:00:00'),
                                   pd.to_datetime('2005-01-01 00:00:00')],
                       'thru_dt': [pd.to_datetime('2009-12-31 23:59:59'),
                                   pd.to_datetime('2075-12-31 23:59:59'),
                                   pd.to_datetime('2075-12-31 23:59:59')]})
>>>>df2
  ID ID2    from_dt             thru_dt
0  a  A1 2003-01-01 2009-12-31 23:59:59
1  a  A2 2010-01-01 2075-12-31 23:59:59
2  b  B1 2005-01-01 2075-12-31 23:59:59
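The second one maps each ID to another ID (ID2), also over date ranges.
I want to join them with some inner-join logic so that I end up with a flat DataFrame, one row per day, with output like this: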
>>>>df3 = some_function(df1, df2, end_date=pd.to_datetime('2020-12-31'))
>>>>df3
  ID2       Date  val
0  A1 2004-01-01    1
1  A1 2004-01-02    1
...
x  A1 2004-12-31    1
x  A1 2005-01-01    2
x  A1 2005-01-02    2
...
x  A1 2009-12-31    3
x  A2 2010-01-01    3
...
x  A2 2020-12-31    4
x  B1 2005-01-01    5
...
x  B1 2020-12-31    7
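I'm sure I could loop over the rows and do this in some inefficient way, but I'd like to learn about the existing tools and libraries that can handle this kind of task.
Thanks!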
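One possible `some_function` (the name comes from the question), sketched under stated assumptions and not taken from the thread: inner-merge df1 and df2 on `ID` so every pair of ranges for the same ID is lined up, keep only the overlapping part of each pair (later start, earlier end, capped at `end_date`), and only then expand each overlap into daily rows with `pd.date_range`. It assumes the ranges within each frame never overlap for the same `ID` and that every `from_dt` falls on midnight, as in the example data.

```python
import pandas as pd


def some_function(df1, df2, end_date):
    """Return one row per (ID2, Date) carrying the df1 value valid that day.

    Sketch only: assumes ranges within each frame do not overlap for the
    same ID, and that every from_dt is at midnight.
    """
    # Pair every df1 range with every df2 range that shares the same ID.
    m = df1.merge(df2, on="ID", suffixes=("_1", "_2"))

    # The overlap of two ranges runs from the later start to the earlier end.
    m["start"] = m["from_dt_1"].where(m["from_dt_1"] >= m["from_dt_2"], m["from_dt_2"])
    m["end"] = m["thru_dt_1"].where(m["thru_dt_1"] <= m["thru_dt_2"], m["thru_dt_2"])
    # Never go past the requested end_date.
    m["end"] = m["end"].where(m["end"] <= end_date, end_date)

    # Drop pairs whose ranges do not overlap at all.
    m = m[m["start"] <= m["end"]]

    # Expand each surviving overlap into one row per calendar day.
    pieces = [
        pd.DataFrame({"ID2": id2, "Date": pd.date_range(s, e, freq="D"), "val": v})
        for id2, s, e, v in zip(m["ID2"], m["start"], m["end"], m["val"])
    ]
    if not pieces:
        return pd.DataFrame(columns=["ID2", "Date", "val"])
    out = pd.concat(pieces, ignore_index=True)
    return out.sort_values(["ID2", "Date"], ignore_index=True)
```

Intersecting before expanding means only the days that end up in the result are ever generated, which matters here because several ranges run to 2075.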
Answer 0 (score: 2)
I think you can use a list comprehension to flatten the date ranges into daily rows, then filter by the end datetime.
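A minimal sketch of the list-comprehension approach described in the answer, written for illustration rather than copied from it: each range is flattened into one row per day, the two daily frames are inner-joined on `ID` and `Date`, and the result is filtered by `end_date`. The helper name `flatten` is hypothetical, and the sketch assumes the ranges for a given `ID` do not overlap within either frame.

```python
import pandas as pd


def flatten(df, keep):
    """Hypothetical helper: expand each [from_dt, thru_dt] row into daily rows.

    `keep` lists the columns copied onto every daily row.
    """
    rows = [
        (*vals, day)
        for *vals, start, stop in df[keep + ["from_dt", "thru_dt"]].itertuples(index=False)
        for day in pd.date_range(start, stop, freq="D")
    ]
    return pd.DataFrame(rows, columns=keep + ["Date"])


def some_function(df1, df2, end_date):
    # Daily rows: (ID, val, Date) from df1 and (ID, ID2, Date) from df2.
    daily1 = flatten(df1, ["ID", "val"])
    daily2 = flatten(df2, ["ID", "ID2"])
    # Inner join keeps only days covered by both a df1 range and a df2 range.
    out = daily1.merge(daily2, on=["ID", "Date"], how="inner")
    # Filter by the end datetime.
    out = out[out["Date"] <= end_date]
    return out[["ID2", "Date", "val"]].sort_values(["ID2", "Date"], ignore_index=True)
```

This is simple to read, but it materializes every day up to 2075 before filtering; capping `thru_dt` at `end_date` before flattening would keep the intermediate frames small.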