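This is my DataFrame.
Basically it has a Query-Date column; for each Query-Date there are 30 Check-In dates, and for each Check-In date there are 5 nights. Note: the date format is day/month/year.
Note: the hotel name is the same on every row; only the Check-In and Price columns (and the dates) differ.
The Nights column is basically the difference in days between Check-In and Check-Out.
Example of the DataFrame (complete, with no dates missing):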
+------------+-----------+-----------+------------+-------+--------+
| Query-Date | Check-In | Check-Out | Hotel Name | Price | Nights |
+------------+-----------+-----------+------------+-------+--------+
| 1/1/2000 | 1/1/2000 | 2/1/2000 | HotelName1 | 10 | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 3/1/2000 | HotelName1 | 21 | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/1/2000 | ... | .. | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | ... | .. | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | ... | .. | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | 2/1/2000 | 3/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | 3/1/2000 | 4/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 8/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | ... | | | | |
+------------+-----------+-----------+------------+-------+--------+
| | 30/1/2000 | 31/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 1/2/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 2/2/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 3/2/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/2/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| 2/1/2000 | 2/1/2000 | 3/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 4/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 5/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
| | 3/1/2000 | ... | | | |
+------------+-----------+-----------+------------+-------+--------+
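Now, in some rows dates are missing, so for example we can find the following:
+------------+-----------+-----------+------------+-------+--------+
| Query-Date | Check-In | Check-Out | Hotel Name | Price | Nights |
+------------+-----------+-----------+------------+-------+--------+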
| 3/1/2000 | 3/1/2000 | 4/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | 4/1/2000 | 5/1/2000 | | | 1 |
+------------+-----------+-----------+------------+-------+--------+
| | | 6/1/2000 | | | 2 |
+------------+-----------+-----------+------------+-------+--------+
| | | 7/1/2000 | | | 3 |
+------------+-----------+-----------+------------+-------+--------+
| | | 8/1/2000 | | | 4 |
+------------+-----------+-----------+------------+-------+--------+
| | | 9/1/2000 | | | 5 |
+------------+-----------+-----------+------------+-------+--------+
We can notice that for Query-Date "3/1/2000" and Check-In "3/1/2000", two dates are missing: "5/1/2000" (2 nights) and "8/1/2000" (5 nights).
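A minimal sketch of how such gaps could be listed, assuming every (Query-Date, Check-In) pair should have Nights 1 to 5. The three-row frame mirrors the excerpt above; the `pairs`, `expected`, and `missing` names are illustrative only. Note that this only detects missing Nights inside pairs that still have at least one row, not a Check-In date that is missing entirely.

```python
import pandas as pd

# Rows that survived for Query-Date 3/1/2000, Check-In 3/1/2000.
df = pd.DataFrame({
    "Query-Date": ["3/1/2000"] * 3,
    "Check-In": ["3/1/2000"] * 3,
    "Check-Out": ["4/1/2000", "6/1/2000", "7/1/2000"],
    "Nights": [1, 3, 4],
})
for col in ["Query-Date", "Check-In", "Check-Out"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")  # day/month/year

keys = ["Query-Date", "Check-In", "Nights"]
present = pd.MultiIndex.from_frame(df[keys])

# Every existing (Query-Date, Check-In) pair crossed with Nights 1..5.
pairs = df[["Query-Date", "Check-In"]].drop_duplicates()
expected = pd.MultiIndex.from_frame(
    pairs.merge(pd.DataFrame({"Nights": range(1, 6)}), how="cross")
)

# Combinations that should exist but do not.
missing = expected.difference(present).to_frame(index=False)
missing["Check-Out"] = missing["Check-In"] + pd.to_timedelta(missing["Nights"], unit="D")
print(missing)
```

The resulting `missing` frame holds one row per absent combination, with Check-Out recomputed from Check-In plus Nights.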
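For those days, I want to add rows with the same hotel name, and with the price of the closest previous row that has the same Check-In value and the closest previous row that has the same Nights value.
But this is more complicated, because what is missing could be a whole Check-In date or even less (just a few of its Nights).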
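One possible reading of this fill rule, sketched under assumptions: once the missing rows exist as empty rows, sort by Check-In and Nights and forward-fill Hotel Name and Price within each Check-In, so each gap takes the values of the closest previous row of the same Check-In. The prices below are made-up illustration values, not data from the question.

```python
import pandas as pd

# One Check-In with Nights 2 and 5 present as empty rows (prices are invented).
df = pd.DataFrame({
    "Check-In": pd.to_datetime(["3/1/2000"] * 5, format="%d/%m/%Y"),
    "Nights": [1, 2, 3, 4, 5],
    "Hotel Name": ["HotelName1", None, "HotelName1", "HotelName1", None],
    "Price": [10.0, None, 30.0, 41.0, None],
})

# Order rows so that "previous" means the next-smaller Nights of the same Check-In.
df = df.sort_values(["Check-In", "Nights"])

# Forward-fill only inside each Check-In group, never across groups.
cols = ["Hotel Name", "Price"]
df[cols] = df.groupby("Check-In")[cols].ffill()
print(df)
```

A gap on the first row of a group has no previous row to copy from and stays empty with this approach, so a whole missing Check-In date would need a different grouping (for example by Nights).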
So basically I found several topics about pd.date_range and reindex, so what I have to do is set the following 3 columns as the index: Query-Date, Check-In, Nights.
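A minimal sketch of that idea, assuming the three columns uniquely identify a row: set them as the index, build the full index with `pd.MultiIndex.from_product`, and `reindex` against it. The small frame and its prices are illustrative only.

```python
import pandas as pd

# Excerpt with Nights 2 and 5 missing (prices are invented).
df = pd.DataFrame({
    "Query-Date": ["3/1/2000"] * 3,
    "Check-In": ["3/1/2000"] * 3,
    "Nights": [1, 3, 4],
    "Price": [10, 30, 41],
})
for col in ["Query-Date", "Check-In"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")  # day/month/year

keys = ["Query-Date", "Check-In", "Nights"]
indexed = df.set_index(keys)

# Full index: every observed Query-Date x every observed Check-In x Nights 1..5.
full_index = pd.MultiIndex.from_product(
    [df["Query-Date"].unique(), df["Check-In"].unique(), range(1, 6)],
    names=keys,
)

# Rows absent from the original come back with NaN in the value columns.
filled = indexed.reindex(full_index).reset_index()
print(filled)
```

A plain product pairs every Check-In with every Query-Date, which over-generates on the real data because the Check-In dates depend on the Query-Date; it is shown here only because the excerpt has a single pair.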
I can also find the min and max of Query-Date, but I can't find a way to build a range over the 3 columns.
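A sketch of one way a range over the 3 columns could be built, assuming Query-Dates are daily, each Query-Date has 30 consecutive Check-In dates starting on the Query-Date itself, and each Check-In has Nights 1 to 5 (as the example table suggests). The `offset` helper column is hypothetical.

```python
import pandas as pd

# Only the Query-Date column matters for building the range.
df = pd.DataFrame({
    "Query-Date": pd.to_datetime(["1/1/2000", "3/1/2000"], format="%d/%m/%Y"),
})

# Daily range from the smallest to the largest Query-Date.
query_dates = pd.DataFrame({
    "Query-Date": pd.date_range(df["Query-Date"].min(), df["Query-Date"].max(), freq="D")
})
offsets = pd.DataFrame({"offset": range(30)})   # Check-In = Query-Date + 0..29 days
nights = pd.DataFrame({"Nights": range(1, 6)})  # 1..5 nights

# Cross joins give every Query-Date x offset x Nights combination.
grid = query_dates.merge(offsets, how="cross").merge(nights, how="cross")
grid["Check-In"] = grid["Query-Date"] + pd.to_timedelta(grid["offset"], unit="D")
grid["Check-Out"] = grid["Check-In"] + pd.to_timedelta(grid["Nights"], unit="D")
grid = grid.drop(columns="offset")
print(len(grid))
```

Left-merging the real DataFrame onto `grid` on Query-Date, Check-In, and Nights would then leave empty Hotel Name and Price values exactly where rows are missing.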
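What I need, in pseudo-code terms, is something that builds the range over those 3 columns and reindexes the DataFrame on it.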
Hope you guys can help me do this.