Fill missing values with the mean of specific previous and next rows across multiple date columns

Time: 2019-12-15 17:45:12

Tags: python pandas

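This is my DataFrame:

Basically, it has:

  • A Query-Date column; for each Query-Date there are 30 Check-In dates, and for each Check-In date there are 5 Check-Out dates.

Note: the date format is day/month/year.

Note: the hotel name is the same in every row; only the Price column (and the dates) differ.

  • A Nights column, which is basically the number of nights between Check-In and Check-Out.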
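The layout described above can be sketched as a small generator of the complete grid. This is only an illustration: the helper name `build_grid` and its parameters are hypothetical, and it assumes (as the sample table suggests) that the first Check-In date equals the Query-Date and that dates are written day/month/year.

```python
import pandas as pd

def build_grid(query_dates, hotel="HotelName1", n_checkins=30, max_nights=5):
    """Return one row per (Query-Date, Check-In, Nights) combination."""
    rows = []
    # Dates are day/month/year, so parse them with an explicit format.
    for q in pd.to_datetime(query_dates, format="%d/%m/%Y"):
        for offset in range(n_checkins):
            # Assumption: the first Check-In is the Query-Date itself.
            check_in = q + pd.Timedelta(days=offset)
            for nights in range(1, max_nights + 1):
                rows.append({
                    "Query-Date": q,
                    "Check-In": check_in,
                    "Check-Out": check_in + pd.Timedelta(days=nights),
                    "Hotel Name": hotel,
                    "Nights": nights,
                })
    return pd.DataFrame(rows)

grid = build_grid(["1/1/2000", "2/1/2000"])
# 2 query dates * 30 check-ins * 5 nights
print(len(grid))
```

A grid like this is mainly useful as a reference to compare the real data against; it carries no prices.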

Example of the DataFrame:

+------------+-----------+-----------+------------+-------+--------+
| Query-Date | Check-In  | Check-Out | Hotel Name | Price | Nights |
+------------+-----------+-----------+------------+-------+--------+
| 1/1/2000   | 1/1/2000  | 2/1/2000  | HotelName1 | 10    | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 3/1/2000  | HotelName1 | 21    | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 4/1/2000  | ...        | ..    | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 5/1/2000  | ...        | ..    | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  | ...        | ..    | 5      |
+------------+-----------+-----------+------------+-------+--------+
|            | 2/1/2000  | 3/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 4/1/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 5/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 7/1/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+
|            | 3/1/2000  | 4/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 5/1/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 7/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 8/1/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+
|            | ...       |           |            |       |        |
+------------+-----------+-----------+------------+-------+--------+
|            | 30/1/2000 | 31/1/2000 |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 1/2/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 2/2/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 3/2/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 4/2/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+
| 2/1/2000   | 2/1/2000  | 2/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 3/1/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 4/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 5/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+
|            | 3/1/2000  | ...       |            |       |        |
+------------+-----------+-----------+------------+-------+--------+

Now, in some rows dates are missing, so for example we can find the following:

+------------+-----------+-----------+------------+-------+--------+
| Query-Date | Check-In  | Check-Out | Hotel Name | Price | Nights |
+------------+-----------+-----------+------------+-------+--------+
| 3/1/2000   | 3/1/2000  | 4/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 7/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            | 4/1/2000  | 5/1/2000  |            |       | 1      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 6/1/2000  |            |       | 2      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 7/1/2000  |            |       | 3      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 8/1/2000  |            |       | 4      |
+------------+-----------+-----------+------------+-------+--------+
|            |           | 9/1/2000  |            |       | 5      |
+------------+-----------+-----------+------------+-------+--------+

We can see that for Query-Date "3/1/2000" and Check-In "3/1/2000", two Check-Out dates are missing: "5/1/2000" (2 nights) and "8/1/2000" (5 nights).
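One possible way to list such gaps is an anti-join between the combinations that should exist and the ones that do. The sketch below is an assumption-laden illustration, not a full solution: it reuses only the key columns from the example, assumes every Check-In should have Nights 1 to 5, and only detects missing nights for Check-In dates that appear at least once. `how="cross"` needs pandas 1.2 or newer.

```python
import pandas as pd

# Key columns of the incomplete example (Nights 2 and 5 are absent for 3/1/2000).
observed = pd.DataFrame({
    "Query-Date": ["3/1/2000"] * 8,
    "Check-In": ["3/1/2000"] * 3 + ["4/1/2000"] * 5,
    "Nights": [1, 3, 4, 1, 2, 3, 4, 5],
})
for col in ["Query-Date", "Check-In"]:
    observed[col] = pd.to_datetime(observed[col], format="%d/%m/%Y")

keys = ["Query-Date", "Check-In", "Nights"]

# Every (Query-Date, Check-In) pair that occurs, crossed with Nights 1..5.
pairs = observed[["Query-Date", "Check-In"]].drop_duplicates()
expected = pairs.merge(pd.DataFrame({"Nights": range(1, 6)}), how="cross")

# Left anti-join: keep expected rows that have no match in the observed data.
merged = expected.merge(observed[keys], on=keys, how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", keys].reset_index(drop=True)
missing["Check-Out"] = missing["Check-In"] + pd.to_timedelta(missing["Nights"], unit="D")
print(missing)
```

For the example data this reports the two rows named above: 2 nights (Check-Out 5/1/2000) and 5 nights (Check-Out 8/1/2000).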

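For these missing days, what I want to add is a row with the same hotel name and a Price equal to the mean of the nearest previous row that has the same Nights value and the nearest following row that has the same Nights value.

But it gets more complicated, because what is missing may be an entire group of rows or only part of one.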

So basically I found a couple of threads:

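  • https://stackoverflow.com/a/19324591 Basically they say that the dates should be the index, and then we can use pd.date_range and reindex, so what I did was set the following 3 columns as the index: Query-Date, Check-In and Check-Out.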

I can also find the minimum and maximum of Query-Date, but I could not find a way to build a range over all 3 columns.

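  • https://stackoverflow.com/a/44102947/11356272 In this thread there is code that fills the gaps between the previous and the next row, but that is not really my case, because I do not need the mean of the LAST previous and next rows, but the mean of the LAST previous and next rows that have the same Nights.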
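On the question of a range over the three date columns: because Check-In depends on Query-Date and Check-Out depends on Check-In, a plain product of three independent ranges would not match the data. One option is to build the full MultiIndex explicitly and reindex against it. The following is a minimal sketch under stated assumptions: `full_index` is a hypothetical helper, the first Check-In is assumed to equal the Query-Date, and the prices are invented purely for illustration.

```python
import pandas as pd

def full_index(query_dates, n_checkins=30, max_nights=5):
    """Build the complete (Query-Date, Check-In, Check-Out) index."""
    tuples = []
    for q in query_dates:
        for check_in in pd.date_range(q, periods=n_checkins, freq="D"):
            first_out = check_in + pd.Timedelta(days=1)
            for check_out in pd.date_range(first_out, periods=max_nights, freq="D"):
                tuples.append((q, check_in, check_out))
    names = ["Query-Date", "Check-In", "Check-Out"]
    return pd.MultiIndex.from_tuples(tuples, names=names)

# Three observed rows; the prices are made up for this illustration.
df = pd.DataFrame({
    "Query-Date": pd.to_datetime(["3/1/2000"] * 3, format="%d/%m/%Y"),
    "Check-In": pd.to_datetime(["3/1/2000"] * 3, format="%d/%m/%Y"),
    "Check-Out": pd.to_datetime(["4/1/2000", "6/1/2000", "7/1/2000"], format="%d/%m/%Y"),
    "Hotel Name": "HotelName1",
    "Price": [10.0, 30.0, 40.0],
})

indexed = df.set_index(["Query-Date", "Check-In", "Check-Out"])
query_dates = pd.date_range(df["Query-Date"].min(), df["Query-Date"].max(), freq="D")

# n_checkins=1 keeps the example small; the real data would use 30.
complete = indexed.reindex(full_index(query_dates, n_checkins=1)).reset_index()
complete["Nights"] = (complete["Check-Out"] - complete["Check-In"]).dt.days
print(complete)
```

After the reindex, the rows that were absent show up with NaN in Hotel Name and Price, which makes them easy to fill in a later step. Note that reindex requires the original index to be unique.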

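What I need is something like this, in pseudocode: for each missing row, take the nearest previous row and the nearest following row that have the same Nights value, and use the mean of the two as the fill value.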
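A rough pandas sketch of that idea, assuming the missing rows are already present with NaN in Price: group by Nights, forward-fill and backward-fill within each group, and average the two. The function name `fill_price_by_nights` and the sample prices are hypothetical. Two further assumptions: "nearest" means nearest in (Query-Date, Check-In) row order across the whole frame, and when only one neighbour exists its price is used on its own.

```python
import pandas as pd
import numpy as np

def fill_price_by_nights(df):
    """Fill NaN prices with the mean of the nearest previous and next known
    price among rows sharing the same Nights value (in sorted row order)."""
    out = df.sort_values(["Query-Date", "Check-In", "Nights"]).reset_index(drop=True)
    by_nights = out.groupby("Nights")["Price"]
    prev_price = by_nights.ffill()   # nearest earlier known price, same Nights
    next_price = by_nights.bfill()   # nearest later known price, same Nights
    # Row-wise mean skips NaN, so a lone neighbour is used as-is.
    neighbours = pd.concat([prev_price, next_price], axis=1, keys=["prev", "next"])
    out["Price"] = out["Price"].fillna(neighbours.mean(axis=1))
    # The hotel name is the same everywhere, so propagate it into the new rows.
    out["Hotel Name"] = out["Hotel Name"].ffill().bfill()
    return out

dates = pd.to_datetime(["1/1/2000", "2/1/2000", "3/1/2000"], format="%d/%m/%Y")
sample = pd.DataFrame({
    "Query-Date": [dates[0]] * 6,
    "Check-In": [dates[0], dates[0], dates[1], dates[1], dates[2], dates[2]],
    "Nights": [1, 2, 1, 2, 1, 2],
    "Hotel Name": ["HotelName1", "HotelName1", "HotelName1", None, "HotelName1", "HotelName1"],
    "Price": [10.0, 20.0, 12.0, np.nan, 14.0, 30.0],  # invented prices
})

filled = fill_price_by_nights(sample)
print(filled)
```

In this sample the missing 2-night row sits between 2-night prices of 20 and 30, so it is filled with 25. If neighbours should not be taken from a different Query-Date, adding "Query-Date" to the groupby keys would restrict the search accordingly.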

I hope you can help me do this.

0 answers:

No answers