Suppose I have the following data:
import numpy as np
import pandas as pd
import datetime
table = [[datetime.datetime(2015, 1, 1), 1],
         [datetime.datetime(2015, 1, 27), 1],
         [datetime.datetime(2015, 1, 31), 1],
         [datetime.datetime(2015, 2, 1), 1],
         [datetime.datetime(2015, 2, 3), 1],
         [datetime.datetime(2015, 2, 15), 1],
         [datetime.datetime(2015, 2, 28), 1],
         [datetime.datetime(2015, 3, 1), 1],
         [datetime.datetime(2015, 3, 17), 1],
         [datetime.datetime(2015, 3, 28), 1],
         [datetime.datetime(2015, 4, 12), 1],
         [datetime.datetime(2015, 4, 28), 1]]
df = pd.DataFrame(table, columns=['Date', 'Id'])
table2 = [datetime.datetime(2015, 3, 31),
          datetime.datetime(2015, 6, 30),
          datetime.datetime(2015, 9, 30)]
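Is there a way to merge table2 into table so that each row of table is joined to the closest element of table2 that is greater than or equal to its Date, effectively back-filling the table? This also needs to be done grouped by the column Id. For example, the resulting table would be: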
         Date  Id        New
0  2015-01-01   1 2015-03-31
1  2015-01-27   1 2015-03-31
2  2015-01-31   1 2015-03-31
3  2015-02-01   1 2015-03-31
4  2015-02-03   1 2015-03-31
5  2015-02-15   1 2015-03-31
6  2015-02-28   1 2015-03-31
7  2015-03-01   1 2015-03-31
8  2015-03-17   1 2015-03-31
9  2015-03-28   1 2015-03-31
10 2015-04-12   1 2015-06-30
11 2015-04-28   1 2015-06-30
Thanks, Tingis
Answer 0 (score: 4)
You can use searchsorted:
table2 = pd.to_datetime(table2)
idx = table2.searchsorted(df['Date'].values)
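This finds the indices at which the dates in df['Date'] would have to be inserted into table2 to keep it in sorted order. Note that this assumes table2 is sorted to begin with.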
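To make the index semantics concrete, here is a minimal sketch on a few hand-picked dates. The names quarter_ends, dates, and padded are illustrative and not part of the original code. It also shows one possible way to guard against dates that fall after the last element of table2, where searchsorted returns an index equal to the length of the array and plain indexing would fail.

```python
import pandas as pd

# Sorted lookup dates, playing the role of table2 (illustrative values).
quarter_ends = pd.to_datetime(['2015-03-31', '2015-06-30', '2015-09-30'])

# Dates to look up; the last one lies beyond every entry in quarter_ends.
dates = pd.to_datetime(['2015-01-01', '2015-03-31', '2015-04-12', '2015-10-01'])

# With the default side='left', a date equal to a lookup date maps to that
# same date, and earlier dates map to the next lookup date that follows.
idx = quarter_ends.searchsorted(dates.values)

# An index equal to len(quarter_ends) means "no later lookup date exists".
# Appending NaT as a sentinel lets such rows resolve to NaT instead of
# raising an IndexError.
padded = quarter_ends.insert(len(quarter_ends), pd.NaT)
result = padded[idx]
print(result)
```

With the data in the question no date exceeds the last element of table2, so the sentinel is not needed there; it only matters if later dates can appear.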
table2 = pd.to_datetime(table2)
idx = table2.searchsorted(df['Date'].values)
df['New'] = table2[idx]
print(df)
yields
         Date  Id        New
0  2015-01-01   1 2015-03-31
1  2015-01-27   1 2015-03-31
2  2015-01-31   1 2015-03-31
3  2015-02-01   1 2015-03-31
4  2015-02-03   1 2015-03-31
5  2015-02-15   1 2015-03-31
6  2015-02-28   1 2015-03-31
7  2015-03-01   1 2015-03-31
8  2015-03-17   1 2015-03-31
9  2015-03-28   1 2015-03-31
10 2015-04-12   1 2015-06-30
11 2015-04-28   1 2015-06-30
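As an alternative that is not part of the answer above, newer pandas versions provide pd.merge_asof, which can express the same "next date at or after" join directly. The following is a sketch under the assumption that both frames are sorted by their date columns; the frame names left and right and the shortened sample data are illustrative.

```python
import pandas as pd

# A shortened version of the question's data, plus one date past the last
# lookup date to show how unmatched rows behave.
left = pd.DataFrame({
    'Date': pd.to_datetime(['2015-01-01', '2015-03-28', '2015-04-12', '2015-10-05']),
    'Id': [1, 1, 1, 1],
})

# The lookup dates, wrapped in a one-column frame so they can be merged.
right = pd.DataFrame({
    'New': pd.to_datetime(['2015-03-31', '2015-06-30', '2015-09-30']),
})

# direction='forward' picks, for each left row, the first right row whose
# key is greater than or equal to the left key. Both inputs must already
# be sorted by their keys.
merged = pd.merge_asof(left, right, left_on='Date', right_on='New',
                       direction='forward')
print(merged)
```

Rows with no later lookup date get NaT in the New column rather than raising an error. If the lookup table also carried an Id column, passing by='Id' would restrict matches to rows with the same Id.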