This is DataFrame 1:
Date Serial Number Type
0 2014-12-17 1N4AL2EP8DC270200 New
1 2015-10-28 1N4AL2EP8DC270200 Used
2 2015-01-22 1N4AL3AP1EN239307 New
3 2015-11-22 1N4AL3AP1EN239307 Used
4 2015-05-22 1N4AL3AP1FC235402 New
5 2016-12-02 1N4AL3AP1FC235402 Used
6 2015-01-22 1N4AL3AP2FC213098 New
7 2016-05-13 1N4AL3AP2FC213098 Used
8 2014-05-14 1N4AL3AP3EC132416 New
9 2016-04-07 1N4AL3AP3EC132416 Used
10 2014-05-24 1N4AL3AP5EC316644 New
11 2014-12-18 1N4AL3AP5EC316644 Used
12 2014-12-11 1N4AL3AP6EC322517 New
13 2015-10-04 1N4AL3AP6EC322517 Used
14 2016-06-06 1N4AL3AP6EC322517 Used
...
This is DataFrame 2:
Date Serial Number
0 2014-03-12 5N1AA08C78N611573
1 2014-03-12 JN8AS5MT3EW604277
2 2014-03-12 1N6AF0LX5DN114710
3 2014-03-12 1N4AL3AP8DN447876
4 2014-03-12 JN8AZ1MU8AW021145
5 2014-03-12 JN1AZ4EH0AM500138
6 2014-03-12 JN8AF5MR3BT013548
7 2014-03-12 3N1AB61E17L629049
8 2014-03-12 3N1BC13E87L368844
9 2014-03-13 1N6AD07W95C431183
10 2014-03-13 1N6AA07A25N543180
11 2014-03-13 1N4CL2AP1BC110185
12 2014-03-13 JN8AZ1MW1BW181306
13 2014-03-13 5N1BV28U46N116791
...
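These are only samples of the DataFrames, not the whole DataFrames. I need to retrieve the first date of each Serial Number whose Type is Used in DataFrame 1 (e.g. for the serial number '1N4AL3AP6EC322517', 2015-10-04 is the date I'm looking for). I then need to compare this date with the date recorded for the same Serial Number in DataFrame 2: if the date in DataFrame 2 is earlier than the date in DataFrame 1, mark it as 'A', otherwise mark it as 'B'.
I have to do this for more than 2000 serial numbers. Is there an efficient way to do it?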
Answer 0 (score: 0)
I think you can use merge_asof:
import pandas as pd

print (df2)
Date Serial Number
0 2016-03-12 1N4AL3AP6EC322517
1 2013-03-12 1N4AL3AP5EC316644
2 2014-03-12 1N4AL3AP3EC132416
3 2016-08-12 1N4AL3AP2FC213098
4 2014-03-12 JN8AZ1MU8AW021145
#if necessary cast Date columns to datetime
df1.Date = pd.to_datetime(df1.Date)
df2.Date = pd.to_datetime(df2.Date)
#keep the first 'Used' row per Serial Number (assumes df1 is already ordered by Date within each serial)
df = df1[df1.Type == 'Used'].drop_duplicates(['Serial Number'])
print (df)
Date Serial Number Type
1 2015-10-28 1N4AL2EP8DC270200 Used
3 2015-11-22 1N4AL3AP1EN239307 Used
5 2016-12-02 1N4AL3AP1FC235402 Used
7 2016-05-13 1N4AL3AP2FC213098 Used
9 2016-04-07 1N4AL3AP3EC132416 Used
11 2014-12-18 1N4AL3AP5EC316644 Used
13 2015-10-04 1N4AL3AP6EC322517 Used
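#add value B to every row of df2; merge_asof (default direction='backward') keeps it
#only where df2 has a row for the same Serial Number dated on or before the df1 Date
df2['Mark'] = 'B'
df = pd.merge_asof(df.sort_values(['Date']),
                   df2.sort_values(['Date']), on='Date', by='Serial Number')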
print (df)
Date Serial Number Type Mark
0 2014-12-18 1N4AL3AP5EC316644 Used B
1 2015-10-04 1N4AL3AP6EC322517 Used NaN
2 2015-10-28 1N4AL2EP8DC270200 Used NaN
3 2015-11-22 1N4AL3AP1EN239307 Used NaN
4 2016-04-07 1N4AL3AP3EC132416 Used B
5 2016-05-13 1N4AL3AP2FC213098 Used NaN
6 2016-12-02 1N4AL3AP1FC235402 Used NaN
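The NaN values in Mark come from how merge_asof searches: by default it looks backward, so a left row only picks up a right row with the same Serial Number whose Date is on or before its own. A minimal sketch of that rule, using two small hypothetical frames (left and right are illustrative names, with values borrowed from the samples above):

```python
import pandas as pd

# Hypothetical two-row frames, chosen only to illustrate the matching rule.
left = pd.DataFrame({
    "Date": pd.to_datetime(["2015-10-04", "2016-04-07"]),
    "Serial Number": ["1N4AL3AP6EC322517", "1N4AL3AP3EC132416"],
})
right = pd.DataFrame({
    "Date": pd.to_datetime(["2014-03-12", "2016-03-12"]),
    "Serial Number": ["1N4AL3AP3EC132416", "1N4AL3AP6EC322517"],
    "Mark": ["B", "B"],
})

# Default direction='backward': for each left row, take the latest right row
# with the same Serial Number whose Date is on or before the left Date.
out = pd.merge_asof(left.sort_values("Date"), right.sort_values("Date"),
                    on="Date", by="Serial Number")
print(out)
```

Here the first serial has no right-hand row dated on or before 2015-10-04, so its Mark stays NaN, while the second serial picks up 'B'.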
#add value A where the Serial Number exists in df2 but merge_asof found no earlier-or-equal date
mask = df['Serial Number'].isin(df2['Serial Number'])
df.loc[mask, 'Mark'] = df.loc[mask, 'Mark'].fillna('A')
print (df)
Date Serial Number Type Mark
0 2014-12-18 1N4AL3AP5EC316644 Used B
1 2015-10-04 1N4AL3AP6EC322517 Used A
2 2015-10-28 1N4AL2EP8DC270200 Used NaN
3 2015-11-22 1N4AL3AP1EN239307 Used NaN
4 2016-04-07 1N4AL3AP3EC132416 Used B
5 2016-05-13 1N4AL3AP2FC213098 Used A
6 2016-12-02 1N4AL3AP1FC235402 Used NaN
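If merge_asof feels indirect, the same comparison can probably be written as a plain groupby plus merge. The following is only a sketch under some assumptions: the small frames are made up from the samples above, first_used and the "First Used" column are names introduced here, and the result is keyed on the rows of df2 rather than df1. Note that it applies the labels as the question states them (an earlier DataFrame 2 date gives 'A'), whereas the merge_asof code above assigns 'B' to those rows.

```python
import numpy as np
import pandas as pd

# Small illustrative frames built from the samples in the question and answer.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2014-05-24", "2014-12-18",
                            "2014-12-11", "2015-10-04", "2016-06-06"]),
    "Serial Number": ["1N4AL3AP5EC316644"] * 2 + ["1N4AL3AP6EC322517"] * 3,
    "Type": ["New", "Used", "New", "Used", "Used"],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2016-03-12", "2013-03-12", "2014-03-12"]),
    "Serial Number": ["1N4AL3AP6EC322517", "1N4AL3AP5EC316644",
                      "JN8AZ1MU8AW021145"],
})

# Earliest 'Used' date per serial; min() does not depend on the row order of df1.
first_used = (df1[df1["Type"] == "Used"]
              .groupby("Serial Number", as_index=False)["Date"].min()
              .rename(columns={"Date": "First Used"}))

# Attach that date to every df2 row; serials never 'Used' in df1 get NaT.
out = df2.merge(first_used, on="Serial Number", how="left")

# 'A' if the df2 date is earlier than the first 'Used' date, else 'B';
# rows without a 'Used' date are left as NaN instead of being labelled.
labels = np.where(out["Date"] < out["First Used"], "A", "B")
out["Mark"] = pd.Series(labels, index=out.index).where(out["First Used"].notna())
print(out)
```

Everything here is vectorised, so a few thousand serial numbers should not be a problem.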