我对以下问题感到困惑。我有两个数据帧df1和df2,并希望通过列传输选择比较这些,然后从df1国家和drom每个国家/地区定义的日期,如下面的代码中所示。如果这样做,我收到以下错误消息:
ValueError:只能比较带有相同标签的Series对象
代码如下所示:
from pandas.tseries.holiday import (
AbstractHolidayCalendar, EasterMonday,
GoodFriday, Holiday, next_monday,
Easter, nearest_workday, Day, USMartinLutherKingJr,
USPresidentsDay, USMemorialDay, USLaborDay,
USThanksgivingDay)
class GermanHoliday(AbstractHolidayCalendar):
rules = [
Holiday('New Years Day', month=1, day=1, observance=next_monday),
GoodFriday,
EasterMonday,
Holiday('Reformation Day', year=2017, month=10, day=31, observance=nearest_workday),
Holiday('Labour Day', month=5, day=1, observance=nearest_workday),
Holiday('Whit Monday', month=1, day=1, offset=[Easter(), Day(50)]),
Holiday('Day of German Unity', month=10, day=3, observance=nearest_workday),
Holiday('Christmas Day', month=12, day=25, observance=nearest_workday),
Holiday('Boxing Day',month=12, day=26, observance=nearest_workday)
]
class USHolidays(AbstractHolidayCalendar):
rules = [
Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
GoodFriday,
USMemorialDay,
Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
USLaborDay,
USThanksgivingDay,
Holiday('Christmas Day', month=12, day=25, observance=nearest_workday)
]
calendarGermany = GermanHoliday()
calendarUS = USHolidays()
holidaysGermany = calendarGermany .holidays().to_pydatetime()
holidaysUS = calendarUS .holidays().to_pydatetime()
qry = "Transportation in @df1.ticker and Date not in @holidaysGermany "
df2 = df2.query(qry)
数据帧df1和df2的结构如下:
DF1:
0 transportation country
1 ICE Germany
2 Lufthansa Germany
3 SIXT Germany
4 TGV France
5 Air France France
6 Alamo France
7 National USA
8 Amtrak USA
9 Delta USA
df2:
Date transportation price
0 2015-12-21 ICE 81.9924
1 2015-12-22 ICE 81.5173
2 2015-12-23 ICE 83.5015
3 2015-12-24 ICE 83.5015
4 2015-12-25 ICE 83.5015
5 2015-12-28 ICE 83.0357
6 2015-12-29 ICE 84.6286
7 2015-12-30 ICE 83.7250
8 2015-12-31 ICE 83.7250
9 2016-01-01 ICE 83.7250
10 2015-12-21 National 127.3900
11 2015-12-22 National 129.0000
12 2015-12-23 National 131.8800
13 2015-12-24 National 131.8800
14 2015-12-25 National 131.8800
15 2015-12-28 National 130.0300
16 2015-12-29 National 132.1700
...
最终结果应如下所示:
df2:
Date transportation price
0 2015-12-21 ICE 81.9924
1 2015-12-22 ICE 81.5173
2 2015-12-23 ICE 83.5015
3 2015-12-24 ICE 83.5015
4 2015-12-28 ICE 83.0357
5 2015-12-29 ICE 84.6286
6 2015-12-30 ICE 83.7250
7 2015-12-31 ICE 83.7250
8 2016-01-01 ICE 83.7250
9 2015-12-21 National 127.3900
10 2015-12-22 National 129.0000
11 2015-12-23 National 131.8800
12 2015-12-26 National 131.8800
13 2015-12-28 National 130.0300
14 2015-12-29 National 132.1700
...
答案 0 :(得分:1)
IIUC你可以这样做:
In [197]: qry = "transportation in @df1.transportation and \
...: Date not in ['2015-12-24','2015-12-25']"
In [198]: df2.query(qry)
Out[198]:
Date transportation price
0 2015-12-21 ICE 81.9924
1 2015-12-22 ICE 81.5173
2 2015-12-23 ICE 83.5015
5 2015-12-28 ICE 83.0357
6 2015-12-29 ICE 84.6286
7 2015-12-30 ICE 83.7250
8 2015-12-31 ICE 83.7250
9 2016-01-01 ICE 83.7250
10 2015-12-21 National 127.3900
11 2015-12-22 National 129.0000
12 2015-12-23 National 131.8800
15 2015-12-28 National 130.0300
16 2015-12-29 National 132.1700