Compare two different pandas DataFrames and delete rows in Python

Time: 2017-01-24 14:58:26

Tags: python datetime dataframe

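I'm stuck on the following problem. I have two DataFrames, df1 and df2, and want to compare them on the transportation column, look up the country from df1, and drop the dates defined for each country, as shown in the code below. When I do so, I get the following error message: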

  

ValueError: Can only compare identically-labeled Series objects

The code looks like this:

from pandas.tseries.holiday import (
AbstractHolidayCalendar, EasterMonday,
GoodFriday, Holiday, next_monday,
Easter, nearest_workday, Day, USMartinLutherKingJr,
USPresidentsDay, USMemorialDay, USLaborDay,
USThanksgivingDay)

class GermanHoliday(AbstractHolidayCalendar):
    rules = [
             Holiday('New Years Day', month=1, day=1, observance=next_monday),
             GoodFriday,
             EasterMonday,
             Holiday('Reformation Day', year=2017, month=10, day=31, observance=nearest_workday),
             Holiday('Labour Day', month=5, day=1, observance=nearest_workday),
             Holiday('Whit Monday', month=1, day=1, offset=[Easter(), Day(50)]),
             Holiday('Day of German Unity', month=10, day=3, observance=nearest_workday),
             Holiday('Christmas Day', month=12, day=25, observance=nearest_workday),
             Holiday('Boxing Day',month=12, day=26, observance=nearest_workday) 
    ]

class USHolidays(AbstractHolidayCalendar):
    rules = [
             Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
             USMartinLutherKingJr,
             USPresidentsDay,
             GoodFriday,
             USMemorialDay,
             Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
             USLaborDay,
             USThanksgivingDay,
             Holiday('Christmas Day', month=12, day=25, observance=nearest_workday)
             ]

calendarGermany = GermanHoliday()
calendarUS = USHolidays()

holidaysGermany = calendarGermany.holidays().to_pydatetime()
holidaysUS = calendarUS.holidays().to_pydatetime()

qry = "Transportation in @df1.ticker and Date not in @holidaysGermany "

df2 = df2.query(qry)

The DataFrames df1 and df2 are structured as follows:

df1:

0    transportation  country
1    ICE             Germany
2    Lufthansa       Germany
3    SIXT            Germany
4    TGV             France
5    Air France      France
6    Alamo           France
7    National        USA
8    Amtrak          USA
9    Delta           USA

df2:

   Date         transportation price
0  2015-12-21   ICE            81.9924
1  2015-12-22   ICE            81.5173
2  2015-12-23   ICE            83.5015
3  2015-12-24   ICE            83.5015
4  2015-12-25   ICE            83.5015
5  2015-12-28   ICE            83.0357
6  2015-12-29   ICE            84.6286
7  2015-12-30   ICE            83.7250
8  2015-12-31   ICE            83.7250
9  2016-01-01   ICE            83.7250
10 2015-12-21   National       127.3900
11 2015-12-22   National       129.0000
12 2015-12-23   National       131.8800
13 2015-12-24   National       131.8800
14 2015-12-25   National       131.8800
15 2015-12-28   National       130.0300
16 2015-12-29   National       132.1700
...
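
To reproduce the problem, the two frames above could be rebuilt roughly as follows. This is only a sketch: it contains just the rows shown (the elided rows are left out), it uses the default integer index, and it assumes Date is stored as datetime64 rather than as strings.

```python
import pandas as pd

# df1: one row per transportation name with its country (copied from the listing above).
# The index is the default RangeIndex, which may differ from the original frame.
df1 = pd.DataFrame({
    "transportation": ["ICE", "Lufthansa", "SIXT", "TGV", "Air France",
                       "Alamo", "National", "Amtrak", "Delta"],
    "country": ["Germany"] * 3 + ["France"] * 3 + ["USA"] * 3,
})

# Dates and prices for the ICE rows shown above.
ice_dates = ["2015-12-21", "2015-12-22", "2015-12-23", "2015-12-24", "2015-12-25",
             "2015-12-28", "2015-12-29", "2015-12-30", "2015-12-31", "2016-01-01"]
ice_prices = [81.9924, 81.5173, 83.5015, 83.5015, 83.5015,
              83.0357, 84.6286, 83.7250, 83.7250, 83.7250]

# The visible National rows cover the first seven of those dates.
national_dates = ice_dates[:7]
national_prices = [127.39, 129.00, 131.88, 131.88, 131.88, 130.03, 132.17]

# Assumption: Date is a datetime64 column, not a column of strings.
df2 = pd.DataFrame({
    "Date": pd.to_datetime(ice_dates + national_dates),
    "transportation": ["ICE"] * len(ice_dates) + ["National"] * len(national_dates),
    "price": ice_prices + national_prices,
})
```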

The final result should look like this:

df2:

   Date         transportation price
0  2015-12-21   ICE            81.9924
1  2015-12-22   ICE            81.5173
2  2015-12-23   ICE            83.5015
3  2015-12-24   ICE            83.5015
4  2015-12-28   ICE            83.0357
5  2015-12-29   ICE            84.6286
6  2015-12-30   ICE            83.7250
7  2015-12-31   ICE            83.7250
8  2016-01-01   ICE            83.7250
9  2015-12-21   National       127.3900
10 2015-12-22   National       129.0000
11 2015-12-23   National       131.8800
12 2015-12-26   National       131.8800
13 2015-12-28   National       130.0300
14 2015-12-29   National       132.1700
...

1 answer:

Answer 0: (score: 1)

IIUC, you can do it this way:

In [197]: qry = "transportation in @df1.transportation and \
     ...:        Date not in ['2015-12-24','2015-12-25']"

In [198]: df2.query(qry)
Out[198]:
         Date transportation     price
0  2015-12-21            ICE   81.9924
1  2015-12-22            ICE   81.5173
2  2015-12-23            ICE   83.5015
5  2015-12-28            ICE   83.0357
6  2015-12-29            ICE   84.6286
7  2015-12-30            ICE   83.7250
8  2015-12-31            ICE   83.7250
9  2016-01-01            ICE   83.7250
10 2015-12-21       National  127.3900
11 2015-12-22       National  129.0000
12 2015-12-23       National  131.8800
15 2015-12-28       National  130.0300
16 2015-12-29       National  132.1700
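
The query above hard-codes the dates to exclude. If the dates should instead come from a holiday calendar per country, one possible approach is to attach the country from df1 via a merge and then anti-join against a (country, Date) holiday table. The following is a minimal sketch, not a definitive solution. The two calendar classes are trimmed stand-ins for the calendars in the question, the names `calendars`, `holidays`, `merged`, `flagged`, and `result` are made up for illustration, and Date is assumed to be a datetime64 column.

```python
import pandas as pd
from pandas.tseries.holiday import (
    AbstractHolidayCalendar, Holiday, nearest_workday,
)

# Trimmed stand-ins (assumption); the question's full calendars could be used instead.
class GermanHolidaysSketch(AbstractHolidayCalendar):
    rules = [
        Holiday("Christmas Day", month=12, day=25),
        Holiday("Boxing Day", month=12, day=26),
    ]

class USHolidaysSketch(AbstractHolidayCalendar):
    rules = [
        Holiday("Christmas Day", month=12, day=25, observance=nearest_workday),
    ]

# A few rows taken from the question's sample data.
df1 = pd.DataFrame({"transportation": ["ICE", "National"],
                    "country": ["Germany", "USA"]})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2015-12-23", "2015-12-24",
                            "2015-12-25", "2015-12-28"] * 2),
    "transportation": ["ICE"] * 4 + ["National"] * 4,
    "price": [83.5015, 83.5015, 83.5015, 83.0357,
              131.88, 131.88, 131.88, 130.03],
})

# Hypothetical mapping from the country names in df1 to a calendar instance.
calendars = {"Germany": GermanHolidaysSketch(), "USA": USHolidaysSketch()}

# Build one long table of (country, Date) holiday pairs covering df2's date range.
start, end = df2["Date"].min(), df2["Date"].max()
holidays = pd.concat(
    [pd.DataFrame({"country": country, "Date": cal.holidays(start=start, end=end)})
     for country, cal in calendars.items()],
    ignore_index=True,
)

# Attach each row's country, then mark rows whose (country, Date) pair is a holiday.
merged = df2.merge(df1, on="transportation", how="left")
flagged = merged.merge(holidays, on=["country", "Date"], how="left", indicator=True)

# Keep only rows without a matching holiday and restore df2's original columns.
result = flagged.loc[flagged["_merge"] == "left_only", df2.columns].reset_index(drop=True)
print(result)
```

Within these sample dates both sketch calendars only hit 2015-12-25, so that row is dropped for ICE and for National alike. The per-country distinction lives in the `holidays` table, where 2015-12-26 is listed for Germany but not for the USA. The anti-join assumes each transportation name appears once in df1; duplicates there would multiply rows in the merge.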