Select rows of a DataFrame that are not in a Series

Time: 2016-05-26 13:53:55

Tags: python pandas dataframe series

I have a DataFrame named trips that contains the following information:

  route_id     service_id  shape_id                      trip_id
0     BX12  GH_B6-Weekday  BX120805  GH_B6-Weekday-004000_BX12_1
1     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-009000_BX12_1
2     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-013000_BX12_1
3     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-017000_BX12_1
4     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-021000_BX12_1
...

I also have a Series named invalidTrips that contains the following information:

trip_id
11760139-BPPB6-BP_B6-Weekday-10         16
11760139-BPPB6-BP_B6-Weekday-10-SDon    16
11760140-BPPB6-BP_B6-Weekday-10         19
11760140-BPPB6-BP_B6-Weekday-10-SDon    19
11760141-BPPB6-BP_B6-Weekday-10         16
...

How do I select all the rows in trips whose trip_id does not match any trip_id in invalidTrips?

Edit: I now have this code:

# Grab the number of trips made outside min and max hour.
tooEarly = stopTimes['arrival_time'] < base_mintime
tooLate = stopTimes['departure_time'] > base_maxtime
invalidTrips = stopTimes[(tooEarly | tooLate)].groupby('trip_id').size()

# Filter out the invalid trips.
print(invalidTrips.size)
print(trips.size)
in_validTrips = ~trips.trip_id.isin(invalidTrips)
validTrips = trips[in_validTrips][['route_id', 'service_id', 'shape_id']]
print(validTrips.size)

For whatever reason, validTrips.size stays the same even though invalidTrips.size changes depending on base_mintime and base_maxtime, and I would expect it to depend on which trips are invalid. Why is this?

(For more context, all of this is pulled from GTFS data.)

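1 Answer:

Answer 0: (score: 2)

UPDATE:

Try the isin() function together with the ~ operator.

Per @EdChum's correction in the comments, if invalidTrips is a Series, match against its index: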

trips[~trips.trip_id.isin(invalidTrips.index)]
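
As a possible explanation for why validTrips.size did not change in the question's code: groupby('trip_id').size() returns a Series whose index holds the trip_id values and whose values hold the row counts, and isin() given a Series compares against its values. The sketch below illustrates this on made-up toy data (the stop_times contents and the threshold 4 are assumptions for illustration, not the asker's data).

```python
import pandas as pd

# Hypothetical toy data standing in for the GTFS stop_times and trips tables.
stop_times = pd.DataFrame({
    "trip_id": ["a", "a", "b", "c", "c", "c"],
    "arrival_time": [1, 2, 5, 9, 9, 9],
})
trips = pd.DataFrame({"trip_id": ["a", "b", "c", "d"]})

# groupby(...).size() puts trip_id in the index and the row counts in the values.
too_late = stop_times["arrival_time"] > 4
invalid_trips = stop_times[too_late].groupby("trip_id").size()

# Passing the Series itself compares trip_id strings against the counts (1 and 3),
# so nothing matches and no row is dropped.
by_values = trips[~trips["trip_id"].isin(invalid_trips)]

# Passing the index compares trip_id against trip_id, which is what was intended.
by_index = trips[~trips["trip_id"].isin(invalid_trips.index)]

print(len(by_values))                # 4
print(by_index["trip_id"].tolist())  # ['a', 'd']
```

With the values-based filter every row survives regardless of the time window, which would match the constant validTrips.size described in the question.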

TEST:

In [39]: invalidTrips
Out[39]:
trip_id
11760139-BPPB6-BP_B6-Weekday-10         16
11760139-BPPB6-BP_B6-Weekday-10-SDon    16
11760140-BPPB6-BP_B6-Weekday-10         19
11760140-BPPB6-BP_B6-Weekday-10-SDon    19
11760141-BPPB6-BP_B6-Weekday-10         16
GH_B6-Weekday-017000_BX12_1             11         # <-- I've added it intentionally
Name: val, dtype: int64

In [40]: trips
Out[40]:
  route_id     service_id  shape_id                      trip_id
0     BX12  GH_B6-Weekday  BX120805  GH_B6-Weekday-004000_BX12_1
1     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-009000_BX12_1
2     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-013000_BX12_1
3     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-017000_BX12_1  # <-- exclude this row 
4     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-021000_BX12_1

In [41]: trips[~trips.trip_id.isin(invalidTrips.index)]
Out[41]:
  route_id     service_id  shape_id                      trip_id
0     BX12  GH_B6-Weekday  BX120805  GH_B6-Weekday-004000_BX12_1
1     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-009000_BX12_1
2     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-013000_BX12_1
4     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-021000_BX12_1
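
For anyone who wants to rerun the session above, here is a minimal standalone sketch that rebuilds the same two objects by hand. The constructor calls are my own reconstruction from the printed output, so treat the exact dtypes as an assumption.

```python
import pandas as pd

# Rebuild the trips frame shown in the session above.
trips = pd.DataFrame({
    "route_id": ["BX12"] * 5,
    "service_id": ["GH_B6-Weekday"] * 5,
    "shape_id": ["BX120805", "BX120809", "BX120792", "BX120809", "BX120792"],
    "trip_id": [
        "GH_B6-Weekday-004000_BX12_1",
        "GH_B6-Weekday-009000_BX12_1",
        "GH_B6-Weekday-013000_BX12_1",
        "GH_B6-Weekday-017000_BX12_1",
        "GH_B6-Weekday-021000_BX12_1",
    ],
})

# Rebuild invalidTrips: trip_id is the index, the counts are the values.
invalidTrips = pd.Series(
    [16, 16, 19, 19, 16, 11],
    index=pd.Index([
        "11760139-BPPB6-BP_B6-Weekday-10",
        "11760139-BPPB6-BP_B6-Weekday-10-SDon",
        "11760140-BPPB6-BP_B6-Weekday-10",
        "11760140-BPPB6-BP_B6-Weekday-10-SDon",
        "11760141-BPPB6-BP_B6-Weekday-10",
        "GH_B6-Weekday-017000_BX12_1",  # the row added on purpose
    ], name="trip_id"),
    name="val",
)

# Keep only the trips whose trip_id is absent from the Series index.
validTrips = trips[~trips["trip_id"].isin(invalidTrips.index)]
print(validTrips.index.tolist())  # [0, 1, 2, 4]
```

The original row labels are preserved after filtering; calling reset_index(drop=True) on the result would renumber them if that is preferred.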