Filter out the highest and lowest values in a dataframe

Time: 2019-10-22 10:07:08

Tags: python python-3.x pandas dataframe

I have this dataframe:

import pandas as pd
df = pd.DataFrame({'Flight Day': ['2018-10-01', '2018-10-02','2018-10-03', '2018-10-04', '2018-10-05','2018-10-06', '2018-10-07', '2018-10-08', '2018-10-09','2018-10-10','2018-10-11','2018-10-12'], 
               'Flight Number': ['CA1336', 'CA1332', 'CA1336', 'CA1473', 'CA1336', 'CA1331', 'CA1666', 'CA1336', 'CA1336', 'CA1336', 'CA1336', 'CA1667'],
               'STD Departure': [10, 15, 10, 15,10, 15, 15, 15,10, 10, 10, 11], 
               'Sandwich 1': [2, 4, 8, 4,3, 2, 3, 1,5, 5, 2, 1],
               'Sandwich 2': [2, 4, 8, 4,2, 2, 3, 4,2, 5, 2, 1]})

First, I want to keep the last 5 days for each combination of flight number and departure time. So far I have been using this:

df = df.groupby(['Flight Number','STD Departure']).tail(5)

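Next, I want to drop the flights with the highest consumption (Sandwich 1 + Sandwich 2) and the lowest consumption, again grouped by 'Flight Number' and 'STD Departure'.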

I tried this code, but it does not give the expected result:

FF = ["Sandwich 1", "Sandwich 2"]
df ["sum"] = df[FF].sum(axis=1)
df = df.groupby(['Flight Number','STD Departure', 'sum']).head(4)
df = df.groupby(['Flight Number','STD Departure', 'sum']).tail(3)

Any ideas on how I can get my expected result:

    Flight Day Flight Number  STD Departure  Sandwich 1  Sandwich 2  sum
1   2018-10-02        CA1332             15           4           4    8
3   2018-10-04        CA1473             15           4           4    8
4   2018-10-05        CA1336             10           3           2    5
5   2018-10-06        CA1331             15           2           2    4
6   2018-10-07        CA1666             15           3           3    6
7   2018-10-08        CA1336             15           1           4    5
8   2018-10-09        CA1336             10           5           2    7
9   2018-10-10        CA1336             10           5           5   10
11  2018-10-12        CA1667             11           1           1    2

These are the rows that get removed in the last step:

10  2018-10-11        CA1336             10           2           2    4
2   2018-10-03        CA1336             10           8           8   16

2 answers:

Answer 0 (score: 2)

I believe you need to remove the top and bottom rows only if a group has more than 2 rows. First sort by the 3 columns ('Flight Number', 'STD Departure', 'sum'), then remove the rows per group with an if-else statement and iloc.
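A minimal sketch of the approach described in this answer, assuming the dataframe from the question. The helper name `trim_extremes` and the use of `pd.concat` over the groups (rather than `groupby.apply`) are assumptions made for illustration, not the answerer's code:

```python
import pandas as pd

df = pd.DataFrame({
    'Flight Day': ['2018-10-01', '2018-10-02', '2018-10-03', '2018-10-04',
                   '2018-10-05', '2018-10-06', '2018-10-07', '2018-10-08',
                   '2018-10-09', '2018-10-10', '2018-10-11', '2018-10-12'],
    'Flight Number': ['CA1336', 'CA1332', 'CA1336', 'CA1473', 'CA1336', 'CA1331',
                      'CA1666', 'CA1336', 'CA1336', 'CA1336', 'CA1336', 'CA1667'],
    'STD Departure': [10, 15, 10, 15, 10, 15, 15, 15, 10, 10, 10, 11],
    'Sandwich 1': [2, 4, 8, 4, 3, 2, 3, 1, 5, 5, 2, 1],
    'Sandwich 2': [2, 4, 8, 4, 2, 2, 3, 4, 2, 5, 2, 1],
})

keys = ['Flight Number', 'STD Departure']

# Keep the last 5 rows per group (the rows are already in date order here).
df = df.groupby(keys).tail(5).copy()
df['sum'] = df[['Sandwich 1', 'Sandwich 2']].sum(axis=1)

# Sort so that, within each group, the lowest sum comes first and the highest last.
df = df.sort_values(keys + ['sum'])


def trim_extremes(group):
    # Hypothetical helper: drop the first and last row only if the group has more than 2 rows.
    return group.iloc[1:-1] if len(group) > 2 else group


# Trim each group, then restore the original row order.
result = pd.concat([trim_extremes(g) for _, g in df.groupby(keys)]).sort_index()
print(result)
```

With the question's data, only the group (CA1336, 10) has more than 2 rows, so only its lowest and highest rows are dropped; single-row groups pass through unchanged.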

Answer 1 (score: 0)

I figured it out myself: I forgot to sort first.

# Sort by date so that tail(5) keeps the most recent 5 days per group
df = df.sort_values(by=["Flight Day", "Flight Number", "STD Departure"])
df = df.groupby(["Flight Number", "STD Departure"]).tail(5)
FF = ["Sandwich 1", "Sandwich 2"]
df["sum"] = df[FF].sum(axis=1)
# Sort by sum within each group; in a 5-row group, tail(4) then drops
# the lowest sum and head(3) drops the highest
df = df.sort_values(by=["Flight Number", "STD Departure", "sum"])
df = df.groupby(["Flight Number", "STD Departure"]).tail(4)
df = df.groupby(["Flight Number", "STD Departure"]).head(3)
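One thing to be aware of: chaining tail(4) and head(3) drops both extremes only when a group has exactly 5 rows. A 4-row group loses only its highest row, and smaller groups are left untouched. Whether smaller groups should be trimmed is not stated in the question, so the following is only a sketch under the assumption that any group with more than 2 rows should lose its lowest and highest row. The data here is hypothetical and the mask based on `cumcount` is an alternative technique, not part of either answer:

```python
import pandas as pd

# Hypothetical data: one flight with only 4 rows, plus a single-row flight.
df = pd.DataFrame({
    'Flight Number': ['CA1336', 'CA1336', 'CA1336', 'CA1336', 'CA1667'],
    'STD Departure': [10, 10, 10, 10, 11],
    'sum': [1, 5, 3, 9, 2],
})
keys = ['Flight Number', 'STD Departure']
df = df.sort_values(keys + ['sum'])

# Chained tail(4)/head(3): the 4-row group keeps its lowest row.
chained = df.groupby(keys).tail(4).groupby(keys).head(3)

# Position-based mask: drop the first and last row of every group with more than 2 rows.
grouped = df.groupby(keys)
size = grouped['sum'].transform('size')   # number of rows in each row's group
pos = grouped.cumcount()                  # 0-based position within the sorted group
keep = (size <= 2) | ((pos > 0) & (pos < size - 1))
masked = df[keep]

print(chained['sum'].tolist())
print(masked['sum'].tolist())
```

Because the mask filters the sorted frame in place, a final `sort_index()` can be appended to get the rows back in their original order.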