Question

My dataset df looks like this:

Date         Value
...
2012-07-31   61.9443
2012-07-30   62.1551
2012-07-27   62.3328
...          ... 
2011-10-04   48.3923
2011-10-03   48.5939
2011-09-30   50.0327
2011-09-29   51.8350
2011-09-28   50.5555
2011-09-27   51.8470
2011-09-26   49.6350
...          ...
2011-08-03   61.3948
2011-08-02   61.5476
2011-08-01   64.1407
2011-07-29   65.0364
2011-07-28   65.7065
2011-07-27   66.3463
2011-07-26   67.1508
2011-07-25   67.5577
...          ...
2010-10-05   57.3674
2010-10-04   56.3687
2010-10-01   57.6022
2010-09-30   58.0993
2010-09-29   57.9934

Here are the data types of the two columns:

Type                 Column Name              Example Value
-----------------------------------------------------------------
datetime64[ns]       Date                     2020-06-19 00:00:00
float64              Value                    108.82

I would like a subset of df that contains only the rows for the first entry of each October and the last entry of each July:

Date         Value
...
2012-07-31   61.9443
2011-10-03   48.5939
2011-07-29   65.0364
2010-10-01   57.6022

Any ideas?

Answer 1

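You can sort by date so that you know the rows are in chronological order. Then filter the rows whose month is 7, group them by year, and take the last record of each group; likewise filter the rows whose month is 10 and take the first record of each group.

Finally, concatenate the two results.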

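import pandas as pd

# Make sure Date is a datetime column, then sort chronologically
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(by='Date')

# Last July row and first October row of each year
j = df[df['Date'].dt.month == 7].groupby([df.Date.dt.year, df.Date.dt.month]).last()
o = df[df['Date'].dt.month == 10].groupby([df.Date.dt.year, df.Date.dt.month]).first()

# Stack both selections and discard the (year, month) group index
pd.concat([j,o]).reset_index(drop=True)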

Output

    Date        Value
0   2011-07-29  65.0364
1   2012-07-31  61.9443
2   2010-10-01  57.6022
3   2011-10-03  48.5939
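
For anyone who wants to try this without the full dataset, here is a minimal self-contained sketch of the same idea. It is a variant, not the exact code above: the sample frame is built from a handful of rows copied from the question, and it uses `head(1)` / `tail(1)` on the groups instead of `first()` / `last()`, so whole rows are kept as they are. The final descending sort is only there to match the order shown in the question.

```python
import pandas as pd

# Small sample copied from the rows shown in the question
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-07-31", "2012-07-30", "2012-07-27",
        "2011-10-04", "2011-10-03", "2011-09-30",
        "2011-08-01", "2011-07-29", "2011-07-28",
        "2010-10-05", "2010-10-04", "2010-10-01", "2010-09-30",
    ]),
    "Value": [
        61.9443, 62.1551, 62.3328,
        48.3923, 48.5939, 50.0327,
        64.1407, 65.0364, 65.7065,
        57.3674, 56.3687, 57.6022, 58.0993,
    ],
})

# Chronological order, so "first" and "last" mean earliest and latest
df = df.sort_values("Date")

july = df[df["Date"].dt.month == 7]
october = df[df["Date"].dt.month == 10]

# One group per year: keep the latest July row and the earliest October row
last_july = july.groupby(july["Date"].dt.year).tail(1)
first_october = october.groupby(october["Date"].dt.year).head(1)

# Newest first, as in the question's desired output
result = (
    pd.concat([last_july, first_october])
    .sort_values("Date", ascending=False)
    .reset_index(drop=True)
)
print(result)
```

One difference worth knowing: `first()` and `last()` pick the first or last non-null value per column, while `head(1)` and `tail(1)` return the row as stored, so the two can differ if `Value` contains NaN.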

Answer 2

Here is a pandas-only solution:

df = df.sort_values("Date")
october = df.groupby([df["Date"].dt.year, df["Date"].dt.month], as_index = False).first()
october = october[october.Date.dt.month == 10]

july = df.groupby([df["Date"].dt.year, df["Date"].dt.month], as_index = False).last()
july = july[july.Date.dt.month == 7]

pd.concat([july, october])

The result is:

        Date    Value
2 2011-07-29  65.0364
6 2012-07-31  61.9443
1 2010-10-01  57.6022
5 2011-10-03  48.5939
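
The same "group every month, then keep only the months you want" idea can be wrapped in a small function. This is only a sketch under a few assumptions: `month_edges` and its parameters `first_months` / `last_months` are hypothetical names made up for illustration, the sample rows are copied from the question, and the months are grouped with `dt.to_period("M")` plus `head(1)` / `tail(1)` rather than `as_index=False`, so the result does not depend on how the grouping keys are written back into the frame.

```python
import pandas as pd


def month_edges(df, first_months=(10,), last_months=(7,)):
    """Hypothetical helper: first row of some months, last row of others."""
    df = df.sort_values("Date")
    # One group per calendar month, e.g. 2011-10
    period = df["Date"].dt.to_period("M")
    firsts = df.groupby(period).head(1)  # earliest row of every month
    lasts = df.groupby(period).tail(1)   # latest row of every month
    picked = pd.concat([
        firsts[firsts["Date"].dt.month.isin(first_months)],
        lasts[lasts["Date"].dt.month.isin(last_months)],
    ])
    # Newest first, as in the question's desired output
    return picked.sort_values("Date", ascending=False).reset_index(drop=True)


# Small sample copied from the rows shown in the question
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-07-31", "2012-07-30", "2012-07-27",
        "2011-10-04", "2011-10-03", "2011-09-30",
        "2011-08-01", "2011-07-29", "2011-07-28",
        "2010-10-05", "2010-10-04", "2010-10-01", "2010-09-30",
    ]),
    "Value": [
        61.9443, 62.1551, 62.3328,
        48.3923, 48.5939, 50.0327,
        64.1407, 65.0364, 65.7065,
        57.3674, 56.3687, 57.6022, 58.0993,
    ],
})

result = month_edges(df)
print(result)
```

Passing different month tuples, for example `month_edges(df, first_months=(1,), last_months=(12,))`, would select the first January row and the last December row of each year instead.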

Python Pandas - get the rows for the first and last day of specific months
