Selecting rows for specific months in a pandas DataFrame

Time: 2021-07-20 19:06:17

Tags: python pandas dataframe datetime

I have a pandas DataFrame with a date column that runs from 2015 to 2021.

print(data)
                   date   time  wind_speed  wind_direction
0      2015-01-01 00:00  00:00        28.0            25.0
1      2015-01-01 01:00  01:00        23.0            24.0
2      2015-01-01 02:00  02:00        25.0            24.0
3      2015-01-01 03:00  03:00        21.0            24.0
4      2015-01-01 04:00  04:00        23.0            24.0
...                 ...    ...         ...             ...
61363  2021-12-31 19:00  19:00         NaN             NaN
61364  2021-12-31 20:00  20:00         NaN             NaN
61365  2021-12-31 21:00  21:00         NaN             NaN
61366  2021-12-31 22:00  22:00         NaN             NaN
61367  2021-12-31 23:00  23:00         NaN             NaN

How can I select the rows where the month in the date column is 5, 6, 7, 8 or 9 (May through September)?

This is what I tried:

data['date'] = pd.to_datetime(data['date'])
data = data[data['date'].dt.month == 5, 6, 7, 8, 9]
print(data)
C:\Users\Chance\anaconda3\lib\site-packages\pandas\core\frame.py:3607: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-02646c054a5d> in <module>
      1 data['date'] = pd.to_datetime(data['date'])
----> 2 data = data[data['date'].dt.month == 5, 6, 7, 8, 9]
      3 data

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3453             if self.columns.nlevels > 1:
   3454                 return self._getitem_multilevel(key)
-> 3455             indexer = self.columns.get_loc(key)
   3456             if is_integer(indexer):
   3457                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3359             casted_key = self._maybe_cast_indexer(key)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
   3363                 raise KeyError(key) from err

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: '(0        False
1        False
2        False
3        False
4        False
         ...  
61363    False
61364    False
61365    False
61366    False
61367    False
Name: date, Length: 61368, dtype: bool, 6, 7, 8, 9)' is an invalid key

2 answers:

Answer 0 (score: 1)

Try:

#if column `date` isn't already converted:
#df["date"] = pd.to_datetime(df["date"])

print(df[(df.date.dt.month > 4) & (df.date.dt.month < 10)])

Prints:

                     date   time  wind_speed  wind_direction
3     2015-05-01 03:00:00  03:00        21.0            24.0
4     2015-06-01 04:00:00  04:00        23.0            24.0
61363 2021-07-01 19:00:00  19:00         NaN             NaN
61364 2021-08-01 20:00:00  20:00         NaN             NaN
61365 2021-09-01 21:00:00  21:00         NaN             NaN

The df used:

                   date   time  wind_speed  wind_direction
0      2015-01-01 00:00  00:00        28.0            25.0
1      2015-01-01 01:00  01:00        23.0            24.0
2      2015-02-01 02:00  02:00        25.0            24.0
3      2015-05-01 03:00  03:00        21.0            24.0
4      2015-06-01 04:00  04:00        23.0            24.0
61363  2021-07-01 19:00  19:00         NaN             NaN
61364  2021-08-01 20:00  20:00         NaN             NaN
61365  2021-09-01 21:00  21:00         NaN             NaN
61366  2021-10-01 22:00  22:00         NaN             NaN
61367  2021-11-01 23:00  23:00         NaN             NaN
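
The two comparisons above can also be written as a single range check. A minimal sketch, assuming the column has already been converted to datetime; the small frame and the name `summer` are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data.
df = pd.DataFrame({
    "date": ["2015-01-01 00:00", "2015-05-01 03:00",
             "2015-09-30 23:00", "2021-10-01 22:00"],
    "wind_speed": [28.0, 21.0, 23.0, None],
})
df["date"] = pd.to_datetime(df["date"])

# Series.between is inclusive on both ends by default,
# so this keeps months 5 through 9 (May to September).
summer = df[df["date"].dt.month.between(5, 9)]
print(summer)
```

This only works for a contiguous run of months; for an arbitrary set of months a membership test is needed instead.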

Answer 1 (score: 1)

Try:

>>> df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M")
>>> df[df["date"].dt.month.isin([5,6,7,8,9])]
                 date   time  wind_speed  wind_direction
2 2015-05-01 02:00:00  02:00          25            24.0
3 2015-07-01 03:00:00  03:00          21            24.0
4 2015-09-01 04:00:00  04:00          23            24.0
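
For context on the error in the question: `data[data['date'].dt.month == 5, 6, 7, 8, 9]` passes a tuple `(boolean Series, 6, 7, 8, 9)` as the key, which is exactly what the TypeError reports as an invalid key. Here is a self-contained sketch of the `isin` approach. The sample rows and the names `months`, `mask` and `selected` are assumptions for illustration, and the `.copy()` is just one common way to avoid a later SettingWithCopyWarning, not something the question requires:

```python
import pandas as pd

# Hypothetical sample data; the real frame has 61368 hourly rows.
data = pd.DataFrame({
    "date": ["2015-04-30 23:00", "2015-05-01 00:00", "2018-07-15 12:00",
             "2021-09-30 23:00", "2021-10-01 00:00"],
    "wind_speed": [28.0, 23.0, 25.0, 21.0, None],
})
data["date"] = pd.to_datetime(data["date"], format="%Y-%m-%d %H:%M")

# Build one boolean mask: True where the month is in the wanted set.
months = [5, 6, 7, 8, 9]
mask = data["date"].dt.month.isin(months)

# Select with .loc and take an explicit copy so that later assignments
# to `selected` do not act on a slice of `data`.
selected = data.loc[mask].copy()
print(selected)
```

Unlike a range comparison, `isin` also handles non-contiguous months, e.g. `[1, 2, 12]` for a winter selection.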