Question

My dataset looks like this:

data="""cruiseid  year  station  month  day  date        lat        lon         depth_w  taxon                        count  
        AA8704    1987  1        04     13   13-APR-87   35.85      -75.48      18       Centropages_typicus          75343  
        AA8704    1987  1        04     13   13-APR-87   35.85      -75.48      18       Gastropoda                   0  
        AA8704    1987  1        04     13   13-APR-87   35.85      -75.48      18       Calanus_finmarchicus         2340   
        AA8704    1987  1        07     13   13-JUL-87   35.85      -75.48      18       Acartia_spp.                 5616   
        AA8704    1987  1        07     13   13-JUL-87   35.85      -75.48      18       Metridia_lucens              468    
        AA8704    1987  1        08     13   13-AUG-87   35.85      -75.48      18       Evadne_spp.                  0      
        AA8704    1987  1        08     13   13-AUG-87   35.85      -75.48      18       Salpa                        0      
        AA8704    1987  1        08     13   13-AUG-87   35.85      -75.48      18       Oithona_spp.                 468    
"""
datafile = open('data.txt','w')
datafile.write(data)
datafile.close()

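I read it into pandas:

import datetime as dt
import pandas as pd

parse = lambda x: dt.datetime.strptime(x, '%d-%m-%Y')
# header=0: the first line holds the column names (current pandas rejects a boolean header)
df = pd.read_csv('data.txt',index_col=0, header=0, parse_dates={"Datetime" : [1,3,4]}, skipinitialspace=True, sep=' ', skiprows=0)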

How can I generate a subset of this DataFrame that contains all records from April where the taxon is 'Calanus_finmarchicus' or 'Gastropoda'?

I can query the DataFrame for rows where taxon equals 'Calanus_finmarchicus' or 'Gastropoda' using:

df[(df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')]

But I am having trouble querying by date. In numpy, something similar might look like:

import numpy as np
# Unicode ('U') string fields so that comparisons with str literals work on Python 3
data = np.genfromtxt('data.txt', dtype=[('cruiseid','U6'), ('year','i4'), ('station','i4'), ('month','i4'), ('day','i4'), ('date','U9'), ('lat','f8'), ('lon','f8'), ('depth_w','i8'), ('taxon','U60'), ('count','i8')], skip_header=1)
# The taxon test is parenthesized as a whole because & binds tighter than |
selection = np.where(((data['taxon']=='Calanus_finmarchicus') | (data['taxon']=='Gastropoda')) & ((data['month']==4) | (data['month']==3)))[0]
data[selection]

Here is a link to a notebook that reproduces the example.

Answer 1

You can refer to the month attribute of the datetime index (df.index.month).
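As a minimal sketch of what this attribute gives you: the small DataFrame below is a hypothetical cut-down version of the question's data (only taxon and count, with a hand-built DatetimeIndex), not the result of the original read_csv call.

```python
import pandas as pd

# Hypothetical miniature of the question's data, indexed by sampling date.
df = pd.DataFrame(
    {
        "taxon": ["Centropages_typicus", "Gastropoda", "Acartia_spp.", "Salpa"],
        "count": [75343, 0, 5616, 0],
    },
    index=pd.DatetimeIndex(
        ["1987-04-13", "1987-04-13", "1987-07-13", "1987-08-13"], name="Datetime"
    ),
)

# DatetimeIndex.month holds one integer month per row.
months = df.index.month

# Comparing it to 4 yields a boolean mask that selects the April rows.
april = df[months == 4]
print(april)
```

The same mask can then be combined with any column condition using & and |.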

Answer 2

As others have said, you can use df.index.month to filter by month, but I would also suggest using pandas.Series.isin() to check the taxon condition:

>>> df[df.taxon.isin(['Calanus_finmarchicus', 'Gastropoda']) & (df.index.month == 4)]
           cruiseid  station       date    lat    lon  depth_w  \
Datetime                                                         
1987-04-13   AA8704        1  13-APR-87  35.85 -75.48       18   
1987-04-13   AA8704        1  13-APR-87  35.85 -75.48       18   

                           taxon  count  Unnamed: 11  
Datetime                                              
1987-04-13            Gastropoda      0          NaN  
1987-04-13  Calanus_finmarchicus   2340          NaN
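
For readers who want to try this without the data file, here is a self-contained sketch. It assumes a reduced set of columns and builds the DatetimeIndex with pd.to_datetime from the year/month/day columns, which is a choice made for this example rather than what the question's read_csv call does.

```python
import pandas as pd

# A reduced copy of the question's rows (only the columns needed here).
rows = [
    (1987, 4, 13, "Centropages_typicus", 75343),
    (1987, 4, 13, "Gastropoda", 0),
    (1987, 4, 13, "Calanus_finmarchicus", 2340),
    (1987, 7, 13, "Acartia_spp.", 5616),
    (1987, 8, 13, "Salpa", 0),
]
df = pd.DataFrame(rows, columns=["year", "month", "day", "taxon", "count"])

# Assumption of this sketch: assemble the dates from year/month/day columns.
dates = pd.to_datetime(df[["year", "month", "day"]])
df.index = pd.DatetimeIndex(dates, name="Datetime")

# isin() replaces a chain of == comparisons joined with |.
wanted = ["Calanus_finmarchicus", "Gastropoda"]
mask = df["taxon"].isin(wanted) & (df.index.month == 4)

subset = df.loc[mask, ["taxon", "count"]]
print(subset)
```

isin() scales better than chained comparisons when the list of taxa grows, since only the list changes.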

Answer 3

Use the month attribute of the index:

df[(df.index.month == 4) & ((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda'))]

Answer 4

I had not paid attention to the syntax (bracket order) or to the dataframe.index attribute. This line gives me what I want:

results = df[((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')) & (df.index.month == 4)]
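
The bracket order matters because Python's & binds more tightly than |. The sketch below uses a hypothetical four-row DataFrame (not the question's file) to show how dropping the outer parentheses changes which rows are selected.

```python
import pandas as pd

# Hypothetical data: one Calanus row falls in July, outside the wanted month.
df = pd.DataFrame(
    {"taxon": ["Calanus_finmarchicus", "Gastropoda", "Calanus_finmarchicus", "Salpa"]},
    index=pd.DatetimeIndex(["1987-04-13", "1987-04-13", "1987-07-13", "1987-04-13"]),
)

is_calanus = df.taxon == "Calanus_finmarchicus"
is_gastropoda = df.taxon == "Gastropoda"
in_april = df.index.month == 4

# Intended: (Calanus or Gastropoda) and April.
grouped = df[(is_calanus | is_gastropoda) & in_april]

# Without the outer parentheses this parses as
# Calanus or (Gastropoda and April), so the July Calanus row is kept.
ungrouped = df[is_calanus | is_gastropoda & in_april]

print(len(grouped), len(ungrouped))
```

Naming the individual masks, as done here, also makes the grouping easier to read than one long expression.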

Querying a pandas DataFrame based on index and data columns

4 answers: