My dataset looks like this:
data="""cruiseid year station month day date lat lon depth_w taxon count
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Centropages_typicus 75343
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Gastropoda 0
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Calanus_finmarchicus 2340
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Acartia_spp. 5616
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Metridia_lucens 468
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Evadne_spp. 0
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Salpa 0
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Oithona_spp. 468
"""
datafile = open('data.txt','w')
datafile.write(data)
datafile.close()
I read it into pandas:
import datetime as dt
import pandas as pd
parse = lambda x: dt.datetime.strptime(x, '%d-%m-%Y')
df = pd.read_csv('data.txt',index_col=0, header=0, parse_dates={"Datetime" : [1,3,4]}, skipinitialspace=True, sep=' ', skiprows=0)
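How can I generate a subset of this DataFrame containing all records from April where the taxon is 'Calanus_finmarchicus' or 'Gastropoda'?
I can query the DataFrame for rows where taxon equals 'Calanus_finmarchicus' or 'Gastropoda' with:
df[(df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')]
But I am having trouble adding the month condition. In numpy, something similar might look like: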
import numpy as np
data = np.genfromtxt('data.txt', dtype=[('cruiseid','S6'), ('year','i4'), ('station','i4'), ('month','i4'), ('day','i4'), ('date','S9'), ('lat','f8'), ('lon','f8'), ('depth_w','i8'), ('taxon','S60'), ('count','i8')], skip_header=1)
selection = np.where(((data['taxon']=='Calanus_finmarchicus') | (data['taxon']=='Gastropoda')) & ((data['month']==4) | (data['month']==3)))[0]
data[selection]
Here is a link to a notebook that reproduces the example.
Answer 0: (score: 5)
You can use the month attribute of the datetime index, i.e. df.index.month.
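As a minimal sketch of month-based filtering, assuming the DataFrame has a DatetimeIndex (the small frame below is a hypothetical miniature of the question's data, not the real file):

```python
import pandas as pd

# Hypothetical miniature of the question's data, indexed by sampling date.
df = pd.DataFrame(
    {
        "taxon": ["Centropages_typicus", "Gastropoda", "Acartia_spp.", "Salpa"],
        "count": [75343, 0, 5616, 0],
    },
    index=pd.to_datetime(["1987-04-13", "1987-04-13", "1987-07-13", "1987-08-13"]),
)

# DatetimeIndex.month holds one integer per row (1 = January ... 12 = December),
# so comparing it with 4 gives a boolean mask that selects the April rows.
april = df[df.index.month == 4]
print(april)
```

The same mask can be combined with other conditions using & and |, as long as each condition is wrapped in its own parentheses.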
Answer 1: (score: 2)
As others have said, you can use df.index.month to filter by month, but I would also suggest using pandas.Series.isin() to check the taxon condition:
>>> df[df.taxon.isin(['Calanus_finmarchicus', 'Gastropoda']) & (df.index.month == 4)]
           cruiseid  station       date    lat    lon  depth_w  \
Datetime
1987-04-13   AA8704        1  13-APR-87  35.85 -75.48       18
1987-04-13   AA8704        1  13-APR-87  35.85 -75.48       18

                           taxon  count  Unnamed: 11
Datetime
1987-04-13            Gastropoda      0          NaN
1987-04-13  Calanus_finmarchicus   2340          NaN
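To try this filter end to end, here is a self-contained sketch. It makes a few assumptions that differ from the question's own read_csv call: the text is read from an in-memory string via io.StringIO, whitespace is split with sep=r"\s+" (which also avoids the stray "Unnamed: 11" column), and the index is built from the year, month and day columns with pd.to_datetime. Only a few of the original rows are included.

```python
import io
import pandas as pd

data = """cruiseid year station month day date lat lon depth_w taxon count
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Centropages_typicus 75343
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Gastropoda 0
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Calanus_finmarchicus 2340
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Acartia_spp. 5616
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Oithona_spp. 468
"""

# Assumption: split on any run of whitespace instead of a single space.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Assumption: assemble the datetime index from the year/month/day columns.
df.index = pd.DatetimeIndex(pd.to_datetime(df[["year", "month", "day"]]), name="Datetime")

# isin() tests membership in a list, replacing a chain of == ... | == ... comparisons.
wanted = ["Calanus_finmarchicus", "Gastropoda"]
subset = df[df["taxon"].isin(wanted) & (df.index.month == 4)]
print(subset[["taxon", "count"]])
```

Note that the count column has to be accessed as df["count"], because df.count is already a DataFrame method.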
Answer 2: (score: 1)
Use the month attribute of the index:
df[(df.index.month == 4) & ((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda'))]
Answer 3: (score: 0)
I had not paid attention to the syntax (the order of the brackets) or to the dataframe.index attribute. This line gives me what I wanted:
results = df[((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')) & (df.index.month==4)]
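The bracket order matters because in Python & binds more tightly than |. The sketch below uses small hypothetical Series (not the question's data) to show how the result changes when the two taxon conditions are not grouped together:

```python
import pandas as pd

# Hypothetical values chosen so that the two groupings give different answers.
taxon = pd.Series(["Calanus_finmarchicus", "Gastropoda", "Calanus_finmarchicus", "Salpa"])
month = pd.Series([4, 4, 7, 4])

is_cal = taxon == "Calanus_finmarchicus"
is_gas = taxon == "Gastropoda"
is_april = month == 4

# Without grouping, & is evaluated first: is_cal | (is_gas & is_april).
# The July Calanus row slips through because it never meets the month test.
loose = is_cal | is_gas & is_april

# With explicit grouping: (is_cal | is_gas) & is_april.
grouped = (is_cal | is_gas) & is_april

print(loose.tolist())    # [True, True, True, False]
print(grouped.tolist())  # [True, True, False, False]
```

Wrapping the whole "either taxon" test in one pair of parentheses before applying & is what makes the month condition apply to both taxa.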