Suppose I have a DataFrame like the following:
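import pandas as pd
df = pd.DataFrame({
    'Period': [1996, 'Jan', 'Feb', 'March', 1997, 'Jan', 'Feb', 'March', 1998, 'Jan', 'Feb', 'March'],
    'Some-Values': ['', 'a', 'b', 'c', '', 'd', 'e', 'f', '', 'g', 'h', 'i'],  # blank cells written as empty strings
})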
I need to extract the rows between the values 1996 and 1997, so that the resulting DataFrame looks like this:
df_res = pd.DataFrame({
    'Period': ['Jan', 'Feb', 'March'],
    'Some-Values': ['a', 'b', 'c'],
})
I am currently trying to do this with Pandas but have not been able to find a solution.
Answer 0: (score: 2)
Try reshaping the DataFrame into a "proper" layout first, with the year in its own column; then we can select rows by year:
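# Copy the year from each header row (blank 'Some-Values') into a new column
df['Year'] = df.loc[df['Some-Values'] == '', 'Period']
# Propagate each year down to the month rows beneath it
df['Year'] = df['Year'].ffill()
# Drop the header rows themselves
df = df.loc[df['Period'] != df['Year'], :]
# Select the rows that belong to 1996
df.loc[df['Year'] == 1996, :]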
Out[651]:
  Period Some-Values  Year
1    Jan           a  1996
2    Feb           b  1996
3  March           c  1996
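The same idea can be sketched without relying on the blank 'Some-Values' cells. This is only a variation under an assumption that is not in the question: a header row is taken to be any row whose 'Period' parses as a number. The names `year`, `is_header`, `tidy`, and `res_1996` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Period': [1996, 'Jan', 'Feb', 'March', 1997, 'Jan', 'Feb', 'March',
               1998, 'Jan', 'Feb', 'March'],
    'Some-Values': ['', 'a', 'b', 'c', '', 'd', 'e', 'f', '', 'g', 'h', 'i'],
})

# Assumption: header rows are exactly the rows whose Period is numeric.
# Month names cannot be parsed, so errors='coerce' turns them into NaN.
year = pd.to_numeric(df['Period'], errors='coerce')
is_header = year.notna()

# Forward-fill the year onto the month rows, then drop the header rows.
tidy = df.assign(Year=year.ffill().astype(int))[~is_header]

# Pick out one year's rows.
res_1996 = tidy[tidy['Year'] == 1996]
print(res_1996)
```

Once the year is a regular column, any other year can be selected with the same comparison, or all years can be processed at once with `tidy.groupby('Year')`.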
Answer 1: (score: 1)
One way, using pd.Series.idxmax and pd.DataFrame.iloc:
import pandas as pd
df = pd.DataFrame({'Period': [1996, 'Jan', 'Feb', 'March', 1997, 'Jan', 'Feb',
                              'March', 1998, 'Jan', 'Feb', 'March'],
                   'Some-Values': ['', 'a', 'b', 'c', '', 'd', 'e', 'f', '', 'g', 'h', 'i']})
res = df.iloc[(df['Period'] == 1996).idxmax()+1:(df['Period'] == 1997).idxmax()]
print(res)
  Period Some-Values
1    Jan           a
2    Feb           b
3  March           c
For readability, you can use a slice object:
slicer = slice((df['Period'] == 1996).idxmax()+1,
(df['Period'] == 1997).idxmax())
res = df.iloc[slicer]
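Two caveats apply to the idxmax approach. On a boolean mask with no True value, idxmax returns the first index label, so a missing marker silently produces a slice starting from the top. It also returns a label, which equals a position only when the DataFrame has the default RangeIndex. A minimal sketch of a more defensive version follows; the helper name `rows_between` and its parameters are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

def rows_between(df, column, start, end):
    """Return the rows strictly between the first `start` and first `end` markers.

    Hypothetical helper: works by position, so it does not depend on the
    index labels, and raises instead of guessing when a marker is absent.
    """
    start_pos = np.flatnonzero((df[column] == start).to_numpy())
    end_pos = np.flatnonzero((df[column] == end).to_numpy())
    if start_pos.size == 0 or end_pos.size == 0:
        raise KeyError(f"marker not found: {start!r} or {end!r}")
    # iloc slices by position; the end bound is exclusive.
    return df.iloc[start_pos[0] + 1:end_pos[0]]

df = pd.DataFrame({
    'Period': [1996, 'Jan', 'Feb', 'March', 1997, 'Jan', 'Feb', 'March',
               1998, 'Jan', 'Feb', 'March'],
    'Some-Values': ['', 'a', 'b', 'c', '', 'd', 'e', 'f', '', 'g', 'h', 'i'],
})

res = rows_between(df, 'Period', 1996, 1997)
print(res)
```

The helper only looks at the first occurrence of each marker, which matches the data in the question, where every year appears once.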