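I want to remove the leading and trailing zeros from each event (level 1), but keep zeros that are surrounded by non-zero values.
The following works for finding and removing all zeros: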
df = events[event_no][events[event_no] != 0]
I have the following hierarchical series:
1  2/09/2010    0
   3/09/2010    1.5
   4/09/2010    4.3
   5/09/2010    5.1
   6/09/2010    0
2  1/05/2007   53.2
   2/05/2007    0
   3/05/2007   21.5
   4/05/2007    2.5
   5/05/2007    0
and would like:
1  3/09/2010    1.5
   4/09/2010    4.3
   5/09/2010    5.1
2  1/05/2007   53.2
   2/05/2007    0
   3/05/2007   21.5
   4/05/2007    2.5
I have read "Deleting DataFrame row in Pandas based on column value" and "Filter columns of only zeros from a Pandas data frame", but have not managed to solve this problem.
Answer 0 (score: 0)
I don't know exactly what your dataframe looks like, but it should not make any difference; simple boolean indexing should do it:
In [101]: print(df)
Out[101]:
                    c1
first second
1     2/09/2010    0.0
      3/09/2010    1.5
      4/09/2010    4.3
      5/09/2010    5.1
      6/09/2010    0.0
2     1/05/2007   53.2
      2/05/2007    0.0
      3/05/2007   21.5
      4/05/2007    2.5
      5/05/2007    0.0
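In [102]:
import numpy as np
# positions where the first index level changes, i.e. the first row of each new event
first_level = [item[0] for item in df.index.tolist()]
is_edge = np.argwhere(np.hstack((0, np.diff(first_level))) != 0).flatten()
# add the last row of each preceding event, plus the very first and very last rows
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(df) - 1))
# keep zeros that are not on an edge, plus every non-zero row
# (note: only a single zero at each edge of an event is dropped)
g_idx = np.hstack(([item for item in np.argwhere(df['c1'].values == 0).flatten() if item not in is_edge],
                   np.argwhere(df['c1'].values != 0).flatten()))
print(df.iloc[np.sort(g_idx.astype(int))])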
Out[102]:
                    c1
first second
1     3/09/2010    1.5
      4/09/2010    4.3
      5/09/2010    5.1
2     1/05/2007   53.2
      2/05/2007    0.0
      3/05/2007   21.5
      4/05/2007    2.5
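The index arithmetic above drops only one zero at each edge of an event. If an event can start or end with a run of several zeros, a grouped cumulative sum is one possible alternative. The following is only a sketch under assumptions: the helper name `trim_edge_zeros` and the small demo frame (with made-up second-level labels 'a' to 'd') are hypothetical and not part of the original data.

```python
import pandas as pd

def trim_edge_zeros(df, col="c1"):
    """Drop leading/trailing zeros of `col` within each level-0 group (hypothetical helper)."""
    nonzero = df[col].ne(0).astype(int)
    # > 0 from the first non-zero value of each group onwards
    seen_before = nonzero.groupby(level=0).cumsum() > 0
    # > 0 up to and including the last non-zero value of each group:
    # reverse the rows, accumulate per group, then restore the original order
    seen_after = nonzero[::-1].groupby(level=0).cumsum()[::-1] > 0
    return df[seen_before & seen_after]

# hypothetical demo data: event 1 starts with two zeros
index = pd.MultiIndex.from_tuples(
    [(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'),
     (2, 'a'), (2, 'b'), (2, 'c'), (2, 'd')],
    names=['first', 'second'])
demo = pd.DataFrame({'c1': [0., 0., 1.5, 0., 53.2, 0., 21.5, 0.]}, index=index)

result = trim_edge_zeros(demo)
print(result)
```

Here both leading zeros of event 1 are removed, while the zero between 53.2 and 21.5 in event 2 is kept because it has non-zero values on both sides within its group.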
If you have a series rather than a dataframe, say the series is s, you can either convert it to a dataframe:
df=pd.DataFrame(s, columns=['c1'])
or work on the series directly:
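In [113]:
import numpy as np
# same edge logic as for the dataframe, applied to the series values
first_level = [item[0] for item in s.index.tolist()]
is_edge = np.argwhere(np.hstack((0, np.diff(first_level))) != 0).flatten()
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(s) - 1))
g_idx = np.hstack(([item for item in np.argwhere(s.values == 0).flatten() if item not in is_edge],
                   np.argwhere(s.values != 0).flatten()))
# select by position; plain s[...] would treat the integers as index labels
s.iloc[np.sort(g_idx.astype(int))]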
Out[113]:
first  second
1      3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
dtype: float64
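For a series, another way to express the same idea is to slice each event from its first to its last non-zero value. This is a rough sketch rather than a tested drop-in replacement: the helper name `trim_group` is hypothetical, and it assumes the events sit on the first index level, as in the series used in this answer.

```python
import numpy as np
import pandas as pd

def trim_group(g):
    """Slice one event from its first to its last non-zero value (hypothetical helper)."""
    pos = np.flatnonzero(g.to_numpy() != 0)
    if pos.size == 0:
        # the event contains only zeros, so nothing is kept
        return g.iloc[0:0]
    return g.iloc[pos[0]:pos[-1] + 1]

tuples = [(1, '2/09/2010'), (1, '3/09/2010'), (1, '4/09/2010'),
          (1, '5/09/2010'), (1, '6/09/2010'),
          (2, '1/05/2007'), (2, '2/05/2007'), (2, '3/05/2007'),
          (2, '4/05/2007'), (2, '5/05/2007')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.], index=index)

# group_keys=False keeps the original two-level index on the result
trimmed = s.groupby(level=0, group_keys=False).apply(trim_group)
print(trimmed)
```

Because each event is sliced as a whole, runs of several leading or trailing zeros are removed as well, and zeros inside an event are left alone.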
BTW, I generated the series like this:
In [116]:
import numpy as np
import pandas as pd
tuples=[(1, '2/09/2010'),
(1, '3/09/2010'),
(1, '4/09/2010'),
(1, '5/09/2010'),
(1, '6/09/2010'),
(2, '1/05/2007'),
(2, '2/05/2007'),
(2, '3/05/2007'),
(2, '4/05/2007'),
(2, '5/05/2007')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.array([0.,1.5,4.3,5.1,0.,53.2,0.,21.5,2.5,0.]), index=index)
s
Out[116]:
first  second
1      2/09/2010     0.0
       3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
       6/09/2010     0.0
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
       5/05/2007     0.0
dtype: float64
Is this the same structure as yours?