I've run into this problem many times and solved it differently each time. What do others do?
Consider the series s
s = pd.Series([1, 0, 2], list('abc'), name='s')
What is the fastest way to produce the following?
a    1
c    2
Name: s, dtype: int64
Answer 0 (score: 3)
Boolean slicing is probably the simplest way:
In [1]: s = pd.Series([1, 0, 2], list('abc'), name='s')
In [2]: s[s != 0]
Out[2]:
a 1
c 2
Name: s, dtype: int64
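To spell out what the one-liner does, here is a minimal sketch that builds the mask as a separate step. The names `mask`, `result`, and `result_loc` are illustrative only and do not come from the answer.

```python
import pandas as pd

s = pd.Series([1, 0, 2], list('abc'), name='s')

# Comparing the Series with 0 yields a boolean Series on the same index.
mask = s != 0

# Indexing with the mask keeps only the rows where the mask is True.
result = s[mask]

# .loc with the same mask is an equivalent, more explicit spelling.
result_loc = s.loc[mask]
```

Both spellings keep the original index labels and the Series name, so no reindexing or renaming is needed afterwards.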
Answer 1 (score: 1)
Here are a few things I have done
Method 1: numpy
z = np.nonzero(s.values)
pd.Series(s.values[z], s.index.values[z], name=s.name)
Method 2: to_frame + query
s.to_frame().query('s != 0').squeeze()
Method 3: replace + dropna
s.replace(0, np.nan).dropna().astype(s.dtype)
All of them yield
a 1
c 2
Name: s, dtype: int64
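As a rough check that the three methods really agree, the sketch below runs each one on the same Series and compares it with plain boolean indexing. The variable names `m1`, `m2`, `m3`, `expected`, and `same` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 0, 2], list('abc'), name='s')

# Method 1: positions of the nonzero values, applied to values and index alike.
z = np.nonzero(s.values)
m1 = pd.Series(s.values[z], s.index.values[z], name=s.name)

# Method 2: filter as a one-column DataFrame, then squeeze back to a Series.
m2 = s.to_frame().query('s != 0').squeeze()

# Method 3: turn zeros into NaN, drop them, then restore the original dtype.
m3 = s.replace(0, np.nan).dropna().astype(s.dtype)

# Boolean indexing serves as the reference result.
expected = s[s != 0]

# Series.equals compares values and index; one flag per method.
same = [m.equals(expected) for m in (m1, m2, m3)]
```

One caveat with method 2: `squeeze()` returns a scalar rather than a Series when only a single row survives the filter, so it is worth guarding against that case if the input can vary.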
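Answer 2 (score: 1)
Clearly there are many ways to get the same result. I think boolean indexing is the simplest, but I also wanted to measure how fast the different methods are. Here: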
s = pd.Series([1, 0, 2], list('abc'), name='s')
%%timeit
z = np.nonzero(s.values)
pd.Series(s.values[z], s.index.values[z], name=s.name)
The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 83.9 µs per loop
%%timeit
s.to_frame().query('s != 0').squeeze()
1000 loops, best of 3: 1.86 ms per loop
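To my surprise, method 1 appears to be the fastest, with method 4 close behind. NumPy operations may simply be much faster than the equivalent pandas operations, which could be the reason.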
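For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard-library `timeit` module. The function names (`method1`, `method2`, `method3`, `boolean_mask`, `time_methods`) and the `number`/`repeat` defaults are assumptions made for this example, and `boolean_mask` is just the boolean-indexing approach, not necessarily the "method 4" referred to above. Results depend on the machine and library versions, and on a three-element Series they mostly reflect per-call overhead.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series([1, 0, 2], list('abc'), name='s')


def method1(s):
    # NumPy: index both values and labels by the nonzero positions.
    z = np.nonzero(s.values)
    return pd.Series(s.values[z], s.index.values[z], name=s.name)


def method2(s):
    # to_frame + query, then back to a Series.
    return s.to_frame().query('s != 0').squeeze()


def method3(s):
    # replace + dropna, then restore the original dtype.
    return s.replace(0, np.nan).dropna().astype(s.dtype)


def boolean_mask(s):
    # Plain boolean indexing.
    return s[s != 0]


def time_methods(s, number=100, repeat=3):
    """Return the best observed time per call, in seconds, for each candidate."""
    results = {}
    for func in (method1, method2, method3, boolean_mask):
        best = min(timeit.repeat(lambda: func(s), number=number, repeat=repeat))
        results[func.__name__] = best / number
    return results


timings = time_methods(s)
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

Trying the same harness on a much larger Series is a reasonable next step, since the ranking on tiny inputs may not carry over.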