Pandas / NumPy: concisely label the first N values matching a mask

Time: 2016-12-22 05:35:34

Tags: pandas numpy

I have a sorted Series like this:

[2, 4, 5, 6, 8, 9]

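I want to produce another Series or ndarray of the same length in which the first two odd numbers and the first two even numbers are labeled sequentially, in order of appearance: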

[0, 1, 2, _, _, 3]

I don't care about the _ values. They can be zero.

Right now I do this:

import numpy as np
import pandas as pd

src = pd.Series([2, 4, 5, 6, 8, 9])
odd = src % 2 != 0
where = np.hstack((np.where(odd)[0][:2], np.where(~odd)[0][:2]))
where.sort() # maintain ordering - thanks to @hpaulj
res = np.zeros(len(src), int)
res[where] = np.arange(len(where))

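Can you do it more concisely? The input will never be empty, but it may contain no odd values or no even values (in which case the number of labeled values can be 1, 2, or 3 rather than 4).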
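To make the edge cases concrete, here is a small sketch that wraps the approach above in a helper and runs it on inputs with no odd values and with only one odd value. The function name `label_first_two` is hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

def label_first_two(src):
    """Label the first two odd and first two even values in order of appearance."""
    odd = src.values % 2 != 0
    # Positions of the first two odds and the first two evens.
    where = np.hstack((np.where(odd)[0][:2], np.where(~odd)[0][:2]))
    where.sort()  # restore positional order so labels increase left to right
    res = np.zeros(len(src), int)
    res[where] = np.arange(len(where))  # works with fewer than 4 matches too
    return res

print(label_first_two(pd.Series([2, 4, 5, 6, 8, 9])))  # [0 1 2 0 0 3]
print(label_first_two(pd.Series([2, 4, 6, 8])))        # [0 1 0 0]  (no odds)
print(label_first_two(pd.Series([1, 2, 4, 6])))        # [0 1 2 0]  (one odd)
```

Because the labels come from `np.arange(len(where))`, the helper does not need to know in advance how many values matched.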

1 Answer:

Answer 0: (score: 1)

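Great question! I'm still exploring and learning.

I've basically stuck with what you've done so far and only tweaked it modestly for efficiency. I'll update if I think of anything else cool.

Conclusion
So far I've explored a number of directions and haven't improved on it much.

My answer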

odd = src.values % 2
even = 1 - odd
res = ((odd.cumsum() * odd) < 3) * ((even.cumsum() * even) < 3)
(res.cumsum() - 1) * res
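To see why the cumulative-sum trick works, it may help to print the intermediate arrays for the sample input. The names `odd_rank`, `even_rank`, `keep`, and `labels` are illustrative and not part of the answer above; `&` is used in place of `*` on the boolean masks, which gives the same result.

```python
import numpy as np
import pandas as pd

src = pd.Series([2, 4, 5, 6, 8, 9])
odd = src.values % 2               # [0 0 1 0 0 1]
even = 1 - odd                     # [1 1 0 1 1 0]

# Running count within each parity, zeroed where the parity does not apply.
odd_rank = odd.cumsum() * odd      # [0 0 1 0 0 2]
even_rank = even.cumsum() * even   # [1 2 0 3 4 0]

# An element survives if it is among the first two of its own parity.
keep = (odd_rank < 3) & (even_rank < 3)   # [T T T F F T]

# Number the survivors 0, 1, 2, ... and zero out everything else.
labels = (keep.cumsum() - 1) * keep
print(labels)                      # [0 1 2 0 0 3]
```

Since nothing here hard-codes the number of matches, this form also handles inputs with fewer than two odd or even values.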

Alternative 1
Pretty fast

a = src.values
odd = (a % 2).astype(bool)
rng = np.arange(len(a))

# take the first 2 of each parity, which is why 4 appears below
where = np.append(rng[~odd][:2], rng[odd][:2])
res = np.zeros(len(a), int)

# 2 evens + 2 odds = 4 labels; assumes at least two of each are present
res[where] = np.arange(4)
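Alternative 1 as written assigns `np.arange(4)`, so it raises an error when fewer than four values match, and it labels evens before odds rather than in positional order. A hedged variant that relaxes both assumptions might look like this; the name `pir1_safe` is hypothetical.

```python
import numpy as np
import pandas as pd

def pir1_safe(src):
    a = src.values
    odd = (a % 2).astype(bool)
    rng = np.arange(len(a))
    # First two even positions and first two odd positions, back in positional order.
    where = np.sort(np.append(rng[~odd][:2], rng[odd][:2]))
    res = np.zeros(len(a), int)
    res[where] = np.arange(len(where))  # as many labels as there are matches
    return res

print(pir1_safe(pd.Series([1, 2, 3, 4])))  # [0 1 2 3]
print(pir1_safe(pd.Series([2, 4, 6])))     # [0 1 0]
```

The extra `np.sort` costs a little time, so this variant would not be expected to match the timings of the original.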

Alternative 2
Not as fast, but creative

a = src.values
odd = a % 2
res = np.zeros(len(src), int)
b = np.arange(2)
c = b[:, None] == odd
res[(c.cumsum(1) * c <= 2).all(0)] = np.arange(4)
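The broadcasting step in Alternative 2 can be hard to picture, so here is a sketch that prints the intermediate arrays for the sample input. The names `rank` and `keep` are illustrative, and `keep.sum()` replaces the hard-coded 4 as an assumption so the sketch also runs when fewer values match.

```python
import numpy as np

odd = np.array([2, 4, 5, 6, 8, 9]) % 2   # [0 0 1 0 0 1]

# Compare a column [0, 1] against the row of parities: shape (2, 6).
c = np.arange(2)[:, None] == odd
# row 0 (evens): [T T F T T F]
# row 1 (odds):  [F F T F F T]

# Per-row running count, zeroed where the row's parity does not apply.
rank = c.cumsum(1) * c
# row 0: [1 2 0 3 4 0]
# row 1: [0 0 1 0 0 2]

# Keep a column only if neither row has counted past 2 there.
keep = (rank <= 2).all(0)                # [T T T F F T]

res = np.zeros(len(odd), int)
res[keep] = np.arange(keep.sum())
print(res)                               # [0 1 2 0 0 3]
```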

Alternative 3
Still slow

odd = src.values % 2
a = (odd[:, None] == [0, 1])
b = ((a.cumsum(0) * a) <= 2).all(1)
(b.cumsum() - 1) * b
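The per-parity running count that Alternatives 2 and 3 build by broadcasting can also be expressed with pandas grouping. This is a different technique from the ones in this answer, shown only as a sketch; it relies on `groupby(...).cumcount()` and `Series.where`, and it was not part of the timing comparison.

```python
import pandas as pd

src = pd.Series([2, 4, 5, 6, 8, 9])

# Position of each value within its parity group, counting from 0.
rank = src.groupby(src % 2).cumcount()   # [0 1 0 2 3 1]

# Keep the first two of each parity.
keep = rank < 2

# Number the kept values in order and set the rest to zero.
labels = (keep.cumsum() - 1).where(keep, 0)
print(labels.tolist())                   # [0, 1, 2, 0, 0, 3]
```

This reads closer to the problem statement, though grouping overhead in pandas may make it slower than the NumPy versions on small inputs.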

Timing code

def pir3(src):
    odd = src.values % 2
    a = (odd[:, None] == [0, 1])
    b = ((a.cumsum(0) * a) <= 2).all(1)
    return (b.cumsum() - 1) * b

def pir0(src):
    odd = src.values % 2
    even = 1 - odd
    res = ((odd.cumsum() * odd) < 3) * ((even.cumsum() * even) < 3)
    return (res.cumsum() - 1) * res

def pir2(src):
    a = src.values
    odd = a % 2
    res = np.zeros(len(src), int)
    b = np.arange(2)  # parity values: 0 (even) and 1 (odd)
    c = b[:, None] == odd
    res[(c.cumsum(1) * c <= 2).all(0)] = np.arange(4)
    return res

def pir1(src):
    a = src.values
    odd = (a % 2).astype(bool)
    rng = np.arange(len(a))
    where = np.append(rng[~odd][:2], rng[odd][:2])
    res = np.zeros(len(a), int)
    res[where] = np.arange(4)
    return res

def john0(src):
    even = src % 2 == 0  # this mask selects evens; `where` is left unsorted here
    where = np.hstack((np.where(even)[0][:2], np.where(~even)[0][:2]))
    res = np.zeros(len(src), int)
    res[where] = np.arange(len(where))
    return res

def john1(src):
    even = src.values % 2 == 0  # this mask selects evens; `where` is left unsorted here
    where = np.hstack((np.where(even)[0][:2], np.where(~even)[0][:2]))
    res = np.zeros(len(src), int)
    res[where] = np.arange(len(where))
    return res

src = pd.Series([2, 4, 5, 6, 8, 9])

src = pd.Series([2, 4, 5, 6, 8, 9] * 10000)
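To reproduce a comparison like this locally, a harness along the following lines could be used. It is a minimal sketch using the standard-library `timeit` module with two of the functions above copied in; the repeat count of 100 is an arbitrary choice, and the absolute numbers will depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

def pir0(src):
    odd = src.values % 2
    even = 1 - odd
    res = ((odd.cumsum() * odd) < 3) * ((even.cumsum() * even) < 3)
    return (res.cumsum() - 1) * res

def john1(src):
    even = src.values % 2 == 0
    where = np.hstack((np.where(even)[0][:2], np.where(~even)[0][:2]))
    res = np.zeros(len(src), int)
    res[where] = np.arange(len(where))
    return res

small = pd.Series([2, 4, 5, 6, 8, 9])
large = pd.Series([2, 4, 5, 6, 8, 9] * 10000)

runs = 100  # arbitrary repeat count for this sketch
for name, data in [("small", small), ("large", large)]:
    for func in (pir0, john1):
        secs = timeit.timeit(lambda: func(data), number=runs)
        print(f"{name:>5} {func.__name__:>6}: {secs / runs * 1e6:.1f} us per call")
```

In a notebook, `%timeit pir0(src)` would be the more usual way to get the same kind of measurement.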