Question

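I have a pandas column named "A" with values like this:

Now I want to search this column for the pattern 0 1 0 and identify the row in column "B" that corresponds to the 1 in column "A".

For example:

Here, I want it to return 3 from column "B". Is there any solution other than nested logic?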

Answer 1

You can use NumPy for better performance. This is a modified version of an existing rolling-window solution:

import numpy as np
import pandas as pd

pat = [0,1,0]
N = len(pat)
df = pd.DataFrame({'B':range(4, 14), 'A':[0,0,1,0,0,1,0,0,1,0]})
print (df)
    B  A
0   4  0
1   5  0
2   6  1
3   7  0
4   8  0
5   9  1
6  10  0
7  11  0
8  12  1
9  13  0

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    c = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    return c

arr = df['A'].values
b = np.all(rolling_window(arr, N) == pat, axis=1)

print (rolling_window(arr, N))

[[0 0 1]
 [0 1 0]
 [1 0 0]
 [0 0 1]
 [0 1 0]
 [1 0 0]
 [0 0 1]
 [0 1 0]]

c = np.mgrid[0:len(b)][b]
#create indices of matched pattern
print (c)
[1 4 7]

#strides by column B indexed by indices of matched pattern    
d = rolling_window(df['B'].values, N)[c]
print (d)
[[ 5  6  7]
 [ 8  9 10]
 [11 12 13]]

#select second 'column'
e = d[:, 1].tolist()
print (e)
[6, 9, 12]
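
As a hedged alternative to the hand-written `rolling_window` helper above, the same windows can be built with `numpy.lib.stride_tricks.sliding_window_view`. This is only a sketch and assumes NumPy 1.20 or newer; the variable names `windows`, `starts`, and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20

pat = [0, 1, 0]
df = pd.DataFrame({'B': range(4, 14), 'A': [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]})

# Each row of `windows` is one length-3 slice of column A (a read-only view).
windows = sliding_window_view(df['A'].to_numpy(), len(pat))

# Start positions of the windows that equal the whole pattern.
starts = np.flatnonzero((windows == pat).all(axis=1))

# The 1 sits at offset 1 inside the pattern, so move from window start to middle.
result = df['B'].to_numpy()[starts + 1].tolist()
print(result)  # [6, 9, 12]
```

Unlike `as_strided`, this function checks the window shape for you, so a wrong window size raises an error instead of reading out-of-bounds memory.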

Answer 2

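The following code starts by specifying the pattern to match; in your case, that is 0 1 0. You also specify which coordinate within the pattern corresponds to the index you want to pull from column B. You want the middle element, which is coordinate 1 in a 0-based indexing scheme.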

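From there, we take column A and shift it with Series.shift(). By default, this fills the vacated coordinates with NaN. NaN does not match 0, 1, or any other value of interest, so we can compare each shifted column directly with the value it should match and get exactly the True/False values we want.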
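
To make the NaN behaviour concrete, here is a small sketch on a three-element Series (the names `s`, `shifted`, and `mask` are illustrative only):

```python
import pandas as pd

s = pd.Series([0, 1, 0])

# shift(-1) moves every value up one row, so each row sees its successor;
# the last row has no successor and is filled with NaN.
shifted = s.shift(-1)

# Comparing with 0 yields False for the NaN row, so no special handling is needed.
mask = shifted == 0
print(mask.tolist())  # [False, True, False]
```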

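To match the whole pattern, we need to combine these comparisons with a logical AND. To do so, we reduce the shifted columns pairwise with s1 & s2. This returns a new column that is the element-wise AND of all the comparisons.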

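Finally, we take this boolean result, which has as many rows as the original DataFrame df, and use it to select from df['B']. This returns a new Series containing only the values of df['B'] at the desired coordinates.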

from functools import reduce

matching_values = (0, 1, 0)
matching_index = 1

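df['B'][reduce(
    lambda s1, s2: s1 & s2,
    # shift(-i) lines up the pattern element at offset i with the current row
    # (shift(i) would match the pattern reversed, which only works for
    # symmetric patterns such as 0 1 0); range replaces Python 2's xrange
    (df['A'].shift(-i)==v for i, v in zip(
        range(-matching_index, len(matching_values)-matching_index),
        matching_values)))]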

If you are using Python 2.x, you do not need to import reduce() first. In Python 3.x, however, zip() does not build an intermediate list, which saves CPU and RAM.

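Depending on what you are doing, this can easily be extracted into a function that exposes the relevant parameters. The magic strings 'A' and 'B' are probably not ideal and are good candidates for parameters. matching_values and matching_index are other possible candidates.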
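
One possible shape for such a function is sketched below. The function name `values_at_pattern` and its parameter names are hypothetical, chosen only for illustration. Pattern element `j` is compared against the row `j - offset` positions away from the candidate row, which is why the shift amount is `offset - j`.

```python
from functools import reduce

import pandas as pd


def values_at_pattern(df, pattern, offset, search_col='A', value_col='B'):
    """Return value_col entries at position `offset` of each match of `pattern` in search_col."""
    masks = (
        # shift(offset - j) puts the value from row r + (j - offset) onto row r
        df[search_col].shift(offset - j) == v
        for j, v in enumerate(pattern)
    )
    mask = reduce(lambda s1, s2: s1 & s2, masks)
    return df.loc[mask, value_col]


df = pd.DataFrame({'B': range(4, 14), 'A': [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]})
print(values_at_pattern(df, (0, 1, 0), 1).tolist())  # [6, 9, 12]
```

Because the offset is a parameter, the same helper can return the value at the start of an asymmetric pattern, for example `values_at_pattern(df, (1, 0, 0), 0)`.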

Answer 3

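import pandas as pd
from scipy.signal import convolve
pat = [0,1,0]
df = pd.DataFrame({'B':range(4, 14), 'A':[0,0,1,0,0,1,0,0,1,0]})
# Note: with the kernel [0,1,0], 'valid' convolution simply returns A[1:-1],
# so this finds every interior 1 in A; it matches 0 1 0 only when the 1s
# in A are isolated, as they are in this data
s2 = convolve(df['A'],[0,1,0],mode = 'valid')
s2 = pd.Series(s2)
df.B.iloc[s2[s2==1].index + 1].values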

Output:

array([ 6,  9, 12])

For the example you gave:

Output:

array([3])
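
Convolving with the kernel [0,1,0] only picks out the middle value of each window, so it does not check that the neighbours are 0. A hedged sketch of a correlation-based variant that does check the whole pattern follows. It assumes column A contains only 0 and 1, and it swaps in `np.correlate` for `scipy.signal.convolve`; the names `a`, `k`, `score`, and `hits` are illustrative.

```python
import numpy as np
import pandas as pd

pat = np.array([0, 1, 0])
df = pd.DataFrame({'B': range(4, 14), 'A': [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]})

# Map 0/1 to -1/+1 so each agreeing position contributes +1 to the sliding
# dot product and each disagreeing position contributes -1.
# Assumption: A holds only the values 0 and 1.
a = 2 * df['A'].to_numpy() - 1
k = 2 * pat - 1

# np.correlate slides k over a without reversing it.
score = np.correlate(a, k, mode='valid')

# Only a full match reaches len(pat); +1 moves from window start to the middle.
hits = np.flatnonzero(score == len(pat)) + 1
result = df['B'].to_numpy()[hits].tolist()
print(result)  # [6, 9, 12]
```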

Answer 4

I changed the original data so that it covers more cases:

import pandas as pd
o = pd.DataFrame({'A': [0, 1, 0, 1, 0, 0], 'B': [12, 14, 6, 3, 6, 8]})
b = o["A"]
m = [i+1 for (i, _) in enumerate(b) if i+2<len(b) and str(b[i])+str(b[i+1]) + str(b[i+2]) == '010']
print(o.loc[m]['B'].tolist())

So, for this input:

the output will be:

[14, 3]
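
A small variation on the same idea compares list slices rather than concatenated strings, which avoids ambiguity if the column ever holds multi-digit values (for example, "1" + "0" versus "10"). This is a sketch only; it assumes the DataFrame has the default integer index, so positions and labels coincide, and the names `a`, `pat`, and `n` are illustrative.

```python
import pandas as pd

o = pd.DataFrame({'A': [0, 1, 0, 1, 0, 0], 'B': [12, 14, 6, 3, 6, 8]})
a = o['A'].tolist()
pat = [0, 1, 0]
n = len(pat)

# Keep i + 1 (the middle of the window) whenever the slice equals the pattern.
m = [i + 1 for i in range(len(a) - n + 1) if a[i:i + n] == pat]
print(o.loc[m, 'B'].tolist())  # [14, 3]
```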

Search for a specific pattern in a column
