Question

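I'm having trouble filtering a DataFrame in which several columns hold comma-separated values. I need to keep a row if at least one value in the first column is greater than 3 and at least one value in the second column is greater than 8 (the values are not sorted, and some rows contain NaN).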

Example df:

import pandas as pd

data = {'ID': ['1', '2', '3', '4'],
        'Filter1': ['1', '1,3,5', '2,1', '7,5'],
        'Filter2': ['20,5', '7,13', '8', '9,15,18']
        }
df = pd.DataFrame(data, columns=['ID', 'Filter1', 'Filter2'])

    ID  Filter1   Filter2
0   1   1         20,5
1   2   1,3,5     7,13
2   3   2,1       8
3   4   7,5       9,15,18

I think split(',') would be useful, but I don't know how to apply it. I think any() would help too.

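I know that df["Filter1"].str.split(",") gives me a list for each row, but I don't know how to filter on it in a single line without resorting to something more complicated.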

My second idea was to split the columns and then select them by name with a regular expression, but I couldn't get that to work.

df['Filter1'].str.split(',', expand=True)

I need to get something like this:

        ID  Filter1   Filter2
   1    2   1,3,5     7,13
   3    4   7,5       9,15,18

Answer 1

Let's use split together with any:

s1 = df.Filter1.str.split(',', expand=True).astype(float).gt(3).any(axis=1)
s2 = df.Filter2.str.split(',', expand=True).astype(float).gt(8).any(axis=1)
newdf = df[s1 & s2]
newdf
Out[36]: 
  ID Filter1  Filter2
1  2   1,3,5     7,13
3  4     7,5  9,15,18
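
The question mentions NaN rows, which the sample data does not contain. Below is a minimal sketch of the same split-and-any approach wrapped in a helper, with one extra row added to check that a missing value just yields False for that row. The helper name `any_above` and the fifth row are hypothetical and not part of the original post.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical row with NaN in Filter1.
df = pd.DataFrame({
    "ID": ["1", "2", "3", "4", "5"],
    "Filter1": ["1", "1,3,5", "2,1", "7,5", np.nan],
    "Filter2": ["20,5", "7,13", "8", "9,15,18", "9"],
})

def any_above(col, threshold):
    """Return a boolean Series: True where at least one
    comma-separated value in the row exceeds `threshold`."""
    # expand=True gives one column per value; short rows are padded with missing values.
    parts = col.str.split(",", expand=True).astype(float)
    # Comparisons against NaN are False, so missing values never match.
    return parts.gt(threshold).any(axis=1)

mask = any_above(df["Filter1"], 3) & any_above(df["Filter2"], 8)
newdf = df[mask]
print(newdf)
```

Passing `axis=1` by keyword matters here: recent pandas releases no longer accept the axis as a positional argument to `any`.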
