Question

I have a DataFrame of this form:

In [122]: df=pd.DataFrame({"A":["1,2,3","4,5,6",np.nan,"8"],"B":[6,7,8,9]})

In [123]: df
Out[123]:
       A  B
0  1,2,3  6
1  4,5,6  7
2    NaN  8
3      8  9

I want to select the values of B for the rows whose A contains a specific value (for example "4").

I tried the following syntax:

df["B"][["4" in a for a in df["A"].str.split(',')]]

But I get TypeError: argument of type 'float' is not iterable, because of the NaN in one of the rows. So I tried this syntax:

df["B"][["4" in a for a in df["A"].str.split(',') if pd.notnull(a)]]

But then I get ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

Any idea how I can make this work? I have tried a few ideas, but none of them worked, and I really don't understand why this syntax is wrong.

Expected output: 7.
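A self-contained reproduction of the first error, assuming the usual `np` and `pd` import aliases: `str.split` turns each string into a list but leaves the missing value as a float `NaN`, so the `in` test fails on that one element.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["1,2,3", "4,5,6", np.nan, "8"], "B": [6, 7, 8, 9]})

# Every string becomes a list, but the missing value stays NaN (a float)
parts = df["A"].str.split(',')
print(type(parts[1]), type(parts[2]))

# "4" in <float> is what raises the TypeError from the question
raised = False
try:
    ["4" in a for a in parts]
except TypeError:
    raised = True
print("TypeError raised:", raised)
```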

Answer 1

A pure pandas alternative:

s = df.loc[df["A"].str.split(',', expand=True).eq('4').any(axis=1), 'B']
print (s)
1    7
Name: B, dtype: int64

Explanation:

Create a DataFrame with the expand=True parameter of Series.str.split:

print (df["A"].str.split(',', expand=True))
     0     1     2
0    1     2     3
1    4     5     6
2  NaN   NaN   NaN
3    8  None  None
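The padding in that output is what makes the NaN row harmless. A small sketch (same sample data, assuming the `np`/`pd` aliases) that checks it: the number of columns comes from the longest split, shorter rows are filled with missing values, and comparing a missing cell with '4' simply gives False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["1,2,3", "4,5,6", np.nan, "8"], "B": [6, 7, 8, 9]})

# expand=True spreads the pieces over columns 0..n-1; the longest split sets n
wide = df["A"].str.split(',', expand=True)
print(wide.shape)

# isna() treats both the all-NaN row and the None padding as missing
print(wide.isna())

# A missing cell never equals '4', so rows 2 and 3 produce no True values
print(wide.eq('4').loc[[2, 3]])
```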

Compare with DataFrame.eq (==):

print (df["A"].str.split(',', expand=True).eq('4'))
       0      1      2
0  False  False  False
1   True  False  False
2  False  False  False
3  False  False  False

Check for at least one True per row with DataFrame.any:

print (df["A"].str.split(',', expand=True).eq('4').any(axis=1))
0    False
1     True
2    False
3    False
dtype: bool

Finally, filter with DataFrame.loc and boolean indexing.
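The four steps can be wrapped in one function so that the searched value is a parameter. This is only a sketch; `select_where_token` is a hypothetical name, not a pandas function, and it assumes the values in the source column are separated by plain commas with no spaces.

```python
import numpy as np
import pandas as pd


def select_where_token(df, source, token, target):
    """Hypothetical helper: return df[target] for rows whose comma-separated
    df[source] contains `token` as a whole piece (NaN rows never match)."""
    pieces = df[source].str.split(',', expand=True)  # one column per piece
    mask = pieces.eq(token).any(axis=1)              # True if any piece equals token
    return df.loc[mask, target]                      # boolean indexing on rows, pick target


df = pd.DataFrame({"A": ["1,2,3", "4,5,6", np.nan, "8"], "B": [6, 7, 8, 9]})
s = select_where_token(df, "A", "4", "B")
print(s)
```

Calling it with "8" instead of "4" would return the B value of the last row.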

Your own solution can be fixed by using an if-else with isinstance:

mask = ["4" in a if isinstance(a, list) else False for a in df["A"].str.split(',')]

s = df.loc[mask, 'B']
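Why the second attempt raised ValueError: for the non-missing rows, `a` is a list, and `pd.notnull` applied to a list works element-wise and returns a NumPy boolean array. Python cannot decide the truth value of that array inside `if`. `isinstance` returns a single bool, and the if-else form also keeps one mask entry per row (the filtered comprehension would have dropped the NaN row and produced a mask shorter than the frame). A sketch, assuming the `np`/`pd` aliases:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["1,2,3", "4,5,6", np.nan, "8"], "B": [6, 7, 8, 9]})

# pd.notnull on a list is element-wise: an array of booleans, not one bool
flags = pd.notnull(['1', '2', '3'])
print(type(flags), flags)

# Using that array in an `if` is what raised the ValueError
ambiguous = False
try:
    if flags:
        pass
except ValueError:
    ambiguous = True

# isinstance yields one bool per row, so the mask stays aligned with df
mask = ["4" in a if isinstance(a, list) else False for a in df["A"].str.split(',')]
s = df.loc[mask, 'B']
print(s)
```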

Answer 2

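You can use Series.str.contains (the first row of the sample data is changed to "14,2,3" here, to show that 14 is not matched):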

df=pd.DataFrame({"A":["14,2,3","4,5,6",np.nan,"8"],"B":[6,7,8,9]})
df[df['A'].str.contains(r'\b4\b', na=False)]

This gives you:

    A       B
1   4,5,6   7

Then select only column B:

df[df['A'].str.contains(r'\b4\b', na=False)]['B']

# Output:
1    7
Name: B, dtype: int64
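The chained form `df[mask]['B']` first builds an intermediate DataFrame and then picks the column. For reading it gives the same result as a single `.loc[mask, 'B']` call, as this sketch checks (same sample data, assuming the `np`/`pd` aliases). The single-step `.loc` form is the one that also works for assignment; assigning through chained indexing does not reliably modify `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["14,2,3", "4,5,6", np.nan, "8"], "B": [6, 7, 8, 9]})

# na=False turns the NaN row into False instead of NaN, so the mask is purely boolean
mask = df['A'].str.contains(r'\b4\b', na=False)

chained = df[mask]['B']   # two indexing steps
single = df.loc[mask, 'B']  # rows and column selected in one step
print(single)
```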

Edit:

Use .contains(r'\b4\b') rather than .contains('4'), so that 14 or any other number that merely contains the digit 4 is not selected.
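A quick comparison of the two patterns on this answer's data, assuming the `np`/`pd` aliases. `\b` is a regex word boundary: a comma or the start/end of the string counts as one, but the position between "1" and "4" in "14" does not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["14,2,3", "4,5,6", np.nan, "8"], "B": [6, 7, 8, 9]})

# Plain substring search: '4' also matches inside '14'
plain = df['A'].str.contains('4', na=False)

# Word boundaries on both sides: only a standalone 4 matches
bounded = df['A'].str.contains(r'\b4\b', na=False)

print(df.loc[plain, 'B'].tolist())
print(df.loc[bounded, 'B'].tolist())
```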
