Question

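I have a DataFrame in which a particular text value can be stored in any of several columns. I am trying to filter out all rows of the DataFrame that contain that value.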

id,col1,col2,col3,col4
1001,apple,banana,pear,kiwi
1002,,apple,,
1003,banana,kiwi,,
1004,pear,orange,apple,

Given the example above, I am trying to select every row that contains the word apple, along with that row's id.

for col in df:
    apple = df[df[col].astype(str).str.contains("apple")]

But this returns empty rows.

Expected output:

id,value
1001,apple
1002,apple
1004,apple

Answer 1

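The idea is to set id as the index and use DataFrame.where to replace every value that is not apple with a missing value. DataFrame.stack then gives a Series with a MultiIndex, so call reset_index twice: first to drop the inner level (the column names), then to convert the Series into a two-column DataFrame: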

df = (df.set_index('id')
        .where(lambda x: x == 'apple')
        .stack()
        .reset_index(level=1, drop=True)
        .reset_index(name='val')
)
print (df)
     id    val
0  1001  apple
1  1002  apple
2  1004  apple

To test for a substring instead of an exact match, use Series.str.contains and remove the resulting missing rows with Series.dropna:

df = (df.set_index('id')
        .stack()
        .where(lambda x: x.str.contains('apple'))
        .dropna()
        .reset_index(level=1, drop=True)
        .reset_index(name='val')
)
print (df)
     id    val
0  1001  apple
1  1002  apple
2  1004  apple
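
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It is not the chain from this answer: it swaps in DataFrame.melt to reshape to long format, then filters with Series.str.contains. The names csv_text, long_df and result are made up for illustration, and the sample data is copied from the question.

```python
import io

import pandas as pd

# Sample data from the question; empty fields are read as NaN.
csv_text = """id,col1,col2,col3,col4
1001,apple,banana,pear,kiwi
1002,,apple,,
1003,banana,kiwi,,
1004,pear,orange,apple,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Reshape to one row per (id, column) pair and drop the empty cells.
long_df = df.melt(id_vars="id", value_name="value").dropna(subset=["value"])

# Keep only the cells that contain the substring, then tidy up.
result = (
    long_df[long_df["value"].str.contains("apple")]
    .sort_values("id")[["id", "value"]]
    .reset_index(drop=True)
)
print(result)
```

Because the empty cells are dropped before the substring test, str.contains only ever sees strings here. Note that a row with apple in two columns would appear twice in result; add drop_duplicates() if that matters.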

Answer 2

This should work.

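for col in df:
    # Series has no .contains method; string methods live under the .str accessor
    # note: apple is reassigned on every pass, so it ends up holding only the last column's matches
    apple = df[df[col].astype(str).str.contains("apple")]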

Answer 3

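If you want to filter out all rows with that string, why does the expected output show only the id for each row? To select all rows containing that string, I would do something like this: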

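mask = None
for col in df.columns:
    # True where this column contains the substring "apple"
    mask_cur = df[col].astype(str).str.contains("apple")
    # combine with OR so that a match in any column keeps the row
    mask = mask_cur if mask is None else (mask | mask_cur)
answer = df[mask]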
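
The same "match in any column" idea can probably be written without an explicit loop. The following is only a sketch under assumptions: the sample frame is rebuilt by hand from the question, text_cols is a made-up helper name, and the id column is left out of the search on purpose.

```python
import pandas as pd

# Sample data from the question, with None standing in for the empty cells.
df = pd.DataFrame({
    "id": [1001, 1002, 1003, 1004],
    "col1": ["apple", None, "banana", "pear"],
    "col2": ["banana", "apple", "kiwi", "orange"],
    "col3": ["pear", None, None, "apple"],
    "col4": ["kiwi", None, None, None],
})

# Search every column except the id.
text_cols = df.columns.drop("id")

# One boolean column per text column; na=False treats missing cells as "no match".
mask = df[text_cols].apply(lambda s: s.str.contains("apple", na=False)).any(axis=1)

# Full rows in which at least one column contains "apple".
answer = df[mask]
print(answer)
```

Using na=False avoids the astype(str) conversion, which would otherwise turn missing cells into the literal string "nan" before searching.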

Pandas: searching an entire DataFrame for specific text
