Question

I'm very new to Python and still don't fully understand how to use it properly, so please bear with me.

Suppose we have a data frame like this:

import pandas as pd

samp_data = pd.DataFrame([[1,'hello there',3],
                          [4,'im just saying hello',6],
                          [7,'but sometimes i say bye',9],
                          [2,'random words here',5]],
                         columns=["a", "b", "c"])
print(samp_data)
   a                        b  c
0  1              hello there  3
1  4     im just saying hello  6
2  7  but sometimes i say bye  9
3  2        random words here  5

We also have a list of words we don't want:

unwanted_words = ['hello', 'random']

I want to write a function that excludes every row where column b contains any of the words in the unwanted_words list. So the output should be:

print(samp_data)
   a                        b  c
2  7  but sometimes i say bye  9

What I've tried so far includes the built-in isin() function:

data = samp_data.ix[samp_data['b'].isin(unwanted_words),:]

But this doesn't exclude the rows I expected. I also tried the str.contains() function:

for i,row in samp_data.iterrows():
    if unwanted_words.str.contains(row['b']).any():
        print('found matching words')

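This gives me an error.

I think I'm just overcomplicating things, and there must be some very simple way that I'm not aware of. Any help is greatly appreciated!

Posts I've read so far (not limited to this list, since I've already closed many windows):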

Answer 1

You were really close to the solution. It uses the Series.str.contains method. Keep in mind that it accepts regular expressions:

samp_data[~samp_data['b'].str.contains(r'hello|random')]

The result will be:

Out[11]:
   a                        b  c
2  7  but sometimes i say bye  9
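
One caveat with the regex approach: str.contains matches substrings, so the pattern hello would also match a cell containing "Othello". A minimal sketch, assuming you want whole-word, case-insensitive matching (the \b anchors, case=False, na=False and the extra fifth row are my additions for illustration, not part of the answer above):

```python
import pandas as pd

samp_data = pd.DataFrame([[1, 'hello there', 3],
                          [4, 'im just saying hello', 6],
                          [7, 'but sometimes i say bye', 9],
                          [2, 'random words here', 5],
                          [8, 'Othello is a play', 0]],  # hypothetical extra row
                         columns=["a", "b", "c"])

# \b marks a word boundary, so "hello" inside "Othello" is not a match.
# The non-capturing group (?:...) avoids pandas' warning about match groups.
pattern = r'\b(?:hello|random)\b'

# na=False treats missing values in column b as "no match".
mask = samp_data['b'].str.contains(pattern, case=False, na=False)
whole_word_filtered = samp_data[~mask]
print(whole_word_filtered)
```

Without the \b anchors the "Othello" row would be dropped as well, which may or may not be what you want.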

Answer 2

Maybe not the most elegant, but I think it would work for you?

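def in_excluded(my_str, excluded):
    """
    (str, list) -> bool
    Returns True if any whitespace-separated word of my_str is in excluded
    """
    for each in my_str.split():
        if each in excluded:
            return True
    return False


def print_only_wanted(samp_data, excluded):
    """
    (list, list) -> None
    Prints each of the lists in the main list unless they contain a word
    from excluded
    """
    # samp_data is expected to be a list of rows (lists), not a DataFrame
    for each in samp_data:
        # only the string items of a row can contain words; numbers are skipped
        texts = [item for item in each if isinstance(item, str)]
        if not any(in_excluded(text, excluded) for text in texts):
            print(each)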

Answer 3

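You can use in to determine whether one string can be found inside another. For example, "he" in "hello" returns True. You can combine it with a list comprehension and the any function to select the rows you want.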
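
A minimal sketch of combining in, a list comprehension and any, assuming the samp_data and unwanted_words from the question. This is a reconstruction of the idea rather than the answerer's exact code, and the names has_unwanted, keep and filtered are hypothetical:

```python
import pandas as pd

samp_data = pd.DataFrame([[1, 'hello there', 3],
                          [4, 'im just saying hello', 6],
                          [7, 'but sometimes i say bye', 9],
                          [2, 'random words here', 5]],
                         columns=["a", "b", "c"])
unwanted_words = ['hello', 'random']

# For each cell in column b, check whether any unwanted word occurs in it
# as a substring.
has_unwanted = [any(word in text for word in unwanted_words)
                for text in samp_data['b']]

# Keep only the rows where no unwanted word was found.
keep = [not flag for flag in has_unwanted]
filtered = samp_data[keep]
print(filtered)
```

Because in does a plain substring test, no regular-expression escaping is needed here.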

Answer 4

You can use str.contains:

samp_data = samp_data[~samp_data.b.str.contains('hello|random')]

You get

   a                        b  c
2  7  but sometimes i say bye  9

If your list of unwanted words is longer, you may want to build the pattern from the list itself rather than typing it out.
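
A minimal sketch of building the alternation pattern from unwanted_words with '|'.join. The re.escape call is my addition, on the assumption that the words should be matched literally even if they contain regex metacharacters:

```python
import re

import pandas as pd

samp_data = pd.DataFrame([[1, 'hello there', 3],
                          [4, 'im just saying hello', 6],
                          [7, 'but sometimes i say bye', 9],
                          [2, 'random words here', 5]],
                         columns=["a", "b", "c"])
unwanted_words = ['hello', 'random']

# re.escape guards against words containing characters such as "." or "+",
# which would otherwise be interpreted as regex syntax.
pattern = '|'.join(re.escape(word) for word in unwanted_words)

joined_filtered = samp_data[~samp_data['b'].str.contains(pattern)]
print(joined_filtered)
```

For these two words the generated pattern is the same 'hello|random' string used above, so the result is identical.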

Answer 5

How about this one-liner? I'm sure some other pandas enthusiasts will come up with more sophisticated answers than mine.

samp_data[~samp_data['b'].apply(lambda x: any(word in unwanted_words for word in x.split()))]

   a                        b  c
2  7  but sometimes i say bye  9
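
Note that this one-liner compares whole whitespace-separated words, so unlike str.contains it keeps a row whose only near-match is "hellos". A sketch of the same idea using a set, assuming exact word matches are what you want (the set conversion, isdisjoint and the extra fifth row are my substitutions for illustration, not part of the answer above):

```python
import pandas as pd

samp_data = pd.DataFrame([[1, 'hello there', 3],
                          [4, 'im just saying hello', 6],
                          [7, 'but sometimes i say bye', 9],
                          [2, 'random words here', 5],
                          [3, 'hellos everyone', 1]],  # hypothetical extra row
                         columns=["a", "b", "c"])
unwanted_words = ['hello', 'random']
unwanted = set(unwanted_words)

# isdisjoint is True when the row's words share nothing with the unwanted set,
# so the resulting mask marks the rows to keep.
word_mask = samp_data['b'].apply(lambda text: unwanted.isdisjoint(text.split()))
word_filtered = samp_data[word_mask]
print(word_filtered)
```

Set lookups stay fast as the word list grows, which may matter for a long list of unwanted words.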
