Question

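I'm trying to parse a list built from 1400+ emails, where each word is an item in the list. I'm using Python 3.4. I need to filter out words according to the following rules: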

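The output should contain only words that are 4 letters or longer (no digits).
The output should drop common stop words ('and', 'but', 'they', etc.).
The output should drop words that are common in this data but meaningless for a word cloud ('sakai', 'email', 're:').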

A sample list looks like this:

    words = ['re:', 'sakai:', 'which', 'code', 'base', 'to', 'use', 'in', 
    'production:', 'maintenance', 'branch', 'or', 'release', 'tags']

My question is: how do I remove the items matching the 3 rules above from the words list? I tried this:

    import re

    for word in words:
        pattern = re.match('*sample removing stop words*', word)
        try:
            if pattern:
                words = words.remove(word)
                continue
        except TypeError:
            continue

But whenever I get None, I receive this error:

    TypeError: 'NoneType' object is not iterable
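
For context, one likely source of this error is that list.remove() changes the list in place and returns None, so assigning its result back to words replaces the list with None. The snippet below is a small illustration of that behaviour, not the original code; the three-item list and the names result, remaining and message are made up for the demonstration.

```python
words = ['re:', 'which', 'code']

# list.remove() mutates the list in place and returns None
result = words.remove('re:')
remaining = list(words)  # copy of what is actually left in the list

# Rebinding the name to that return value throws the list away
words = result

try:
    for word in words:  # iterating over None fails here
        pass
except TypeError as exc:
    message = str(exc)

print(remaining)  # ['which', 'code']
print(message)    # 'NoneType' object is not iterable
```

Calling words.remove(word) without the assignment avoids the None, although removing items from a list while looping over it can still skip elements.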

As a result, the words list is never changed. How can I modify the words list to get rid of the words specified above?

How can I do this using regex?
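
One possible approach, sketched here under a few assumptions, is to build a new list instead of removing items during iteration. The names STOP_WORDS, NOISE_WORDS, LETTERS_4_PLUS and clean are hypothetical, the two sets hold only the examples given in the question, and stripping a trailing colon before testing a word is an assumption about the desired behaviour (without it, 'production:' would be dropped by the letters-only rule).

```python
import re

words = ['re:', 'sakai:', 'which', 'code', 'base', 'to', 'use', 'in',
         'production:', 'maintenance', 'branch', 'or', 'release', 'tags']

# Assumption: only the examples from the question; extend as needed
STOP_WORDS = {'and', 'but', 'they'}
NOISE_WORDS = {'sakai', 'email', 're'}

# Rule 1: letters only, at least 4 of them (no digits or punctuation)
LETTERS_4_PLUS = re.compile(r'[a-z]{4,}')


def clean(items):
    """Return a new list containing only the words that pass all 3 rules."""
    kept = []
    for item in items:
        # Assumption: lowercase and drop a trailing ':' before testing
        token = item.lower().rstrip(':')
        if not LETTERS_4_PLUS.fullmatch(token):
            continue  # too short, or contains non-letters
        if token in STOP_WORDS or token in NOISE_WORDS:
            continue  # rules 2 and 3
        kept.append(token)
    return kept


filtered = clean(words)
print(filtered)
```

With the sample list this prints ['which', 'code', 'base', 'production', 'maintenance', 'branch', 'release', 'tags']. re.fullmatch is available from Python 3.4 onward; on older versions the pattern would need explicit ^ and $ anchors with re.match.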

0 Answers: