Question

Here I am using a file that contains a list of stopwords. I want to remove all of the stopwords from a text.

def print_stopWords(self):

    #infile = open("D:\Komal\MyPrograms\Pkg\PkgSubfolder\StopWords.txt", 'r')
    stopwords = ()
    print '\nstopwords are-'
    for line in open('D:\Komal\MyPrograms\Pkg\PkgSubfolder\StopWords.txt'):
        stopwords += (line,)

    print stopwords
    return stopwords


def strip_stopwords(self,text,stopword):
    print '\n Text after removing all stopwords is --'
    words = text.split()
    text = []
    for word in words:
        if word.lower() not in stopword:
            text.append(word)
    print u' '.join(text)        #'u' prefix allows you to write a unicode string literal
    return text

Answer 1

The question is unclear (you should show all of your code), but I think your main problem is:

stopwords = ()

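() declares a tuple, which is immutable: once it is defined, its contents cannot be changed. You are probably looking for a list, a dict, or a set (the best choice in this case), all of which let you add elements (for example, inside a for loop). You should go through the Python tutorial to learn these basic data structures.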
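To make the difference concrete, here is a small sketch in Python 3 syntax (the question's code is Python 2). The sample word is made up for illustration. Note that a tuple has no append method, although `+=` on a tuple name still runs because it builds a new tuple and rebinds the name.

```python
# The sample word is invented for illustration only.
stop_tuple = ()
stop_tuple += ("the",)   # allowed: builds a NEW tuple and rebinds the name
has_append = hasattr(stop_tuple, "append")  # tuples cannot grow in place

stop_list = []
stop_list.append("the")  # lists grow in place

stop_set = set()
stop_set.add("the")
stop_set.add("the")      # a set ignores duplicates

print(stop_tuple, has_append, stop_list, stop_set)
```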

Answer 2

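Your question is unclear. The only functional problem I see is that stopwords is a tuple, which is immutable, so you cannot append to it, unlike a list.

In any case, for performance, stopwords should be a set (or dict) rather than a list or tuple. A lookup in a set is O(1) on average rather than O(N).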

def print_stopWords(self):
    stopwords = set()
    print '\nstopwords are-'
    for line in ...:
        # strip() drops the trailing newline that each line read from a file carries
        stopwords.add(line.strip())
    return stopwords

It is a bit odd that print_stopWords() is a method but never modifies the object (that is, it never uses self, for example to assign to self.stopwords).

strip_stopwords() can simply use a comprehension (here, a generator expression passed to join):

u' '.join(w for w in text.split() if w.lower() not in stopwords)
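As a minimal sketch of that one-liner wrapped in a standalone function, written in Python 3 syntax: the stopword set and the sample sentence are hypothetical, and the function is a plain function rather than the method from the question.

```python
def strip_stopwords(text, stopwords):
    """Return text without the words whose lowercase form is in stopwords."""
    return " ".join(w for w in text.split() if w.lower() not in stopwords)


# Hypothetical stopword set and sentence, for illustration only.
stopwords = {"the", "is", "a"}
result = strip_stopwords("The cat is on a mat", stopwords)
print(result)
```

Because the text is split on whitespace only, a token with punctuation attached (for example "the,") would not match a stopword; handling that would need extra tokenization.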

Answer 3

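The whole process of importing the stopwords into Python code can be done in a single line. However, it is important to understand the logic behind the code.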

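To choose the right data structure for the stopword list: the list is built once, then only read, and it is queried for membership many times. A set fits, because it stores each word once and supports fast membership tests (a frozenset would additionally make it immutable). So we use a set.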

stopword = set(line.strip() for line in open('Stopwords', 'r'))
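The strip() call matters: iterating over a file yields each line with its trailing newline, so an unstripped entry such as "the\n" would never equal the word "the". Below is a hedged sketch of the same idea in Python 3 syntax. The helper name load_stopwords is an assumption, as are the choices to lowercase the entries and to skip blank lines; io.StringIO stands in for the real file so the example runs anywhere.

```python
import io


def load_stopwords(lines):
    """Build a set of stopwords from an iterable of lines, such as an open file."""
    # strip() removes the trailing newline, blank lines are skipped, and
    # lower() matches the lowercase comparison used when filtering.
    return {line.strip().lower() for line in lines if line.strip()}


# io.StringIO stands in for the stopword file in this sketch.
sample_file = io.StringIO("the\nis\n\nA\n")
stopwords = load_stopwords(sample_file)
print(sorted(stopwords))

# With a real file, a with block also closes the file afterwards:
# with open('Stopwords', 'r') as f:
#     stopwords = load_stopwords(f)
```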

Answer 4

Here, you can use my code.

public paginate(page) {
  this.route.params.subscribe(
    (params: any) => {
      this.pn_location_e = params['pn_location.e'];
      this.pn_location_v = params['pn_location.v'];
    }
  );

  this.router.navigate(
    ['/clients', {
      page: page,
      'pn_location.e': this.pn_location_e,
      'pn_location.v': this.pn_location_v,
    }]);
}

Strip stopwords, getting the stopwords from a file
