I'm creating a function that searches for "unique" words in a file

Time: 2015-04-17 01:32:53

Tags: python

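I'm creating a function that searches a file for "unique" words. These are words that have an even number of characters and appear more than once in the text. Right now I'm getting an error:

NameError: name 'strip' is not defined

Here is my code: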

    def evenWords(inFile,outFile):
        with open(inFile, 'r') as inF:
            count = 0
            lst = []
            outF = open(outFile, 'w')
            for line in inF:
                line = line.split(" ")
                for word in line:
                    word = word.strip(strip.punctuation)
                    word = word.lower()
                    wordCount = 0
                if (len(word)%2) == 0:
                    lst.append(word)
                if word in lst:
                    wordCount +=1
                if wordCount > 1:
                    outF.write(word + "\n")
                    count +=1
            return count

    inF.close()
    outF.close()

I'd like to know why this happens. I have tried importing string.

2 Answers:

Answer 0: (score: 2)

I believe you want string.punctuation, not 'strip.punctuation'. Use this code:

    import string

    def evenWords(inFile, outFile):
        # Open both files in one with statement so they are closed automatically.
        with open(inFile, 'r') as inF, open(outFile, 'w') as outF:
            count = 0
            lst = []
            for line in inF:
                line = line.split(" ")
                for word in line:
                    # string.punctuation (not strip.punctuation) fixes the NameError.
                    word = word.strip(string.punctuation)
                    word = word.lower()
                    wordCount = 0
                # The counting logic below is unchanged from the question.
                if (len(word)%2) == 0:
                    lst.append(word)
                if word in lst:
                    wordCount +=1
                if wordCount > 1:
                    outF.write(word + "\n")
                    count +=1
            return count
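
To make the cause of the error concrete, here is a minimal sketch. `strip` by itself is not a name Python knows (it only exists as a method on strings), whereas `string` is the standard-library module that provides `punctuation`. The helper name `clean_word` and the sample tokens are made up for illustration and are not part of the original code.

```python
import string


def clean_word(token):
    """Strip leading/trailing ASCII punctuation and lowercase the token."""
    # str.strip only removes characters at the two ends, so an inner
    # apostrophe such as the one in "don't" is left alone.
    return token.strip(string.punctuation).lower()


print(clean_word('"Hello,"'))  # prints: hello

try:
    # Reproduces the mistake from the question: `strip` is not a defined name.
    '"Hello,"'.strip(strip.punctuation)
except NameError as err:
    print("NameError:", err)
```

Note that `import string` has to appear in the same module that calls `string.punctuation`; importing it is not enough if the code still refers to `strip.punctuation`.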

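Answer 1: (score: 0)

Disclaimer: not really an answer.

I'd rewrite your code in a different style, just in case you're interested. You may notice that it is easier to read. It is also much easier to reason about and to test. Sorry for having to use some slightly advanced concepts.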

    import string


    def normalize(s):
      # Strip punctuation and whitespace together so "word,\n" becomes "word".
      return s.strip(string.punctuation + string.whitespace).lower()


    def isAcceptable(s):
      return len(s) % 2 == 0


    def makeWords(stream):
      for line in stream:
        words = line.split(' ')
        for raw in words:
          # Normalize first so the length check sees the cleaned word;
          # skip tokens that are empty after cleaning.
          word = normalize(raw)
          if word and isAcceptable(word):
            yield word


    def findDuplicates(words):
      # sets are much faster for member searching.
      seen_words = set()
      # we only want to report a word once.
      reported_words = set()
      for word in words:
        if word in seen_words and word not in reported_words:
          yield word
          reported_words.add(word)
        seen_words.add(word)


    def main():
      with open('words.txt') as input_file:
        for word in findDuplicates(makeWords(input_file)):
          print(word)
      # here input_file closes automatically


    if __name__ == '__main__':
      main()
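BTW, it reports each duplicate only once, whereas your code reports a duplicated word every time it is encountered: if a word is repeated 100 times, your code would report it 99 times.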
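
The "report once" behaviour can also be sketched by counting first and reporting afterwards, for example with `collections.Counter` from the standard library. This is an alternative illustration, not the code from either answer; the function name `repeated_even_words` and the sample lines are assumptions made up for the example.

```python
import string
from collections import Counter


def repeated_even_words(lines):
    """Return the sorted even-length words that occur more than once."""
    counts = Counter()
    for line in lines:
        # split() with no argument splits on any run of whitespace.
        for raw in line.split():
            word = raw.strip(string.punctuation).lower()
            # Skip tokens that are empty after cleaning or of odd length.
            if word and len(word) % 2 == 0:
                counts[word] += 1
    # Each qualifying word appears exactly once in the result,
    # no matter how many times it was repeated in the input.
    return sorted(w for w, n in counts.items() if n > 1)


sample = ["That is it, that is all.", "Is that so?"]  # hypothetical input
print(repeated_even_words(sample))  # prints: ['is', 'that']
```

Since any iterable of lines works here, an open file object could be passed in place of the `sample` list.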