How to get rid of the punctuation in this file

Time: 2015-11-10 03:51:52

Tags: python

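I want to print out an index without punctuation! I can't work out how to stop punctuation from ending up attached to the end of my words. For each word, the program also prints the line numbers in the text file where that word can be found.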

    def makeIndex(filename):
        # Map each lower-cased word to the line numbers it appears on
        wordIndex = {}
        with open(filename) as f:
            lineNum = 1
            for line in f:
                words = line.lower().split()
                for word in words:
                    if word in wordIndex.keys():
                        # Record each line number only once per word
                        if lineNum not in wordIndex[word]:
                            wordIndex[word].append(lineNum)
                    else:
                        wordIndex[word] = [lineNum]
                lineNum += 1
        return wordIndex

    def output(wordIndex):
        # Print one row per word, alphabetically, followed by its line numbers
        print("Word\tLine Numbers")
        for key in sorted(wordIndex.keys()):
            print(key, '\t', end=" ")
            for lineNum in wordIndex[key]:
                print(lineNum, end=" ")
            print()

    def main():
        filename = input("What is the file name to be indexed?")
        index = makeIndex(filename)
        output(index)

    main()

Output:
What is the file name to be indexed?test.txt
Word    Line Numbers
a    1 3 8 
all      9 10 
also     9 
an   3 10 
anagrams,    9 
anagrams.    10 
as   9 
ask      3 
blocks   1 
called   7 
create   8 
different    7 
difficulties     6 
each     8 
employed     7 
figure   3 
file     1 
find     10 
finds    9 
following    2 
for      2 8 
given    3 
has      8 
have     6 
here     7 
in   1 6 
interesting      2 
is   6 7 
it   10 
its      4 9 
jumble   2 
large    1 
letters.     4 
long     6 
many     6 
new      1 
of   1 4 6 9 10 
one      6 
opens    1 
out      3 
permutations.    7 9 
possibilities    2 
problem      6 
program      2 9 
programs.    2 
puzzles,     3 
range    1 
reorderings,     7 
same     8 
scrambled    3 
set      4 
signature    8 
since    9 
so   6 8 
solver   3 
solves   2 
solving      6 
strategy     7 
text     1 
that     6 8 
the      2 3 6 7 8 
this     6 9 
to   3 7 
typing   9 
unique   8 
unknown      3 
unscrambled      10 
up   1 
which    3 
whole    1 
will     10 
with     1 
word     3 8 10 
word,    8 
words    6 
working 
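The duplicate entries above (`anagrams,` vs. `anagrams.`, `word` vs. `word,`) come from `str.split()`, which with no arguments splits on whitespace only, so punctuation stays glued to the neighbouring word. A minimal sketch reproducing this on a made-up sample line (hypothetical, not taken from the asker's test.txt):

```python
# Hypothetical sample line; str.split() with no arguments splits on
# runs of whitespace only, so punctuation stays attached to the words.
line = "Its letters. Anagrams, anagrams."
tokens = line.lower().split()
print(tokens)  # ['its', 'letters.', 'anagrams,', 'anagrams.']

# As dictionary keys, the two spellings of "anagrams" are distinct,
# which is why the index lists them as separate words.
print(len(set(tokens)))  # 4
```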

1 Answer:

Answer 0 (score: 1)

You should remove the punctuation from each word before adding it to the wordIndex dictionary.

For example:

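    from string import punctuation

    ...
    for word in words:
        # Delete every ASCII punctuation character from the word
        for char in punctuation:
            word = word.replace(char, '')
        if not word:
            continue  # the token was nothing but punctuation, e.g. "--"
        if word in wordIndex.keys():
            ...  # rest of the original loop body, unchanged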
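Put together as a complete, runnable function, the answer's idea might look like the sketch below. It swaps the per-character `replace` loop for `str.translate` with a deletion table built by `str.maketrans('', '', string.punctuation)`, which removes all punctuation in one pass and should produce the same words as the loop. The helper `indexLines` and the constant `_STRIP_PUNCT` are hypothetical names introduced here; only ASCII punctuation is handled, and the asker's `output` and `main` can stay as they are because the returned dictionary has the same shape.

```python
import string

# Hypothetical constant: a translation table that deletes every character
# in string.punctuation (ASCII punctuation only; curly quotes etc. survive).
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def indexLines(lines):
    """Hypothetical helper: map each lower-cased, punctuation-free word to
    the ascending list of 1-based line numbers on which it appears."""
    wordIndex = {}
    for lineNum, line in enumerate(lines, start=1):
        for word in line.lower().split():
            word = word.translate(_STRIP_PUNCT)
            if not word:  # token was only punctuation, e.g. "--"
                continue
            nums = wordIndex.setdefault(word, [])
            if lineNum not in nums:  # record each line once per word
                nums.append(lineNum)
    return wordIndex


def makeIndex(filename):
    """Drop-in replacement for the question's makeIndex; iterating over a
    file object yields it line by line."""
    with open(filename) as f:
        return indexLines(f)
```

Because `indexLines` accepts any iterable of strings, it can be tried without a file: `indexLines(["Anagrams, anagrams.", "The anagrams solver."])` gives `{'anagrams': [1, 2], 'the': [2], 'solver': [2]}`. If internal apostrophes or hyphens should be kept (e.g. `don't`), `word.strip(string.punctuation)` removes punctuation only from the ends of each token instead.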