How do I find doubled words in a file?

Time: 2017-04-03 13:35:22

Tags: python-3.x

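I'm having some trouble with my code. I'm trying to find doubled words in a file, such as a doubled "the", and then print the line on which each one occurs. So far my code works for the line count, but it reports every word that is repeated anywhere in the file, not just the words that are repeated right after each other.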

What do I need to change so that it only counts doubled words?

my_file = input("Enter file name: ")
lst = []
count = 1
with open(my_file, "r") as dup:
    for line in dup:
        linedata = line.split()
        for word in linedata:
            if word not in lst:
                lst.append(word)
            else:
                print("Found word: {""} on line {}".format(word, count))
                count = count + 1
dup.close()

3 answers:

Answer 0: (score: 1)

my_file = input("Enter file name: ")
with open(my_file, "r") as dup:
    for line_num, line in enumerate(dup):
        words_in_line = line.split()
        duplicates = [word for i, word in enumerate(words_in_line[1:]) if words_in_line[i] == word]
        # now you have a list of duplicated words in line in duplicates
        # do whatever you want with it
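
The snippet above collects the doubled words but leaves the reporting to the reader. As a rough sketch of one way to finish it, the function below pairs each word with its successor and records the line number alongside the word. The name `find_doubled_words` and the sample lines are made up for illustration, and "doubled" is assumed to mean an exact, case-sensitive repeat of the previous whitespace-separated word on the same line.

```python
def find_doubled_words(lines):
    """Return (line_number, word) pairs for words that repeat back to back."""
    found = []
    for line_num, line in enumerate(lines, start=1):  # 1-based line numbers
        words = line.split()
        # zip pairs each word with its successor: (w0, w1), (w1, w2), ...
        for previous, current in zip(words, words[1:]):
            if previous == current:
                found.append((line_num, current))
    return found


if __name__ == "__main__":
    # Hypothetical sample text; an open file object can be passed instead.
    sample = ["this is the the first line", "nothing doubled here", "go go go"]
    for line_num, word in find_doubled_words(sample):
        print("Found word: {} on line {}".format(word, line_num))
```

Because a file object iterates line by line, `find_doubled_words(open(my_file))` should work the same way as the list of strings used here.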

Answer 1: (score: 0)

Put the code below in a file named THISfile.py and run it to see what it does:

# myFile = input("Enter file name: ")
# line No 2: line with with double 'with'
# line No 3: double ( word , word ) is not a double word
myFile="THISfile.py"
lstUniqueWords = []
noOfFoundWordDoubles = 0
totalNoOfWords       = 0
lineNo               = 0
lstLineNumbersWithWordDoubles = []
with open(myFile, "r") as myFile:
    for line in myFile:
        lineNo+=1 # memorize current line number 
        lineWords = line.split()
        if len(lineWords) > 0: # scan line only if it contains words
            currWord = lineWords[0] # remember already 'visited' word
            totalNoOfWords += 1
            if currWord not in lstUniqueWords: 
                lstUniqueWords.append(currWord) 
                # put 'visited' word word into lstAllWordsINmyFile (if it is not already there)
            lastWord = currWord # we are done with current, so current becomes last one
            if len(lineWords) > 1 : # proceed only if line has two or more words
                for word in lineWords[1:] : # loop over all other words
                    totalNoOfWords += 1
                    currWord = word
                    if currWord not in lstUniqueWords: 
                        lstUniqueWords.append(currWord) 
                        # put 'visited' word into lstAllWordsINmyFile (if it is not already there)
                    if( currWord == lastWord ): # duplicate word found: 
                        noOfFoundWordDoubles += 1
                        print("Found double word: ['{""}'] in line {}".format(currWord, lineNo))
                        lstLineNumbersWithWordDoubles.append(lineNo)
                    lastWord = currWord 
                    #        ^--- now after all all work is done, the currWord is considered lastWord
print(
    "noOfDoubles", noOfFoundWordDoubles, "\n",
    "totalNoOfWords", totalNoOfWords, "uniqueWords", len(lstUniqueWords), "\n",
    "linesWithDoubles", lstLineNumbersWithWordDoubles
)

The output should be:

Found double word: ['with'] in line 2
Found double word: ['word'] in line 19
Found double word: ['all'] in line 33
noOfDoubles 3 
 totalNoOfWords 221 uniqueWords 111 
 linesWithDoubles [2, 19, 33]

Now you can read through the comments in the code to get a better idea of how it works.

Answer 2: (score: 0)

Here is just the plain answer to the question:

"What do I need to change so that it only counts doubled words?"

Here you go:

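
One possible change to the question's code, as a minimal sketch rather than a definitive fix: instead of remembering every word seen so far in `lst`, remember only the previous word on the current line, and advance the line counter once per line rather than once per match. The function name `report_doubled_words` and its return value are additions for illustration; the comparison is assumed to be case-sensitive, and a repeat that spans a line break is not counted.

```python
def report_doubled_words(path):
    """Print each back-to-back repeated word with its line number.

    Returns the number of doubled words found.
    """
    doubles = 0
    count = 0
    with open(path, "r") as dup:
        for line in dup:
            count += 1       # advance once per line, not once per match
            previous = None  # reset so a repeat across a line break is ignored
            for word in line.split():
                if word == previous:
                    print("Found word: {} on line {}".format(word, count))
                    doubles += 1
                previous = word
    return doubles
```

To keep the interactive behaviour of the original script, it can be called as `report_doubled_words(input("Enter file name: "))`. The `with` statement closes the file on exit, so no separate `dup.close()` is needed.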