Question

I have two large .txt files:

First file:

Hi how are you I'm pretty fine.
This is amazing oh yeah nice awesome.
...
I like stackoverflow.

Second file:

hi 
this 
is 
amazing
i 
like 
it 
a 
lot 
nice

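The first, list_1, is a list of lists in which each inner list is one line of the first file; the second, list_2, is a list of the lines of the second file. I read both files and put them into lists like this: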

list_1 = [[line.strip()] for line in open('path/first/file.txt')]


f_2 = open('/path/file2.txt', 'r')
y = f_2.readlines()
print(y)

list_1 = [Hi how are you I'm pretty fine. This is amazing oh yeah nice awesome. ... I like stackoverflow.]

list_2 = [hi this is amazing. ... i like it a lot nice]

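I want to return, as tuples, each line (i.e. each inner list) together with the number of words that occur in both files (file1.txt and file2.txt). How can I return something like this: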

[(1,count),(2,count),...,(n,count)]

where n is the line number (the inner list) and count is the number of times words from list_2 appear in list_1 (all words). Thanks in advance!
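
To make the requested result concrete, here is a minimal sketch of one reading of it: for each line of the first file, count how many of its words also occur in the second file. It uses in-memory strings (only the three sample lines shown above) instead of the real files. The names `count_shared_words`, `lines`, and `vocab` are hypothetical, and the tokenization (lowercase, surrounding punctuation stripped) is an assumption, not something the question specifies.

```python
import string

def count_shared_words(lines, vocab):
    """Return [(line_number, count), ...] with 1-based line numbers."""
    vocab = {w.strip().lower() for w in vocab}
    result = []
    for n, line in enumerate(lines, start=1):
        # Assumption: a "word" is a whitespace-separated token,
        # lowercased, with surrounding punctuation removed.
        tokens = [t.strip(string.punctuation).lower() for t in line.split()]
        result.append((n, sum(1 for t in tokens if t in vocab)))
    return result

lines = [
    "Hi how are you I'm pretty fine.",
    "This is amazing oh yeah nice awesome.",
    "I like stackoverflow.",
]
vocab = ["hi", "this", "is", "amazing", "i", "like", "it", "a", "lot", "nice"]
print(count_shared_words(lines, vocab))
```

If a different definition of a word is wanted (for example, treating "I'm" as "I"), only the tokenization line would need to change.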

Answer 1

from collections import Counter

# You have to decide what a "word" is ...
def text2word(s):
    s = s.replace("\n", "")
    s = s.replace(".", "")
    return s.lower()

# Count every normalized word in the first file.
with open("/temp/temp1.txt", "r") as fh1:
    content1 = fh1.read()
counts1 = Counter(map(text2word, content1.split()))

# Look up each word of the second file (one word per line).
counts2 = list()
with open("/temp/temp2.txt", "r") as fh2:
    for linenumber, word in enumerate(fh2):
        word = word.strip()
        print(word)
        ct = counts1[word]
        counts2.append((linenumber, ct))

print(counts2)
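
The comment about deciding what a "word" is matters in practice, since removing only periods leaves commas, question marks, and the like attached to tokens. As a rough sketch of one alternative, a regular expression can define the tokens directly. The `tokenize` helper and its pattern are assumptions for illustration, not part of the answer above.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a word is a run of lowercase letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

text = (
    "Hi how are you I'm pretty fine.\n"
    "This is amazing oh yeah nice awesome.\n"
    "I like stackoverflow."
)
counts = Counter(tokenize(text))

# A Counter returns 0 for missing keys, so no membership check is needed.
print(counts["hi"], counts["i"], counts["lot"])
```

With this pattern "I'm" stays a single token, so it does not count as an occurrence of "i".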

Answer 2

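Assuming the second file contains one word per line, you can use the following code:

with open('/path/file1.txt') as f:
    all_words = f.read().split()
with open('/path/file2.txt') as f_2:
    words = f_2.read().split()
# Map each line index of the second file to its word's count in the first file.
result = dict((n, all_words.count(w)) for (n, w) in enumerate(words))
print(result)

If you need exactly the format shown, replace the last line with:

print(list(result.items()))

or replace the last two lines with:

result = [(n, all_words.count(w)) for (n, w) in enumerate(words)]
print(result)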
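
Since the files are described as large, note that `all_words.count(w)` rescans the whole word list once per word of the second file. A possible variation, sketched here with a hypothetical helper `counts_by_line` and made-up sample data, builds the frequencies once with `collections.Counter` and then does constant-time lookups:

```python
from collections import Counter

def counts_by_line(all_words, words):
    # Build the frequency table once instead of calling list.count per word.
    freq = Counter(all_words)
    return [(n, freq[w]) for n, w in enumerate(words)]

# Hypothetical sample data standing in for the contents of the two files.
all_words = "a b a c".split()
words = ["a", "c", "d"]
print(counts_by_line(all_words, words))
```

Like the code above, this sketch compares words exactly as they appear, with no lowercasing or punctuation handling.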

Answer 3

If you want n to represent the line number in the second file:

with open("file2.txt", "r") as a, open("file1.txt", "r") as b:
    # word -> [line index in file2, occurrences in file1]
    words = dict((k.strip(), [i, 0]) for i, k in enumerate(a))
    b_words = [line.lower().split() for line in b]

for item in b_words:
    for word in item:
        if word in words:
            words[word][1] += 1

for k, v in words.items():
    print(k, v)

Output (the order of the entries depends on the Python version):

a [7, 0] 
like [5, 1]
this [1, 1]
is [2, 1]
it [6, 0]
i [4, 1]
amazing [3, 1]
hi [0, 1]
lot [8, 0]
nice [9, 1]

Now, if you want to create a list of tuples from the values:

f = [tuple(v) for v in words.values()]
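
Because the tuples come out in dictionary order, they may not follow the line order of the second file. A small sketch, using a hypothetical sample in the same shape as `words`, sorts them by line index:

```python
# Hypothetical sample in the same shape as `words` above:
# word -> [line index in file2, occurrences in file1]
words = {"like": [5, 1], "a": [7, 0], "hi": [0, 1]}

# Tuples compare element by element, so sorting orders them by line index first.
pairs = sorted(tuple(v) for v in words.values())
print(pairs)
```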

How do I return the count of all repeated strings in two lists?
