How do I return the counts of all repeated strings in two lists?

Asked: 2015-03-05 21:09:52

Tags: python string list python-2.7 parsing

I have two large .txt files:

First file:

Hi how are you I'm pretty fine.
This is amazing oh yeah nice awesome.
...
I like stackoverflow.

Second file:

hi 
this 
is 
amazing
i 
like 
it 
a 
lot 
nice

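The first, list_1, is a list of lists in which each inner list is one line of the first file; the second list holds the contents of the second file. I read both files and put them into lists like this: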

list_1 = [[line.strip()] for line in open('path/first/file.txt')]


f_2 = open('/path/file2.txt', 'r')
y = f_2.readlines()
print y

list_1 = [Hi how are you I'm pretty fine. This is amazing oh yeah nice awesome. ... I like stackoverflow.]

list_2 = [hi this is amazing. ... i like it a lot nice]

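I would like to return, as tuples, each line (i.e. each list) together with the number of words that appear in both files (i.e. file1.txt and file2.txt). How can I return something like this: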

[(1,count),(2,count),...,(n,count)]

where n is the line number (list) and count is the number of times the words from list_2 appear in list_1 (all words). Thanks in advance!
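One possible reading of the request, sketched below, is: for every line of the first file, count how many of its words also occur in the second file. The helper names `words_of` and `count_per_line` are made up for this illustration, the sample data is copied from the question (the elided lines are left out), and stripping surrounding punctuation is an assumption about what counts as a "word".

```python
import string

def words_of(line):
    """Lowercase a line and split it into words, trimming surrounding punctuation."""
    return [w.strip(string.punctuation).lower() for w in line.split()]

def count_per_line(lines, vocabulary):
    """Return [(line_number, count), ...] with 1-based line numbers."""
    # A set makes each membership test cheap; blank vocabulary lines are ignored.
    vocab = set(w.strip().lower() for w in vocabulary if w.strip())
    return [(n, sum(1 for w in words_of(line) if w in vocab))
            for n, line in enumerate(lines, 1)]

# Sample data taken from the question.
file_1 = ["Hi how are you I'm pretty fine.",
          "This is amazing oh yeah nice awesome.",
          "I like stackoverflow."]
file_2 = ["hi", "this", "is", "amazing", "i", "like", "it", "a", "lot", "nice"]

result = count_per_line(file_1, file_2)
print(result)  # [(1, 1), (2, 4), (3, 2)]
```

With real files, `file_1` and `file_2` could be replaced by the open file objects, since both are only iterated line by line.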

3 Answers:

Answer 0: (score: 2)

from collections import Counter

# You have to decide what a "word" is ...
def text2word(s):
    # Normalize a token: drop newlines and periods, then lowercase.
    s = s.replace("\n", "")
    s = s.replace(".", "")
    return s.lower()

# Count every normalized word of the first file.
with open("/temp/temp1.txt", "r") as fh1:
    counts1 = Counter(map(text2word, fh1.read().split()))

# Look up each word (one per line) of the second file.
counts2 = list()
with open("/temp/temp2.txt", "r") as fh2:
    for linenumber, word in enumerate(fh2):
        word = word.strip()
        print word
        ct = counts1[word]
        counts2.append((linenumber, ct))

print counts2
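To see what this approach produces without touching the file system, here is a small in-memory sketch using the sample text from the question (elided lines omitted). It relies on the fact that a `Counter` returns 0 for a missing key instead of raising `KeyError`, which is why words such as "it" need no special handling. The variable names `text` and `lookup` are placeholders for the two files.

```python
from collections import Counter

def text2word(s):
    # Same normalization idea as the answer: drop periods, lowercase.
    return s.replace(".", "").lower()

# Stand-ins for the contents of the two files.
text = ("Hi how are you I'm pretty fine.\n"
        "This is amazing oh yeah nice awesome.\n"
        "I like stackoverflow.\n")
lookup = ["hi", "this", "is", "amazing", "i", "like", "it", "a", "lot", "nice"]

counts1 = Counter(text2word(w) for w in text.split())

# Missing words ("it", "a", "lot") simply count as 0.
counts2 = [(n, counts1[w]) for n, w in enumerate(lookup)]
print(counts2)
# [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 0), (7, 0), (8, 0), (9, 1)]
```

Note that the line numbers here start at 0; `enumerate(lookup, 1)` would give the 1-based numbering shown in the question.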

Answer 1: (score: 1)

Assuming the second file contains one word per line, you could use the following code:

with open('/path/file1.txt') as f:
    all_words = f.read().split()
with open('/path/file2.txt') as f_2:
    words = f_2.read().split()
result = dict((n, all_words.count(w)) for (n, w) in enumerate(words))
print result

If you need the exact format you showed, replace the last line with:

print result.items()

or replace the last two lines with:

result = [(n, all_words.count(w)) for (n, w) in enumerate(words)]
print result
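One thing to watch with `list.count` is that it compares strings exactly, so "Hi" in the first file does not match "hi" in the second. The sketch below, using a few words from the question's sample data, shows the difference that lowercasing the first file's words makes. Lowercasing is an assumption about the desired matching, not something this answer does. Also, `count` rescans the whole list for every word, which may be slow for large files.

```python
# Words of the first file, split on whitespace as in the answer.
all_words = ("Hi how are you I'm pretty fine. "
             "This is amazing oh yeah nice awesome. "
             "I like stackoverflow.").split()
words = ["hi", "this", "nice"]

# Exact matching: "Hi" and "This" are not counted.
raw = [(n, all_words.count(w)) for n, w in enumerate(words)]

# Case-insensitive matching after lowercasing every word of the first file.
lowered_words = [w.lower() for w in all_words]
lowered = [(n, lowered_words.count(w)) for n, w in enumerate(words)]

print(raw)      # [(0, 0), (1, 0), (2, 1)]
print(lowered)  # [(0, 1), (1, 1), (2, 1)]
```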

Answer 2: (score: 1)

If you want n to represent the line of the second file:

with open("file2.txt", "r") as a, open("file1.txt", "r") as b:
    # Map each word of file2 to [line index, count so far].
    words = dict((k.strip(), [i, 0]) for i, k in enumerate(a))
    # Lowercase and split every line of file1 into words.
    b_words = [line.lower().split() for line in b]

for item in b_words:
    for word in item:
        if word in words:
            words[word][1] += 1

for k, v in words.iteritems():
    print k, v

Output:

a [7, 0] 
like [5, 1]
this [1, 1]
is [2, 1]
it [6, 0]
i [4, 1]
amazing [3, 1]
hi [0, 1]
lot [8, 0]
nice [9, 1]

Now, if you want to create a list of tuples from the values:
f = [tuple(v) for k,v in words.iteritems()]
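Because a plain dict in Python 2.7 does not keep insertion order, the tuples above can come out in arbitrary order, as the printed output shows. If the list should follow the line order of the second file, sorting the tuples is one option, since tuples compare by their first element (the line index) first. The sketch below starts from the `words` dict as printed in the output above.

```python
# The dict as shown in the output above: word -> [line index, count].
words = {"a": [7, 0], "like": [5, 1], "this": [1, 1], "is": [2, 1],
         "it": [6, 0], "i": [4, 1], "amazing": [3, 1], "hi": [0, 1],
         "lot": [8, 0], "nice": [9, 1]}

# Sort by line index so the result follows the order of the second file.
f = sorted(tuple(v) for v in words.values())
print(f)
# [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 0), (7, 0), (8, 0), (9, 1)]
```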