I have two large .txt files:
First file:
Hi how are you I'm pretty fine.
This is amazing oh yeah nice awesome.
...
I like stackoverflow.
Second file:
hi
this
is
amazing
i
like
it
a
lot
nice
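The first, list_1, is a list of lists in which each inner list holds one line; the second list holds the lines of the second file. I read both files and put them into lists as follows: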
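# each line of the first file becomes a one-element list
list_1 = [[line.strip()] for line in open('path/first/file.txt')]
f_2 = open('/path/file2.txt', 'r')
# read from f_2 (the original called readlines() on an undefined name f)
y = f_2.readlines()
print(y)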
list_1 = [Hi how are you I'm pretty fine. This is amazing oh yeah nice awesome. ... I like stackoverflow.]
list_2 = [hi this is amazing. ... i like it a lot nice]
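I want to return tuples of a line (i.e. list) number and the number of words that appear in both files (i.e. file1.txt and file2.txt). How can I return something like this: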
[(1,count),(2,count),...,(n,count)]
where n is the line (list) number and count is the number of times the words from list_2 appear in list_1 (all words). Thanks in advance!
Answer 0 (score: 2)
from collections import Counter

fh1 = open("/temp/temp1.txt", "r")
fh2 = open("/temp/temp2.txt", "r")

# you have to decide what a "word" is ...
def text2word(s):
    s = s.replace("\n", "")
    s = s.replace(".", "")
    return s.lower()

content1 = fh1.read()
# count every normalized word of the first file once
counts1 = Counter(map(text2word, content1.split()))
counts2 = list()
for linenumber, word in enumerate(fh2):
    word = word.strip()
    print(word)
    ct = counts1[word]  # a Counter returns 0 for missing words
    counts2.append((linenumber, ct))
print(counts2)
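As a rough illustration of the same Counter idea without touching the file system, here is a minimal sketch that runs on in-memory strings. It assumes a "word" is a lowercased run of letters or apostrophes; the helper names tokenize and count_per_word are hypothetical, and the sample text contains only the three lines shown in the question (the elided lines are left out).

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a "word" is a run of letters/apostrophes, compared in lowercase.
    return re.findall(r"[a-z']+", text.lower())

def count_per_word(text_1, words_2):
    # Count all words of the first text once, then look up each word of the second list.
    counts = Counter(tokenize(text_1))
    return [(n, counts[w.strip().lower()]) for n, w in enumerate(words_2)]

# Sample data taken from the question (only the lines that are shown there).
text_1 = ("Hi how are you I'm pretty fine.\n"
          "This is amazing oh yeah nice awesome.\n"
          "I like stackoverflow.\n")
words_2 = ["hi", "this", "is", "amazing", "i", "like", "it", "a", "lot", "nice"]

print(count_per_word(text_1, words_2))
```

With this definition of a word, "I'm" stays a single token, so it does not count as an occurrence of "i". A different tokenizer would give different counts for such cases.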
Answer 1 (score: 1)
Assuming the second file contains one word per line, you can use the following code:
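with open('/path/file1.txt') as f:
    all_words = f.read().split()
# the second file holds the words to look up (the original opened file1.txt twice)
with open('/path/file2.txt') as f_2:
    words = f_2.read().split()
# map each line number of the second file to the count of its word in the first file
result = dict((n, all_words.count(w)) for (n, w) in enumerate(words))
print(result)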
If you need the exact format shown, replace the last line with:
print(list(result.items()))
or replace the last two lines with:
result = [(n, all_words.count(w)) for (n, w) in enumerate(words)]
print(result)
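The code above numbers the results by line of the second file. If n is instead meant to be the line number of the first file, which is one possible reading of the question, a minimal sketch could look like the following. The function name counts_per_line is hypothetical, and it assumes case-insensitive matching with trailing punctuation stripped; the sample data uses only the lines shown in the question.

```python
def counts_per_line(lines_1, words_2):
    # Assumption: matching ignores case and leading/trailing ".,!?" on each token.
    wanted = set(w.strip().lower() for w in words_2)
    result = []
    for n, line in enumerate(lines_1, start=1):
        tokens = [t.strip(".,!?").lower() for t in line.split()]
        # Count every token of this line that also occurs in the second file.
        result.append((n, sum(1 for t in tokens if t in wanted)))
    return result

# Sample data taken from the question (the elided lines are left out).
lines_1 = ["Hi how are you I'm pretty fine.",
           "This is amazing oh yeah nice awesome.",
           "I like stackoverflow."]
words_2 = ["hi", "this", "is", "amazing", "i", "like", "it", "a", "lot", "nice"]

print(counts_per_line(lines_1, words_2))
```

Using a set for the words of the second file keeps each membership test cheap, which may matter when both files are large.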
Answer 2 (score: 1)
If you want n to represent the line of the second file:
with open("file2.txt", "r") as a, open("file1.txt", "r") as b:
    # word -> [line number in the second file, count so far]
    words = dict((k.strip(), [i, 0]) for i, k in enumerate(a))
    b_words = [word.lower().split() for word in b]
for item in b_words:
    for word in item:
        if word in words:
            words[word][1] += 1
for k, v in words.items():
    print(k, v)
Output (the order of the keys may differ):
a [7, 0]
like [5, 1]
this [1, 1]
is [2, 1]
it [6, 0]
i [4, 1]
amazing [3, 1]
hi [0, 1]
lot [8, 0]
nice [9, 1]
Now, if you want to create a list of tuples from the values:
f = [tuple(v) for k, v in words.items()]
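Dictionaries did not guarantee iteration order before Python 3.7, so the tuples may come out in an arbitrary order. A minimal sketch that sorts them by line number follows; the words dictionary here is a stand-in filled with the values from the output above, not the result of reading any file.

```python
# Stand-in for the `words` dictionary: word -> [line_number, count],
# copied from the output listed above.
words = {"a": [7, 0], "like": [5, 1], "this": [1, 1], "is": [2, 1], "it": [6, 0],
         "i": [4, 1], "amazing": [3, 1], "hi": [0, 1], "lot": [8, 0], "nice": [9, 1]}

# Tuples compare element by element, so sorting orders them by line number first.
f = sorted(tuple(v) for v in words.values())
print(f)
```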