I have two text files and two lists (FIRST_LIST, SCND_LIST). For each file, I want to count how many of its words match FIRST_LIST and how many match SCND_LIST.
FIRST_LIST = "accessorizes","accessorizing","accessorized","accessorize"
SCND_LIST = "accessorize","accessorized","accessorizes","accessorizing"
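Text File1 contains:
This is a very good question, and you have received good answers which describe interesting topics.
Text File2 contains:
is more applied,using accessorize accessorized,accessorizes,accessorizing
Expected output: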
File1 first list count=2
File1 second list count=0
File2 first list count=0
File2 second list count=4
This is the code I have tried, but I can't get the expected output. Any help would be appreciated.
import os
import glob
import re

# collect all text files in the current directory
files=[]
for filename in glob.glob("*.txt"):
    files.append(filename)

# remove punctuation
def remove_punctuation(line):
    return re.sub(r'[^\w\s]', '', line)

# the cleaned lines of all files end up in one shared list
two_files=[]
for filename in files:
    for line in open(filename):
        #two_files.append(remove_punctuation(line))
        print(remove_punctuation(line),end='')
        two_files.append(remove_punctuation(line))

FIRST_LIST = "accessorizes","accessorizing","accessorized","accessorize"
SCND_LIST="accessorize","accessorized","accessorizes","accessorizing"

c=[]
for match in FIRST_LIST:
    if any(match in value for value in two_files):
        #c=match+1
        print (match)
        c.append(match)
print(c)
len(c)

d=[]
for match in SCND_LIST:
    if any(match in value for value in two_files):
        #c=match+1
        print (match)
        d.append(match)
print(d)
len(d)
Answer 0 (score: 2)
Using Counter and some list comprehensions is one of many different ways to solve your problem.
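As a minimal sketch of that idea: Counter maps each word to its number of occurrences, and a comprehension sums the counts of the words you care about. The names words and targets and their contents are hypothetical sample data, not taken from the question's files.

```python
from collections import Counter

# Hypothetical sample data, only for illustration
words = ["accessorize", "scarf", "accessorize", "accessorized"]
targets = {"accessorize", "accessorized"}

# Counter maps each distinct word to how often it occurs
counts = Counter(words)

# Sum the occurrences of every word that is in the target set
matches = sum(n for word, n in counts.items() if word in targets)
print(matches)
```

Using a set for the target words keeps each membership test cheap, which may matter once the word lists grow.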
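I assume your sample output is wrong, because some words are part of both lists and both files but were not counted. In addition, I added a second line to each sample string to show how this works with multi-line strings, which is probably the typical content of a given file.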
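The io.StringIO objects simulate your files here, but real files from the file system work exactly the same way, since both provide a file-like object (a file-like interface):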
import io
from collections import Counter

list_a = ["accessorizes", "accessorizing", "accessorized", "accessorize"]
list_b = ["accessorize", "accessorized", "accessorizes", "accessorizing"]

# added a second line to each string to demonstrate multi-line input
file_contents_a = 'This is a very good question, and you have received good answers which describe interesting topics accessorized accessorize.\nThis is the second line in file a'
file_contents_b = 'is more applied,using accessorize accessorized,accessorizes,accessorizing\nThis is the second line in file b'

# using io.StringIO to simulate a file input (--> file-like object)
# you should use `with open(filename) as ...` for real file input
file_like_a = io.StringIO(file_contents_a)
file_like_b = io.StringIO(file_contents_b)

# read file contents and split lines into a list of strings
lines_of_file_a = file_like_a.read().splitlines()
lines_of_file_b = file_like_b.read().splitlines()

# iterate through all lines of each file (for file a here)
for line_number, line in enumerate(lines_of_file_a):
    # treat '.' and ',' as word separators, then split on whitespace
    words = line.replace('.', ' ').replace(',', ' ').split()
    c = Counter(words)
    in_list_a = sum([v for k,v in c.items() if k in list_a])
    in_list_b = sum([v for k,v in c.items() if k in list_b])
    print("Line {}".format(line_number))
    print("- in list a {}".format(in_list_a))
    print("- in list b {}".format(in_list_b))

# iterate through all lines of each file (for file b here)
for line_number, line in enumerate(lines_of_file_b):
    words = line.replace('.', ' ').replace(',', ' ').split()
    c = Counter(words)
    in_list_a = sum([v for k,v in c.items() if k in list_a])
    in_list_b = sum([v for k,v in c.items() if k in list_b])
    print("Line {}".format(line_number))
    print("- in list a {}".format(in_list_a))
    print("- in list b {}".format(in_list_b))

# actually, your two lists are the same
lists_are_equal = sorted(list_a) == sorted(list_b)
print(lists_are_equal)
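The question asks for one count per file rather than per line. A minimal sketch of that variant follows, assuming exact, case-sensitive whole-word matches. The helper name count_matches and the regex-based tokenization are my own choices for illustration, not part of the code above.

```python
import re
from collections import Counter

def count_matches(text, word_list):
    """Count the words in `text` that appear in `word_list` (exact match)."""
    # \w+ pulls out runs of word characters, so punctuation acts as a separator
    counts = Counter(re.findall(r"\w+", text))
    # set() avoids double counting if word_list contains duplicates;
    # a Counter returns 0 for words it has never seen
    return sum(counts[word] for word in set(word_list))

list_a = ["accessorizes", "accessorizing", "accessorized", "accessorize"]
list_b = ["accessorize", "accessorized", "accessorizes", "accessorizing"]

# In-memory stand-ins for the two files (same strings as in the example above)
contents = {
    "File1": 'This is a very good question, and you have received good answers which describe interesting topics accessorized accessorize.\nThis is the second line in file a',
    "File2": 'is more applied,using accessorize accessorized,accessorizes,accessorizing\nThis is the second line in file b',
}

for name, text in contents.items():
    print("{} first list count={}".format(name, count_matches(text, list_a)))
    print("{} second list count={}".format(name, count_matches(text, list_b)))
```

For real files you would read each one with `with open(filename) as f: text = f.read()` and pass text to the helper. Because both lists contain the same four words, the two counts for a file always come out equal.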