I have 2 text files and 2 lists (FIRST_LIST, SECOND_LIST). For each file, I want to count how many of its words match FIRST_LIST and how many match SECOND_LIST.
FIRST_LIST = "accessorizes","accessorizing","accessorized","accessorize"
SECOND_LIST="accessorize","accessorized","accessorizes","accessorizing"
(These are not string literals; I am reading this data from a ".txt" file.)
text_File1 (contains):
This is a very good question, and you have received good answers
which describe interesting topics accessorized accessorize.
text_File2 (contains):
is more applied,using accessorize accessorized,accessorizes,accessorizing
Expected output format:
File1 first list count=2
File1 second list count=0
File2 first list count=0
File2 second list count=4
Below is the code I wrote to try to achieve this, but I can't get the expected output. Any help would be appreciated.
Read all the .txt files:
import os
import glob
files=[]
for filename in glob.glob("*.txt"):
    files.append(filename)
Define a function to remove punctuation:
# remove Punctuations
import re
def remove_punctuation(line):
    return re.sub(r'[^\w\s]', '', line)
Read the files by name in a loop. The problem is that their contents get merged; I need the counts for text file 1 and text file 2 kept separate:
two_files=[]
for filename in files:
    for line in open(filename):
        #two_files.append(remove_punctuation(line))
        print(remove_punctuation(line),end='')
        two_files.append(remove_punctuation(line))
FIRST_LIST = "accessorizes","accessorizing","accessorized","accessorize"
SECOND_LIST="accessorize","accessorized","accessorizes","accessorizing"
c=[]
for match in FIRST_LIST:
    if any(match in value for value in two_files):
        #c=match+1
        print (match)
        c.append(match)
print(c)
len(c)
d=[]
for match in SECOND_LIST:
    if any(match in value for value in two_files):
        #c=match+1
        print (match)
        d.append(match)
print(d)
len(d)
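Answer 0 (score: 1)
I'm not sure this is exactly what you want, but I think the problem is that you are appending the lines from both files to the same list. You should create a separate list for each file. Try: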
import glob
files=[]
for filename in glob.glob("*.txt"):
    files.append(filename)
# remove Punctuations
import re
def remove_punctuation(line):
    return re.sub(r'[^\w\s]', '', line)
# two_files[i] holds the cleaned lines of the i-th file, kept separate per file
two_files=[]
for filename in files:
    temp = []
    with open(filename) as fh:
        for line in fh:
            temp.append(remove_punctuation(line))
    two_files.append(temp)
FIRST_LIST = "accessorizes","accessorizing","accessorized","accessorize"
SECOND_LIST="accessorize","accessorized","accessorizes","accessorizing"
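c=[]  # c[i]: FIRST_LIST matches found in the i-th file
d=[]  # d[i]: SECOND_LIST matches found in the i-th file
for file in two_files:
    temp = []
    for match in FIRST_LIST:
        for value in file:
            # note: `in` is a substring test on the whole line
            if match in value:
                temp.append(match)
    c.append(temp)
    temp2 = []
    for match in SECOND_LIST:
        for value in file:
            if match in value:
                temp2.append(match)
    d.append(temp2)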
print('File1 first list count = ' + str(len(c[0])))
print('File1 second list count = ' + str(len(d[0])))
print('File2 first list count = ' + str(len(c[1])))
print('File2 second list count = ' + str(len(d[1])))
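One caveat with the code above: `match in value` is a substring test, so "accessorize" also matches inside "accessorized", and `remove_punctuation` deletes the commas in File2 without inserting a space, which glues "accessorized,accessorizes,accessorizing" into one long token. If whole-word matching is what you are after, a minimal sketch might look like the following. The helper names `tokenize` and `count_list_matches` are my own, and the file contents are pasted in as strings only to keep the example self-contained; in practice you would read each file with `open()`.

```python
import re

def tokenize(text):
    """Split text into lowercase words, treating punctuation as a separator."""
    return re.findall(r"\w+", text.lower())

def count_list_matches(text, word_list):
    """Count the tokens in text that are exactly equal to a word in word_list."""
    wanted = set(word_list)
    return sum(1 for token in tokenize(text) if token in wanted)

FIRST_LIST = ("accessorizes", "accessorizing", "accessorized", "accessorize")
SECOND_LIST = ("accessorize", "accessorized", "accessorizes", "accessorizing")

# Sample contents copied from the question (assumption: one string per file).
texts = {
    "File1": (
        "This is a very good question, and you have received good answers\n"
        "which describe interesting topics accessorized accessorize."
    ),
    "File2": "is more applied,using accessorize accessorized,accessorizes,accessorizing",
}

for name, text in texts.items():
    print(f"{name} first list count = {count_list_matches(text, FIRST_LIST)}")
    print(f"{name} second list count = {count_list_matches(text, SECOND_LIST)}")
```

Because both lists contain the same four words, this prints 2 for both lists on File1 and 4 for both lists on File2. That differs from the expected output in the question, so if the two lists are supposed to be distinguished by something other than membership (word order, for example), the matching rule would need to change accordingly.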