How to loop over multiple .txt files in Python and count matching words

Time: 2018-08-12 09:25:01

Tags: python

I have 2 text files and 2 lists (FIRST_LIST, SECOND_LIST). For each file, I want to count how many of its words match FIRST_LIST and how many match SECOND_LIST, as separate counts.

FIRST_LIST = "accessorizes","accessorizing","accessorized","accessorize"
SECOND_LIST="accessorize","accessorized","accessorizes","accessorizing"

(These are not hard-coded strings; I am reading this data from ".txt" files.)

text_File1 (contains):

This is a very good question, and you have received good answers 
 which describe interesting topics accessorized accessorize.

text_File2 (contains):

is more applied,using accessorize accessorized,accessorizes,accessorizing

Expected output format:

File1 first list count=2
File1 second list count=0

File2 first list count=0
File2 second list count=4

Below is the code I wrote to try to achieve this, but I cannot get the expected output. Any help would be appreciated.

Read all the files (x.txt files):

import glob
files=[]

for filename in glob.glob("*.txt"):
    files.append(filename)

Create a function to remove punctuation:

# Remove punctuation
import re

def remove_punctuation(line):
    return re.sub(r'[^\w\s]', '', line)
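One thing worth checking with this function: the pattern deletes punctuation outright rather than replacing it with a space, so comma-separated words with no spaces around the commas are fused into a single token. A minimal sketch, using the tail of the text_File2 sample line shown above as input:

```python
import re

def remove_punctuation(line):
    # Delete every character that is neither a word character nor whitespace.
    return re.sub(r'[^\w\s]', '', line)

sample = "accessorize accessorized,accessorizes,accessorizing"
cleaned = remove_punctuation(sample)
print(cleaned)          # accessorize accessorizedaccessorizesaccessorizing
print(cleaned.split())  # two tokens, not four
```

This matters only if the text is later split into words; a substring test such as `match in value` still finds each word inside the fused token.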

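Reading multiple files by filename in a for loop works, but the lines from all files get merged into one list. I need the count for the text1 file and the count for the text2 file kept separate.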

two_files=[]
for filename in files:
    for line in open(filename):
        #two_files.append(remove_punctuation(line))
        print(remove_punctuation(line),end='')
        two_files.append(remove_punctuation(line))


FIRST_LIST = "accessorizes","accessorizing","accessorized","accessorize"
SECOND_LIST="accessorize","accessorized","accessorizes","accessorizing"


c=[]
for match in FIRST_LIST:
    if any(match in value for value in two_files):
        #c=match+1
        print (match)
        c.append(match)
print(c)
len(c)
d=[]
for match in SECOND_LIST:
    if any(match in value for value in two_files):
        #c=match+1
        print (match)
        d.append(match)
print(d)
len(d)

1 Answer:

Answer 0: (score: 1)

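I am not sure if this is what you want, but I think the problem is that you are adding the lines from both files to the same list. You should create a separate list for each file. Try: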

import glob
files=[]

for filename in glob.glob("*.txt"):
    files.append(filename)

# remove Punctuations
import re

def remove_punctuation(line):
    return re.sub(r'[^\w\s]', '', line)


two_files=[]
for filename in files:
    temp = []
    # Keep each file's lines in their own list, and close the file when done
    with open(filename) as f:
        for line in f:
            temp.append(remove_punctuation(line))
    two_files.append(temp)

FIRST_LIST = "accessorizes","accessorizing","accessorized","accessorize"
SECOND_LIST="accessorize","accessorized","accessorizes","accessorizing"


c=[]
d=[]

for file in two_files:
    temp = []
    for match in FIRST_LIST:
        for value in file:
            if match in value:
                temp.append(match)
    c.append(temp)

    temp2 = []
    for match in SECOND_LIST:
        for value in file:
            if match in value:
                temp2.append(match)
    d.append(temp2)

print('File1 first list count = ' + str(len(c[0])))
print('File1 second list count = ' + str(len(d[0])))

print('File2 first list count = ' + str(len(c[1])))
print('File2 second list count = ' + str(len(d[1])))
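A note on the matching used above: `match in value` is a substring test, so "accessorize" is also found inside "accessorized", and each word is counted at most once per line. If whole-word counts per file are what is wanted, one possible sketch is to tokenize each file's text and count the tokens that belong to each list. The helper names `count_matches` and `report` are hypothetical, and lowercasing the text before matching is an assumption that the question does not state.

```python
import glob
import re
from collections import Counter

FIRST_LIST = ("accessorizes", "accessorizing", "accessorized", "accessorize")
SECOND_LIST = ("accessorize", "accessorized", "accessorizes", "accessorizing")

def count_matches(text, words):
    """Count whole-word occurrences in text of any word in words."""
    # \w+ splits on punctuation, so "a,b" yields two tokens instead of one.
    tokens = Counter(re.findall(r"\w+", text.lower()))
    return sum(tokens[w] for w in set(words))

def report(pattern="*.txt"):
    """Print one pair of counts for each file matching the glob pattern."""
    for number, filename in enumerate(sorted(glob.glob(pattern)), start=1):
        with open(filename) as handle:
            text = handle.read()
        print(f"File{number} first list count = {count_matches(text, FIRST_LIST)}")
        print(f"File{number} second list count = {count_matches(text, SECOND_LIST)}")
```

Calling `report()` in the directory that holds the text files prints the counts per file, for any number of files. Because the two lists in the question contain the same four words, this approach gives the same count for both lists on any given file.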