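I have 2 files: a .txt file containing country names, and a CSV file containing details (text). I want to match the country names against the text in the CSV file line by line, then count and print the matched words.
I tried the following code: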
#NEW!
import csv
import time
#OLD! Import the keywords
f = open('country names.txt', 'r')
allKeywords = f.read().lower().split("\n")
f.close()
#CHANGED! Import the 'Details' column from the CSV file
allTexts = []
fullRow = []
with open('Detail_file.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        #the full row for each entry, which will be used to recreate the improved CSV file in a moment
        fullRow.append((row['sr. no.'], row['Details'], row['LOC']))
        #the column we want to parse for our keywords
        row = row['Details'].lower()
        allTexts.append(row)
#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'
#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:
    #NEW! define the column headers and write them to the new file
    fieldnames = ['sr. no.', 'Details', 'LOC', 'Placename']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    #NEW! define the output for each row and then print to the output csv file
    writer = csv.writer(csvfile)
    #OLD! this is the same as before, for currentRow in fullRow:
    for entry in allTexts:
        matches = 0
        storedMatches = []
        #for each entry:
        allWords = entry.split(' ')
        for words in allWords:
            #if a keyword match is found, store the result.
            if words in allKeywords:
                if words in storedMatches:
                    continue
                else:
                    storedMatches.append(words)
                    matches += 1
        #CHANGED! send any matches to a new row of the csv file.
        if matches == 0:
            newRow = fullRow[counter]
        else:
            matchTuple = tuple(storedMatches)
            newRow = fullRow[counter] + matchTuple
        #NEW! write the result of each row to the csv file
        writer.writerows([newRow])
        counter += 1
It works fine and produces the expected output.
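The problem is this: when a keyword in my dictionary (the country names) is a single word, such as Australia or America, it works well. But when a keyword contains more than one word, such as New Zealand or South Africa, it is neither matched nor counted, because the code above matches word by word. How can I handle keywords made of more than one word (2, 3, 4, ... words), and where in the code above should the fix go?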
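To see why the word-by-word loop misses multi-word names, compare a lookup on the split words with a lookup on the whole text. A minimal illustration using a made-up sample sentence (the data here is hypothetical):

```python
# Hypothetical sample data, for illustration only
keywords = ["australia", "new zealand"]
text = "shipment from australia to new zealand"

words = text.split(' ')
# Word-by-word lookup: "new zealand" is never a single element of words
word_matches = [w for w in words if w in keywords]
print(word_matches)  # ['australia']

# Substring lookup on the whole text does find the two-word name
text_matches = [k for k in keywords if k in text]
print(text_matches)  # ['australia', 'new zealand']
```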
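One idea I had: if a keyword contains more than one word, then during the search, when the first word of that keyword matches, check the next word(s) of the text against the rest of the keyword; if they match, continue, otherwise move on to the next keyword.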
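The idea above can be sketched roughly as follows. This is a minimal sketch, not the asker's code: `find_keywords` is a hypothetical helper, and it assumes words are separated by whitespace, so punctuation stuck to a word (e.g. "Zealand,") will not match.

```python
def find_keywords(text, keywords):
    """Return the distinct keywords (single- or multi-word) found in text.

    When the first word of a keyword lines up with a word in the text,
    the following words of the text are compared with the rest of the keyword.
    """
    words = text.lower().split()
    # Pre-split every keyword into its words, e.g. "new zealand" -> ["new", "zealand"];
    # blank keywords (such as a trailing empty line in the .txt file) are skipped
    keyword_words = [(k, k.lower().split()) for k in keywords if k.strip()]
    found = []
    for i in range(len(words)):
        for keyword, parts in keyword_words:
            # Compare the run of words starting at position i with the keyword's words
            if words[i:i + len(parts)] == parts and keyword not in found:
                found.append(keyword)
    return found


matches = find_keywords("Exports from New Zealand and South Africa to Australia",
                        ["australia", "new zealand", "south africa"])
print(len(matches), matches)  # 3 ['new zealand', 'south africa', 'australia']
```

In the question's loop, the `allWords = entry.split(' ')` / `for words in allWords:` block could be replaced by `storedMatches = find_keywords(entry, allKeywords)` followed by `matches = len(storedMatches)`.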
Answer 0 (score: 0)
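Well, it's not easy to work out what needs to be done. I'm also not sure you understand what a CSV file is: try opening it in the same editor you use for your Python scripts (not in Excel).
Anyway, here is my attempt: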
import csv
import time
# Read one country name per line, lower-cased; skip blank lines
# (an empty keyword would "match" every row).
with open('country names.txt', 'r') as f:
    all_keywords = [line.strip().lower() for line in f if line.strip()]
with open('Detail_file.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    full_rows = [(row['sr. no.'], row['Details'], row['LOC']) for row in reader]
time_string = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + time_string + '.csv'
with open(filename, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['sr. no.', 'Details', 'LOC', 'Placename'])
    for input_row in full_rows:
        # Substring test on the whole text, so multi-word names such as
        # "new zealand" match too (note: it can also match inside longer words).
        stored_matches_unique = set(x for x in all_keywords if x in input_row[1].lower())
        stored_matches = sorted(stored_matches_unique)
        # input_row is a tuple; convert it to a list before appending the matches
        new_row = list(input_row) + stored_matches
        writer.writerow(new_row)
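The `x in text` test above finds multi-word names, but it also matches inside longer words (for example "oman" inside "woman"). A possible variant, swapping in Python's standard `re` module with `\b` word boundaries, is sketched below; `build_pattern` and `match_row` are hypothetical helper names, and the sample text is made up.

```python
import re


def build_pattern(keywords):
    """Compile one regex that matches any keyword as whole words.

    Longer names are tried first, and whitespace inside a name matches any
    run of whitespace in the text.
    """
    names = sorted({k.strip().lower() for k in keywords if k.strip()},
                   key=lambda n: (-len(n), n))
    if not names:
        raise ValueError("no keywords given")
    alternatives = [r'\s+'.join(map(re.escape, name.split())) for name in names]
    return re.compile(r'\b(?:' + '|'.join(alternatives) + r')\b', re.IGNORECASE)


def match_row(text, pattern):
    # Normalise case and internal whitespace so every hit maps back to one name
    found = {' '.join(m.group(0).lower().split()) for m in pattern.finditer(text)}
    return sorted(found)


pattern = build_pattern(["Oman", "New Zealand", "South Africa"])
matches = match_row("A woman flew from New Zealand to South Africa.", pattern)
print(len(matches), matches)  # 2 ['new zealand', 'south africa']
```

In the answer's loop, the pattern would be built once from `all_keywords` before the loop, and `stored_matches = match_row(input_row[1], pattern)` would replace the set comprehension.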