Search for keywords and write them to a new column

Time: 2018-11-14 20:57:25

Tags: python

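I have a script that I wrote with the help of some online resources, but it doesn't quite do what I need. I am working with an existing CSV file: I search 4 text fields in that file for a set of keywords and then write the results to a new CSV file. Right now, the script looks at only one text field, and when it finds any of the keywords in that field, it writes each keyword to a separate column for that record. I need the script to do two things.

1) Search multiple text fields. I need to search "essay1", "essay2", "essay3", and "essay4" for all of the keywords, not just "essay1".

2) If the script finds any of the keywords, it should write the keywords it found into a single column for that record rather than into separate columns.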
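The two requirements above could be sketched roughly as follows. This is only an illustration under assumptions: `find_keywords` and `ESSAY_FIELDS` are hypothetical names, the sample row and keyword set are made up, and punctuation is stripped with `str.strip(string.punctuation)` rather than the chained `replace` calls used in the script.

```python
import string

# Hypothetical list of columns to search; the names follow the script's CSV columns.
ESSAY_FIELDS = ["project_essay_1", "project_essay_2",
                "project_essay_3", "project_essay_4"]


def find_keywords(row, keywords, fields=ESSAY_FIELDS):
    """Return the keywords found in any of the given fields, without duplicates."""
    found = []
    for field in fields:
        # Treat a missing or empty field as an empty string.
        text = (row.get(field) or "").lower()
        for word in text.split():
            # Drop leading and trailing punctuation before comparing.
            word = word.strip(string.punctuation)
            if word in keywords and word not in found:
                found.append(word)
    return found


# Made-up sample data, only to show the shape of the result.
row = {"project_essay_1": "We need books, and tablets.",
       "project_essay_2": "Books help students read.",
       "project_essay_3": "",
       "project_essay_4": "A new Microscope!"}
keywords = {"books", "tablets", "microscope"}

# Joining the matches gives one value that fits in a single CSV column.
print("; ".join(find_keywords(row, keywords)))  # prints: books; tablets; microscope
```

Keeping the keywords in a set makes each membership test cheap, and joining the matches with a separator such as "; " produces one string per record instead of a variable number of columns.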

import csv
import time

#Read data from word enrichment
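with open('word_enrichment_approved.txt', 'r') as f:
    #one keyword per line; skip blank lines so an empty string never counts as a keyword
    allKeywords = [line.strip() for line in f.read().lower().splitlines() if line.strip()]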

#Read in raw data from csv file
#Import the 'essay' column from the CSV file
allTexts = []
fullRow = []
with open('Join_train_and_resources_edited.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        #the full row for each entry, which will be used to recreate the improved CSV file 
        fullRow.append((row['record_id'],row['teacher_id'],row['teacher_prefix'],row['school_state'],row['project_submitted_datetime'],
                        row['project_grade_category'], row['project_subject_categories'], row['project_subject_subcategories'],
                        row['project_title'],row['project_essay_1'],row['project_essay_2'],row['project_essay_3'],row['project_essay_4'],
                        row['project_resource_summary'],row['teacher_number_of_previously_posted_projects'],row['project_is_approved'],
                        row['id-1'],row['quantity'],row['price']))

        #the column we want to parse for our keywords (change to match actual name of the column)
        text = row['project_essay_1'].lower()
        allTexts.append(text)
#NEW! a flag used to keep track of which row is being printed to the CSV file   
counter = 0
#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

    #NEW! define the column headers and write them to the new file
    fieldnames = ['id', 'teacher_id','teacher_prefix','school_state','project_submitted_datetime','project_grade_category',
                  'project_subject_categories','project_subject_subcategories','project_title','project_essay_1','project_essay_2',
                  'project_essay_3','project_essay_4','project_resource_summary','teacher_number_of_previously_posted_projects',
                  'project_is_approved','id-1','quantity','price']
    #a plain csv.writer is enough here because every row is written as a tuple
    writer = csv.writer(csvfile)
    writer.writerow(fieldnames)

    #NEW! define the output for each row and then print to the output csv file

    for entry in allTexts:

        matches = 0
        storedMatches = []

        #for each entry:
        allWords = entry.split(' ')
        for words in allWords:

            #remove punctuation that will interfere with matching
            words = words.replace(',', '')
            words = words.replace('.', '')
            words = words.replace(';', '')

            #if a keyword match is found, store the result (once per keyword).
            if words in allKeywords and words not in storedMatches:
                storedMatches.append(words)
                matches += 1

        #CHANGED! send any matches to a new row of the csv file.
        if matches == 0:
            newRow = fullRow[counter]
        else:
            matchTuple = tuple(storedMatches)
            newRow = fullRow[counter] + matchTuple

        #NEW! write the result of each row to the csv file
        writer.writerow(newRow)
        counter += 1
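
One possible way to restructure the script so that every row keeps its original columns and gains a single keyword column is sketched below. It is not a drop-in replacement: `add_keyword_column` and the `keywords_found` column name are hypothetical, and the in-memory `io.StringIO` objects with made-up rows stand in for the real input and output files.

```python
import csv
import io


def add_keyword_column(infile, outfile, keywords, fields, new_column="keywords_found"):
    """Copy every row from infile to outfile, appending one column of matches."""
    reader = csv.DictReader(infile)
    # Reuse the input header and add the new column at the end.
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames + [new_column])
    writer.writeheader()
    for row in reader:
        found = []
        for field in fields:
            for word in row[field].lower().split():
                # Remove the same punctuation the original script removes,
                # but only at the ends of each word.
                word = word.strip(",.;")
                if word in keywords and word not in found:
                    found.append(word)
        # All matches for this record go into one cell.
        row[new_column] = "; ".join(found)
        writer.writerow(row)


# In-memory stand-ins for the real input and output files (made-up rows).
source = io.StringIO(
    "record_id,project_essay_1,project_essay_2\n"
    "1,We need books.,Tablets too\n"
    "2,Nothing here,\n"
)
result = io.StringIO()
add_keyword_column(source, result, {"books", "tablets"},
                   ["project_essay_1", "project_essay_2"])
print(result.getvalue())
```

Because `DictReader` and `DictWriter` carry the whole row as a dictionary, there is no need to list every column by hand or to keep a separate `counter` in step with `fullRow`. With real files on Python 3, the `csv` module documentation recommends opening them with `newline=''`.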

0 answers:

No answers