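I have about 300 docx files in a folder and its subfolders whose metadata I need to update. I have a separate csv file of 300+ lines that holds the metadata: each line contains the filename, keywords, and title.
I want to loop through the docx files, pull the matching values from the csv, and insert the metadata into each docx file. The docx files are stored in 2 subfolders under a root folder.
So far I have drafted the following. What I am struggling with is how to iterate over the csv file and apply the metadata to each file in turn. I'm sure there is a relatively simple way to do this; setting up the loop and getting the csv contents is where I get lost. I'm a noob, feeling my way as I go.
Any tips appreciated.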
#running in python 3.5.2 32bit
import csv
from docx import Document
import os
import sys

csv_path = ("datasheet_metadata_uplift.csv")

def update_docx_metadata(document, keywords, title):
    """
    Update the *keywords*, and *title* metadata
    properties in *document*.
    """
    core_properties = document.core_properties
    core_properties.keywords = keywords
    core_properties.title = title

def read_csv_lines(filename, keywords, title):
    """
    Reads the csv lines, returns *filename*, *keywords*, *title*
    """
    with open(csv_path, 'r') as f:
        csv_file = csv.reader(f)
        for row in csv_file:
            filename = row[0]
            keywords = row[1]
            title = row[2]

def open_docx(filename):
    """
    Search for docx file and open it
    """
    for root, dirs, files in os.walk("."):
        if filename in files:
            doc_path = os.path.join(path, filename)

csv_lines = read_csv_lines(filename, keywords, title)
for filename, keywords, title in csv_lines:
    document = Document(doc_path)
    update_doc_metadata(filename, keywords, title)
    document.save(doc_path)
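Answer 0: (score: 0)
Aidan, the next step I'd suggest is refactoring your code into coherent functions. That lets you do what you need, when you need it, with one function call per operation, so the intent and flow are not obscured.
You might start with something like this: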
def update_doc_metadata(document, author, keywords, title, subject):
    """
    Update the *author*, *keywords*, *title*, and *subject* metadata
    properties in *document*.
    """
    core_properties = document.core_properties
    core_properties.author = author
    core_properties.keywords = keywords
    core_properties.title = title
    core_properties.subject = subject
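Note the following:
If you keep going like this, locating coherent bits and "extracting" them into functions, the core logic of your main code will become much clearer.
I think the overall structure is something like this: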
csv_lines = read_csv_lines(csv_path)
for filename, keywords, title in csv_lines:
    doc_path, document = open_docx(filename)
    update_doc_metadata(document, author, keywords, title, subject)
    document.save(doc_path)
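The two helpers that this structure leans on are not spelled out, so here is a minimal sketch of how they might look, using only the standard library. It assumes the csv columns are in the order given in the question (filename, keywords, title). `find_docx` and its `root_dir` parameter are hypothetical names for the lookup half of `open_docx`. Opening the file with `Document(doc_path)` would then happen in the main loop.

```python
import csv
import os


def read_csv_lines(csv_path):
    """Yield (filename, keywords, title) for each non-empty csv row."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            # assumed column order: filename, keywords, title
            yield row[0], row[1], row[2]


def find_docx(root_dir, filename):
    """Return the path of the first file named *filename* under *root_dir*, or None."""
    for dirpath, dirnames, filenames in os.walk(root_dir):
        if filename in filenames:
            return os.path.join(dirpath, filename)
    return None
```

In the main loop, a row whose file cannot be found (`find_docx` returns `None`) could simply be skipped or reported, rather than being passed to `Document`.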
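Answer 1: (score: 0)
So I figured this out, and it turned out to be quite simple. I also made things easier for myself by putting the full file path in the csv. Thanks to scanny for the encouragement. Next stop, the documentation and tutorial pages :)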
#runs in python 3.5.2 32-bit
#docx requires 32 bit operation
import csv
from docx import Document
import os
import sys
#path to the csv file - csv file must contain rows as follows:
#full filepath, title, keywords
#ensure there are no commas, other than the csv delimiters
csv_path = "datasheet_metadata_uplift.csv"

#set up the lists that will be used to hold csv values
filename = []
title = []
keywords = []

#read the csv file and parse the "columns" into one of three lists: filename, title, keywords
with open(csv_path, newline="") as f:
    csv_file = csv.reader(f)
    for row in csv_file:
        filename.append(row[0])
        title.append(row[1])
        keywords.append(row[2])

#the number of rows read is the number of files that need updating
num_lines = len(filename)

#do the updates on every filename in the list
i = 0
while i < num_lines:
    #update the docx files, one for each csv file entry
    document = Document(filename[i])
    core_properties = document.core_properties
    core_properties.keywords = keywords[i]
    core_properties.title = title[i]
    core_properties.subject = "YOUR_SUBJECT_HERE"
    core_properties.comments = " "
    #note: company is not among python-docx's documented core properties,
    #so this assignment likely has no effect on the saved file
    core_properties.company = "YOUR_COMPANY_HERE"
    document.save(filename[i])
    i += 1

print("finished!")
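On the comment in the script about avoiding commas: the `csv` module handles commas inside a field as long as that field is quoted, so the restriction can probably be relaxed if the csv is written by a tool that quotes such fields. Here is a small sketch with a made-up row (the path, title, and keywords are hypothetical), in the same column order as the script:

```python
import csv
import io

# Hypothetical row in the order filepath, title, keywords; the title contains a comma
rows = [["docs/a.docx", "Widget, Mark II", "widgets; datasheet"]]

# csv.writer quotes any field that contains the delimiter
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# csv.reader turns the quoted field back into a single value
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

After this runs, `parsed` holds one row of three fields, with the comma still inside the title.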