I have a script that predicts product names from input files. The code is as follows:
import csv
import re

import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
with open('eng_productnames.csv', newline='') as myFile:
    reader = csv.reader(myFile)
    for rowz in reader:
        try:
            filenamez = rowz[1]
            file = open(DIR+filenamez, "r", encoding ='utf-8')
            filecontentszz = file.read()
            for s in filecontentszz:
                filecontentszz = re.sub(r'\s+', ' ', filecontentszz)
                #filecontents = filecontents.encode().decode('unicode-escape')
                filecontentszz = ''.join([line.lower() for line in filecontentszz])
                doc2 = nlp2(filecontentszz)
                for ent in doc2.ents:
                    print(filenamez, ent.label_, ent.text)
                break
        except Exception as e:
            print(e)  # the handler body is missing from the post; printing the error is an assumption
It gives me the output as strings:
07-09-18 N021024s16PASBUNDLEACK - Acknowledgement P.txt PRODUCT ABC1
06-22-18 Letter from Supl.txt PRODUCT ABC2
06-22-18 Letter from Req to Change .txt PRODUCT ABC3
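Now I want to export all of these details to a CSV with two columns, FILENAME and PRODUCT, with every file name and product name under the corresponding column header. In the printed string, each product name is preceded by the word PRODUCT. How can I do this?
The output CSV should look like this: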
Filename PRODUCT
07-09-18 Acknowledgement P.txt ABC1
06-22-18 Letter Req to Change.txt ABC2
Answer 0 (score: 1)
Instead of printing to the screen, you can use csv.writer and call writerow to write each row to the output file.
import csv
import re

import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
# newline='' keeps the csv module from writing blank lines between rows on Windows
with open('eng_productnames.csv', newline='') as input_file, \
        open('output.csv', 'w', newline='') as output_file:
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    writer.writerow(["Filename", "Product"])  # this is the header row
    for rowz in reader:
        try:
            filenamez = rowz[1]
            # the with block closes each input file after it has been read
            with open(DIR + filenamez, "r", encoding='utf-8') as file:
                filecontentszz = file.read()
            for s in filecontentszz:
                filecontentszz = re.sub(r'\s+', ' ', filecontentszz)
                #filecontents = filecontents.encode().decode('unicode-escape')
                filecontentszz = ''.join([line.lower() for line in filecontentszz])
                doc2 = nlp2(filecontentszz)
                for ent in doc2.ents:
                    writer.writerow([filenamez, ent.text])  # one CSV row per entity
                break
        except Exception as e:
            print(e)  # the question's handler body is not shown; printing the error is an assumption
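I'm assuming here that filenamez and ent.text contain the information you want in each column. If that is not the case, you can manipulate them into the form you need before writing them to the CSV.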
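As a rough illustration of the writerow approach without the spaCy model, the sketch below writes a header row and a few rows to an in-memory buffer. The rows list is a hypothetical stand-in for the (filenamez, ent.text) pairs, copied from the sample output in the question, and rows_to_csv is a made-up helper name. The strip() calls are only one example of cleaning a value before it is written.

```python
import csv
import io

# Hypothetical (filename, product) pairs taken from the sample output in the question.
rows = [
    ("07-09-18 N021024s16PASBUNDLEACK - Acknowledgement P.txt", "ABC1"),
    ("06-22-18 Letter from Supl.txt", "ABC2"),
    ("06-22-18 Letter from Req to Change .txt", "ABC3"),
]


def rows_to_csv(pairs):
    """Return CSV text: a header row followed by one row per (filename, product) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Filename", "Product"])
    for filename, product in pairs:
        # Example clean-up before writing: trim stray leading/trailing whitespace.
        writer.writerow([filename.strip(), product.strip()])
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

When writing to a real file instead of a buffer, opening it with newline='' is what the csv module documentation recommends, so that no extra blank lines appear between rows on Windows.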
Answer 1 (score: 0)
There are many ways to achieve this. The one I prefer is Pandas, a powerful library that can read and write CSV files. You can create a dictionary:
predicted_products = {'FILENAME': [], 'PRODUCT': []}
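and append each file name and product to the corresponding list as you iterate.
Once that is done, convert predicted_products to a DataFrame and call its to_csv function: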
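import pandas as pd  # the module name is lowercase

predicted_products_df = pd.DataFrame.from_dict(predicted_products)
# index=False leaves out the row index so the file has only the two columns
predicted_products_df.to_csv('your_path/file_name.csv', index=False)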
I prefer this approach because it is easier to edit the data before saving the file.
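To give a sense of what editing the data before saving can look like, here is a minimal sketch. The sample values, including the duplicate row and the stray spaces, are made up for illustration, and the clean-up steps are just two possibilities, not something the question requires.

```python
import pandas as pd

# Hypothetical predictions; the duplicate row and stray spaces are invented for this example.
predicted_products = {
    "FILENAME": [
        "06-22-18 Letter from Supl.txt",
        "06-22-18 Letter from Supl.txt",
        "06-22-18 Letter from Req to Change .txt",
    ],
    "PRODUCT": [" ABC2", " ABC2", "ABC3 "],
}

df = pd.DataFrame.from_dict(predicted_products)

# Edit the data before saving: trim whitespace, then drop repeated rows.
df["PRODUCT"] = df["PRODUCT"].str.strip()
df = df.drop_duplicates().reset_index(drop=True)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a path as the first argument to to_csv writes the same content to disk.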
For your existing code, I assume print(filenamez, ent.label_, ent.text) is what prints the output. If so, then:
import csv
import re

import pandas as pd  # the module name is lowercase
import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
predicted_products = {'FILENAME': [], 'PRODUCT': []}
with open('eng_productnames.csv', newline='') as myFile:
    reader = csv.reader(myFile)
    for rowz in reader:
        try:
            filenamez = rowz[1]
            # the with block closes each input file after it has been read
            with open(DIR + filenamez, "r", encoding='utf-8') as file:
                filecontentszz = file.read()
            for s in filecontentszz:
                filecontentszz = re.sub(r'\s+', ' ', filecontentszz)
                #filecontents = filecontents.encode().decode('unicode-escape')
                filecontentszz = ''.join([line.lower() for line in filecontentszz])
                doc2 = nlp2(filecontentszz)
                for ent in doc2.ents:
                    print(filenamez, ent.label_, ent.text)
                    # ent.label_ is left out so the column matches the requested output
                    predicted_products['FILENAME'].append(filenamez)
                    predicted_products['PRODUCT'].append(ent.text)
                break
        except Exception as e:
            print(e)  # the handler body is missing from the post; printing the error is an assumption
predicted_products_df = pd.DataFrame.from_dict(predicted_products)
# index=False leaves out the row index so the file has only the two columns
predicted_products_df.to_csv('your_path/file_name.csv', index=False)