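I am trying to extract raw data from a text file and, after processing it, export the result to another text file. Below is the Python code I wrote for this. I am using the "petl" package in Python 3. 'locations.txt' is the raw data file.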
import glob, os
from petl import *
class ETL():
    def __init__(self, input):
        self.list = input

    def parse_P(self):
        personids = None
        for term in self.list:
            if term.startswith('P'):
                personids = term[1:]
        personid = personids.split(',')
        return personid

    def return_location(self):
        location = None
        for term in self.list:
            if term.startswith('L'):
                location = term[1:]
        return location

    def return_location_id(self, location):
        location = self.return_location()
        locationid = None

    def return_country_id(self):
        countryid = None
        for term in self.list:
            if term.startswith('C'):
                countryid = term[1:]
        return countryid

    def return_region_id(self):
        regionid = None
        for term in self.list:
            if term.startswith('R'):
                regionid = term[1:]
        return regionid

    def return_city_id(self):
        cityid = None
        for term in self.list:
            if term.startswith('I'):
                cityid = term[1:]
        return cityid

print(os.getcwd())
os.chdir(r"D:\ETL-IntroductionProject")
print(os.getcwd())

final_location = [['L', 'P', 'C', 'R', 'I']]
new_location = fromtext('locations.txt', encoding='Latin-1')
stored_list = []

for identifier in new_location:
    if identifier[0].startswith('L'):
        identifier = identifier[0]
        info_list = identifier.split('_')
        stored_list.append(info_list)

for lst in stored_list:
    tabling = ETL(lst)
    location = tabling.return_location()
    country = tabling.return_country_id()
    city = tabling.return_city_id()
    region = tabling.return_region_id()
    person_list = tabling.parse_P()
    for person in person_list:
        table_new = [location, person, country, region, city]
        final_location.append(table_new)

totext(final_location, 'l1.txt')
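However, when I call petl's "totext" function, it raises an AssertionError:
AssertionError: template is required
I can't work out what is wrong. Can someone explain the problem I'm running into and what I should do about it?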
Answer 0: (score: 0)
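The template argument of the totext function is not optional. There is no default format for how rows are written, so you must supply a template. See the documentation for totext here for an example: https://petl.readthedocs.io/en/latest/io.html#text-files
The template describes the format of each row, using the field names from the header as placeholders. You can also optionally pass a prologue to write a header line. A basic template in your case would be: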
# totext writes each row exactly as the template renders it, so end the template with a newline
table_new_template = "{L} {P} {C} {R} {I}\n"
totext(final_location, 'l1.txt', template=table_new_template)
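
To see how the placeholders line up with the header row, here is a small stand-alone sketch that uses only plain Python string formatting. It is not petl's implementation: render_rows is a hypothetical helper written for illustration, and the sample rows are made-up values. It assumes each data row is rendered by filling the template from a mapping of field name to value, which is how the template is described above.

```python
def render_rows(table, template, prologue=None):
    """Hypothetical helper: render every data row of a list-of-lists table
    by filling `template` with that row's values, keyed by the header names."""
    header, *rows = table
    parts = [] if prologue is None else [prologue]
    for row in rows:
        record = dict(zip(header, row))  # e.g. {'L': 'loc1', 'P': 'p1', ...}
        parts.append(template.format(**record))
    return "".join(parts)


# Made-up sample data with the same header as final_location.
sample = [
    ["L", "P", "C", "R", "I"],
    ["loc1", "p1", "c1", "r1", "i1"],
    ["loc1", "p2", "c1", "r1", "i1"],
]

# Without a trailing newline in the template, the rows run together.
without_newline = render_rows(sample, "{L} {P} {C} {R} {I}")

# With a trailing newline and an optional prologue, each row gets its own line.
with_newline = render_rows(sample, "{L} {P} {C} {R} {I}\n", prologue="L P C R I\n")

print(repr(without_newline))
print(with_newline)
```

The placeholder names in the template have to match the header fields exactly ('L', 'P', 'C', 'R', 'I' here). A name that is not in the header would raise a KeyError in this sketch.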