好的,我有一个交易文件:
IN CU
Customer_ID=
Last_Name=Johnston
First_Name=Karen
Street_Address=291 Stone Cr
City=Toronto
//
IN VE
License_Plate#=LSR976
Make=Cadillac
Model=Seville
Year=1996
Owner_ID=779
//
IN SE
Vehicle_ID=LSR976
Service_Code=461
Date_Scheduled=00/12/19
IN
表示插入,而CU
(表示客户)指的是我们正在编写的文件,在本例中为customer.diff
。我遇到的问题是我需要遍历每一行,并检查每个字段的值(Customer_ID
)。您看到Customer_ID
如何留空?我需要将值0
替换为任何数字空白字段,例如Customer_ID=0
。这是我到目前为止所做的一切,但没有任何改变:
def insertion():
field_names = {'Customer_ID=': 'Customer_ID=0',
'Home_Phone=':'Home_Phone=0','Business_Phone=': 'Business_Phone=0'}
with open('xactions.two.txt', 'r') as from_file:
search_lines = from_file.readlines()
if search_lines[3:5] == 'CU':
for i in search_lines:
if field_names[i] == True:
with open('customer.diff', 'w') as to_file:
to_file.write(field_names[i])
由于
答案 0 :(得分:2)
为什么不尝试一些更简单的东西?我还没有测试过这段代码。
def insertion():
field_names = {'Customer_ID=': 'Customer_ID=0',
'Home_Phone=':'Home_Phone=0','Business_Phone=': 'Business_Phone=0'}
with open('xactions.two.txt', 'r') as from_file:
with open('customer.diff', 'w') as to_file:
for line in from_file:
line = line.rstrip("\n")
found = False
for field in field_names.keys():
if field in line:
to_file.write(line + "0")
found = True
if not found:
to_file.write(line)
to_file.write("\n")
答案 1 :(得分:1)
这是一个相当全面的方法;它有点长,但没有它看起来那么复杂!
我假设Python 3.x,虽然它应该在Python 2.x中工作,几乎没有变化。我广泛使用生成器来传输数据,而不是将其保存在内存中。
首先:我们将为每个字段定义预期的数据类型。某些字段与内置Python数据类型不对应,因此我首先为这些字段定义一些自定义数据类型:
import time
class Date:
def __init__(self, s):
"""
Parse a date provided as "yy/mm/dd"
"""
if s.strip():
self.date = time.strptime(s, "%y/%m/%d")
else:
self.date = time.gmtime(0.)
def __str__(self):
"""
Return a date as "yy/mm/dd"
"""
return time.strftime("%y/%m/%d", self.date)
def Int(s):
"""
Parse a string to integer ("" => 0)
"""
if s.strip():
return int(s)
else:
return 0
class Year:
def __init__(self, s):
"""
Parse a year provided as "yyyy"
"""
if s.strip():
self.date = time.strptime(s, "%Y")
else:
self.date = time.gmtime(0.)
def __str__(self):
"""
Return a year as "yyyy"
"""
return time.strftime("%Y", self.date)
现在我们设置一个表,定义每个字段应该是什么类型:
# Expected data-type of each field:
# data_types[section][field] = type
data_types = {
"CU": {
"Customer_ID": Int,
"Last_Name": str,
"First_Name": str,
"Street_Address": str,
"City": str
},
"VE": {
"License_Plate#": str,
"Make": str,
"Model": str,
"Year": Year,
"Owner_ID": Int
},
"SE": {
"Vehicle_ID": str,
"Service_Code": Int,
"Date_Scheduled": Date
}
}
我们解析输入文件;这是迄今为止最复杂的一点!它是一个有限状态机,实现为生成器函数,一次产生一个部分:
# Customized error-handling
class TransactionError (BaseException): pass
class EntryNotInSectionError (TransactionError): pass
class MalformedLineError (TransactionError): pass
class SectionNotTerminatedError(TransactionError): pass
class UnknownFieldError (TransactionError): pass
class UnknownSectionError (TransactionError): pass
def read_transactions(fname):
"""
Read a transaction file
Return a series of ("section", {"key": "value"})
"""
section, accum = None, {}
with open(fname) as inf:
for line_no, line in enumerate(inf, 1):
line = line.strip()
if not line:
# blank line - skip it
pass
elif line == "//":
# end of section - return any accumulated data
if accum:
yield (section, accum)
section, accum = None, {}
elif line[:3] == "IN ":
# start of section
if accum:
raise SectionNotTerminatedError(
"Line {}: Preceding {} section was not terminated"
.format(line_no, section)
)
else:
section = line[3:].strip()
if section not in data_types:
raise UnknownSectionError(
"Line {}: Unknown section type {}"
.format(line_no, section)
)
else:
# data entry: "key=value"
if section is None:
raise EntryNotInSectionError(
"Line {}: '{}' should be in a section"
.format(line_no, line)
)
pair = line.split("=")
if len(pair) != 2:
raise MalformedLineError(
"Line {}: '{}' could not be parsed as a key/value pair"
.format(line_no, line)
)
key,val = pair
if key not in data_types[section]:
raise UnknownFieldError(
"Line {}: unrecognized field name {} in section {}"
.format(line_no, key, section)
)
accum[key] = val.strip()
# end of file - nothing should be left over
if accum:
raise SectionNotTerminatedError(
"End of file: Preceding {} section was not terminated"
.format(line_no, section)
)
现在读取文件,其余部分更容易。我们使用上面定义的查找表对每个字段进行类型转换:
def format_field(section, key, value):
"""
Cast a field value to the appropriate data type
"""
return data_types[section][key](value)
def format_section(section, accum):
"""
Cast all values in a section to the appropriate data types
"""
return (section, {key:format_field(section, key, value) for key,value in accum.items()})
并将结果写回文件:
def write_transactions(fname, transactions):
with open(fname, "w") as outf:
for section,accum in transactions:
# start section
outf.write("IN {}\n".format(section))
# write key/value pairs in order by key
keys = sorted(accum.keys())
for key in keys:
outf.write(" {}={}\n".format(key, accum[key]))
# end section
outf.write("//\n")
所有机器都已到位;我们只需要称之为:
def main():
INPUT = "transaction.txt"
OUTPUT = "customer.diff"
transactions = read_transactions(INPUT)
cleaned_transactions = (format_section(section, accum) for section,accum in transactions)
write_transactions(OUTPUT, cleaned_transactions)
if __name__=="__main__":
main()
希望有所帮助!