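I have an input file that looks like this: input file link
and I need to create an output file that looks like this: output file link
I started with the code below, but the error handling and pattern matching are messing up the logic (especially the : characters that appear both in the URL and in the data). Also, the average in the output file should be the average of the values that are non-zero and not missing.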
with open("input.txt") as f:
    next(f) # skips header
    for line in f:
        cleanline = re.sub('::',':',line) # handles the two :: case
        newline = re.split("[\t:]",cleanline) #splits on either tab or :
        print newline
        x=0
        total=0
        for i in range(3,7):
            if newline[i] <> 0 or newline[i] != None:
                x+=1
                total+=total
        avg=total/x
        print avg
Answer 0 (score: 0)
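I suggest approaching this from a different angle. First split each line along the tabs, then validate each entry on its own. That way you can write a separate regular expression for each entry and produce more precise error messages. A nice way to do this is with tuple unpacking and the split method: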
from __future__ import print_function

with open("input.txt") as in_file, open("output.txt", "w") as out_file:
    next(in_file)  # skip the header line
    for line in in_file:
        error_message = []
        # remove the line break character and split along the tabs
        id_and_date, user_id, p1, p2, p3, p4, url = line.strip("\n").split("\t")
        # split the first entry at the first ":" only
        split_id_date = id_and_date.split(":", 1)
        if len(split_id_date) == 2:
            order_id, date = split_id_date
        else:
            # no ":" found: assume the whole entry is the order id
            # (or handle this case differently)
            order_id, date = (split_id_date[0], "")
            error_message.append("Invalid Date")
        # validate order_id and date here using re.match
        # and add errors to the error_message list, e.g.:
        # error_message.append("Invalid Date")
        # calculate the average price
        # first, build a list of the non-zero prices
        # (this assumes every price field holds an integer)
        nonzero_prices = [price for price in map(int, (p1, p2, p3, p4)) if price > 0]
        # compute the average price, guarding against an empty list
        avg_price = sum(nonzero_prices) / len(nonzero_prices) if nonzero_prices else 0
        # validate url using re here
        # handle errors as above
        print("\t".join([order_id, date, user_id, str(avg_price), url, ", ".join(error_message)]), file=out_file)
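The price step above calls int() on every field, which raises an error on a blank or non-numeric entry. Below is a minimal sketch of a more tolerant helper, assuming such entries should simply be skipped and that None is an acceptable result when nothing usable remains. The name average_nonzero is hypothetical and not part of the original code.

```python
def average_nonzero(fields):
    """Return the mean of the fields that parse as non-zero numbers, or None."""
    values = []
    for field in fields:
        try:
            value = float(field)
        except (TypeError, ValueError):
            continue  # blank, missing, or non-numeric entry: skip it
        if value != 0:
            values.append(value)
    if not values:
        return None  # assumption: no usable prices means "no average"
    return sum(values) / len(values)
```

In the loop it could replace the two price lines with avg_price = average_nonzero((p1, p2, p3, p4)). Using float() also avoids integer truncation of the average under Python 2.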
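I did not add the re calls that validate the entries, because I do not know what you expect to see in them. I did, however, add comments at the places where a call to re.match or something similar would make sense.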
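As a rough illustration of what those re.match calls might look like, here is a sketch. The three patterns are assumptions made for the example only (a digits-only order id, a YYYY-MM-DD date, an http or https URL), since the real formats are not given. The function name validate_entry and the messages other than "Invalid Date" are hypothetical as well.

```python
import re

# Assumed formats, for illustration only; adjust them to the real data.
ORDER_ID_RE = re.compile(r"^\d+$")            # digits only
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # YYYY-MM-DD
URL_RE = re.compile(r"^https?://\S+$")        # http(s) URL without whitespace


def validate_entry(order_id, date, url):
    """Return a list of error messages for one already-split input line."""
    errors = []
    if not ORDER_ID_RE.match(order_id):
        errors.append("Invalid Order ID")
    if not DATE_RE.match(date):
        errors.append("Invalid Date")
    if not URL_RE.match(url):
        errors.append("Invalid URL")
    return errors
```

Because each field is checked after the tab split, a : inside the URL no longer interferes with the parsing of the other columns.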
I hope this helps.