Apologies if this has been asked before, but I'm very new to Python. I have a file containing data records similar to the following:
K; 0; 710; 85; 2; 2013:12:04:13:11:36.291; 0.0000; 1; 1009.3000; 0;
K; 0; 710; 85; 3; 2013:12:04:13:11:36.291; 0.0000; 1; 1009.3000; 0;
K; 17; 718; 86; 1; 2013:12:04:13:11:36.198; 995.6880; 4; 0.0000; 0; 0.0000; 280; 0.0000; 576; 0.0000; 904;
K; 17; 718; 86; 2; 2013:12:04:13:11:36.198; 0.0000; 4; 1484.0000; 0; 1484.0000; 280; 1484.0000; 576; 1481.6000; 904;
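The records vary in length, but I'm only interested in the first eight items of each record. The items in a record are separated by a ';' character and a varying number of space characters. When I read the file, I want to assign each line to a list, but I also want the items in that list to have the correct types, e.g. str, int, int, int, int, datetime, float, int, etc. At the moment I use the following code: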
    def file_extract(pathfile):
        file = open(pathfile)
        contents = file.read()
        # remove spaces and split data based on ';' and \n
        data_list = [lines.replace(" ","").split(";") for lines in contents.split("\n")]
        for line in data_list:
            if line[0] == "K":
                listraw=line[:9]
                listraw[1]=int(line[1])
                listraw[2]=int(line[2])
                # continue setting types in the listraw[] etc. etc.
Unfortunately, when I read each record from the file into a list, every item in the list ends up as a string, similar to the following:
    'K', '0', '710', '85', '2', '2013:12:04:13:11:36.291', ......
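I then have to go through each individual item in the list and set its type as needed. Is there a more elegant way to set the individual types in the list?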
Answer 0 (score: 0)
You can put the data types in a list and then use zip to match them with the fields. Like this:
    import datetime

    # write a parser for the timepoints
    def dateparser(string):
        # guessed the date format
        return datetime.datetime.strptime(string, '%Y:%m:%d:%H:%M:%S.%f')

    # From your code `if line[0] == 'K'` I assume that 'K' is a key for the
    # datatypes in the corresponding row.
    # For every row type you define the datatypes here, where a datatype
    # is equivalent to a parser. Just make sure it accepts a string and returns the
    # type you need.
    # I guessed the types here so it works with your example.
    parsers = {'K': [str,int,int,int,int,dateparser,float,int,float]}

    # the example content
    contents = """K; 0; 710; 85; 2; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;
    K; 0; 710; 85; 3; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;
    K; 17; 718; 86; 1; 2013:12:04:13:11:36.198; 995.6880; 4; 0.0000; 0; 0.0000; 280; 0.0000; 576; 0.0000; 904;
    K; 17; 718; 86; 2; 2013:12:04:13:11:36.198; 0.0000; 4;1484.0000; 0;1484.0000; 280;1484.0000; 576;1481.6000; 904; """

    data = []
    # the right way for doing this with a file would be:
    # with open(filepath, 'r') as f:
    #     for line in f:
    for line in contents.split('\n'):
        # skip empty lines
        if not line.strip():
            continue
        # first split then strip, feels safer this way...
        fields = [f.strip() for f in line.split(';')]
        # select the parser list from our dict
        parser_list = parsers[fields[0]]
        # Now match the fields with their parsers; zip stops automatically
        # when there is no parser left. This means if you have 8 parsers only 8
        # fields will be evaluated and the rest are ignored.
        # Comes in handy when the lengths of your row types differ.
        # However, this also works the other way around: if there
        # are fewer fields than parsers, the last parsers will be
        # ignored. If you don't want this to happen you have to
        # make sure that len(fields) >= len(parser_list)
        data.append([parser(field) for parser, field in zip(parser_list, fields)])

    for row in data:
        print(row)
This prints:
['K', 0, 710, 85, 2, datetime.datetime(2013, 12, 4, 13, 11, 36, 291000), 0.0, 1, 1009.3]
['K', 0, 710, 85, 3, datetime.datetime(2013, 12, 4, 13, 11, 36, 291000), 0.0, 1, 1009.3]
['K', 17, 718, 86, 1, datetime.datetime(2013, 12, 4, 13, 11, 36, 198000), 995.688, 4, 0.0]
['K', 17, 718, 86, 2, datetime.datetime(2013, 12, 4, 13, 11, 36, 198000), 0.0, 4, 1484.0]
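
The comments in the code above mention two things it does not actually do: reading from a file with `with open(...)`, and guarding against rows that have fewer fields than parsers. Below is a minimal sketch of both. The names `parse_line`, `parse_file` and `PARSERS` are hypothetical (they are not in the original), and silently skipping blank lines, unknown keys and short rows is an assumed policy; raising an error may suit your data better.

```python
import datetime


def dateparser(string):
    # Same guessed timestamp format as in the answer above.
    return datetime.datetime.strptime(string, '%Y:%m:%d:%H:%M:%S.%f')


# Assumption: one parser list per row key, as in the answer above.
PARSERS = {'K': [str, int, int, int, int, dateparser, float, int, float]}


def parse_line(line, parsers=PARSERS):
    """Return the typed fields of one record, or None if the line is skipped.

    Hypothetical helper: skips blank lines, unknown keys, and rows with
    fewer fields than parsers (an assumed policy, not from the original).
    """
    if not line.strip():
        return None
    fields = [f.strip() for f in line.split(';')]
    parser_list = parsers.get(fields[0])
    if parser_list is None or len(fields) < len(parser_list):
        return None
    # zip stops at the shorter sequence, so extra fields are ignored.
    return [parser(field) for parser, field in zip(parser_list, fields)]


def parse_file(pathfile, parsers=PARSERS):
    """Read the file line by line and collect the typed rows."""
    data = []
    with open(pathfile, 'r') as f:
        for line in f:
            row = parse_line(line, parsers)
            if row is not None:
                data.append(row)
    return data
```

Calling `parse_file` on the path of your data file would then return one typed list per valid `K` row, and other row types can be supported by adding more entries to the parser dict.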