Question

Sorry if this has been asked before, but I'm very new to Python. I have a file containing data records similar to the following:

K; 0; 710; 85; 2; 2013:12:04:13:11:36.291; 0.0000; 1; 1009.3000; 0;
K; 0; 710; 85; 3; 2013:12:04:13:11:36.291; 0.0000; 1; 1009.3000; 0;
K; 17; 718; 86; 1; 2013:12:04:13:11:36.198; 995.6880; 4; 0.0000; 0; 0.0000; 280; 0.0000; 576; 0.0000; 904;
K; 17; 718; 86; 2; 2013:12:04:13:11:36.198; 0.0000; 4; 1484.0000; 0; 1484.0000; 280; 1484.0000; 576; 1481.6000; 904;

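The records vary in length, but I'm only interested in the first eight items of each record. The items in each record are separated by a ';' character and a varying number of spaces. When I read the file, I want to assign each line to a list, but I also want the items in that list to have the correct types, e.g. str, int, int, int, int, datetime, float, int, etc. Currently I'm using the following code: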

def file_extract(pathfile):
    # read the whole file; "with" closes it automatically
    with open(pathfile) as file:
        contents = file.read()
    # remove spaces and split data based on ';' and \n
    data_list = [lines.replace(" ", "").split(";") for lines in contents.split("\n")]
    for line in data_list:
        if line[0] == "K":
            listraw = line[:9]
            listraw[1] = int(line[1])
            listraw[2] = int(line[2])
            # continue setting types in the listraw[] etc. etc.

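Unfortunately, when I read each record of the file contents into a list, every item in the list ends up as a string, similar to the following:
['K', '0', '710', '85', '2', '2013:12:04:13:11:36.291', ...]
I then have to go through each individual item in the list and convert it to the type I need. Is there a more elegant way to set the types of the individual items in a list?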

Answer 1

You can put the data types in a list and then use zip to match them with the fields, like this:

import datetime

# write a parser for the timepoints
def dateparser(string):
    # guessed the dateformat
    return datetime.datetime.strptime(string, '%Y:%m:%d:%H:%M:%S.%f')

# From your code `if line[0] == 'K'` I assume that 'K' is a key for the
# datatypes in the corresponding row.

# For every rowtype you define the datatypes here, where datatype
# is equivalent to a parser. Just make sure it accepts a string and returns the
# type you need.
# I guessed the types here so it works with your example.
parsers = {'K': [str,int,int,int,int,dateparser,float,int,float]}

# the example content
contents = """K; 0; 710; 85; 2; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;
K; 0; 710; 85; 3; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;
K; 17; 718; 86; 1; 2013:12:04:13:11:36.198; 995.6880; 4; 0.0000; 0; 0.0000; 280; 0.0000; 576; 0.0000; 904;
K; 17; 718; 86; 2; 2013:12:04:13:11:36.198; 0.0000; 4;1484.0000; 0;1484.0000; 280;1484.0000; 576;1481.6000; 904; """

data = []
# the right way for doing this with a file would be:
# with open(filepath, 'r') as f:
#     for line in f:
for line in contents.split('\n'):
    # skip empty lines
    if not line.strip():
        continue

    # first split then strip, feels safer this way...
    fields = [f.strip() for f in line.split(';')]

    # select the parserlist from our dict
    parser_list = parsers[fields[0]]

    # Now match the fields with their parsers, it will automatically stop
    # when there is no parser left. This means if you have 8 parsers only 8
    # fields will be evaluated and the rest is ignored.
    # Comes in handy when the lengths of your row types differ.
    # However, this also works the other way around: if there
    # are fewer fields than parsers, the remaining parsers are
    # silently ignored. If you don't want this to happen, make
    # sure that len(fields) >= len(parser_list).
    data.append([parser(field) for parser, field in zip(parser_list, fields)])

for row in data:
    print(row)

Prints:

['K', 0, 710, 85, 2, datetime.datetime(2013, 12, 4, 13, 11, 36, 291000), 0.0, 1, 1009.3]
['K', 0, 710, 85, 3, datetime.datetime(2013, 12, 4, 13, 11, 36, 291000), 0.0, 1, 1009.3]
['K', 17, 718, 86, 1, datetime.datetime(2013, 12, 4, 13, 11, 36, 198000), 995.688, 4, 0.0]
['K', 17, 718, 86, 2, datetime.datetime(2013, 12, 4, 13, 11, 36, 198000), 0.0, 4, 1484.0]
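The answer's comments leave two things to the reader: reading from a real file, and what happens when a row has fewer fields than parsers. A minimal sketch covering both follows. It swaps in the standard-library `csv` module (with `delimiter=';'` and `skipinitialspace=True`) for the manual `split`, which is an alternative rather than the answer's own method. The function name `read_records` and the length check are hypothetical additions, and the parser list is the same guessed one as above.

```python
import csv
import datetime
import io

def dateparser(string):
    # guessed timestamp format, same as in the answer above
    return datetime.datetime.strptime(string, '%Y:%m:%d:%H:%M:%S.%f')

# guessed types for the first nine fields of a 'K' row
parsers = {'K': [str, int, int, int, int, dateparser, float, int, float]}

def read_records(file_obj):
    """Parse ';'-separated rows; raise ValueError for rows that are too short."""
    data = []
    # skipinitialspace=True drops the spaces that follow each ';'
    for fields in csv.reader(file_obj, delimiter=';', skipinitialspace=True):
        if not fields:
            continue  # csv.reader yields an empty list for a blank line
        fields = [f.strip() for f in fields]  # also drop any spaces before a ';'
        if fields[-1] == '':
            fields = fields[:-1]  # drop the empty item produced by the trailing ';'
        parser_list = parsers[fields[0]]
        if len(fields) < len(parser_list):
            raise ValueError(f"row too short ({len(fields)} fields): {fields}")
        # zip stops after the last parser, so extra fields are ignored
        data.append([parser(field) for parser, field in zip(parser_list, fields)])
    return data

# io.StringIO stands in for a real file here; with a real file use
#     with open(filepath, newline='') as f:
#         data = read_records(f)
sample = io.StringIO(
    "K; 0; 710; 85; 2; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;\n"
    "\n"
    "K; 17; 718; 86; 1; 2013:12:04:13:11:36.198; 995.6880; 4; 0.0000; 0; 0.0000; 280;\n"
)
for row in read_records(sample):
    print(row)
```

Raising on short rows is a design choice; if short rows are expected in your data, you could skip or log them instead.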

Assigning different types to Python list items in advance
