Question

I have a very simple question: what is the most efficient way to read different entries from a txt file with Python?

Suppose I have a text file like:

42017     360940084.621356  21.00  09/06/2015  13:08:04
42017     360941465.680841  29.00  09/06/2015  13:31:05
42017     360948446.517761  16.00  09/06/2015  15:27:26
42049     361133954.539315  31.00  11/06/2015  18:59:14
42062     361208584.222483  10.00  12/06/2015  15:43:04
42068     361256740.238150  19.00  13/06/2015  05:05:40

In C I would do:

while(fscanf(file_name, "%d %lf %f %d/%d/%d %d:%d:%d", &id, &t0, &score, &day, &month, &year, &hour, &minute, &second) != EOF){...some instruction...}

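What is the best way to do something like this in Python, so that each value is stored in a separate variable (I need to use these variables throughout my code)?

Thanks in advance!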

Answer 1

I think muddyfish's answer is good; here is another way (perhaps a bit lighter):

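import time

with open(file_name) as f:
    for line in f:
        # split() with no argument splits on any run of whitespace,
        # so each line yields five string fields
        identifier, t0, score, date, hour = line.split()

        # You can also get a struct_time from the date and time fields
        timer = time.strptime(date + " " + hour, "%d/%m/%Y %H:%M:%S")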
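
Since line.split() returns strings, the values usually need converting before they can be used like the C variables in the question. A minimal sketch, assuming the five-column layout shown in the question; parse_line and the sample string are names made up for illustration, not part of the original answer:

```python
from datetime import datetime


def parse_line(line):
    """Convert one whitespace-separated record into typed values (hypothetical helper)."""
    identifier, t0, score, date, hour = line.split()
    # Combine the date and time fields into a single datetime object
    stamp = datetime.strptime(date + " " + hour, "%d/%m/%Y %H:%M:%S")
    return int(identifier), float(t0), float(score), stamp


# One line copied from the sample file in the question
sample = "42017     360940084.621356  21.00  09/06/2015  13:08:04\n"
identifier, t0, score, stamp = parse_line(sample)
print(identifier, t0, score, stamp.isoformat())
```

The day, month, year, hour, minute and second are then available as stamp.day, stamp.month and so on, which roughly mirrors the separate variables filled in by fscanf.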

Answer 2

I would look into the str.split() method.

For example, you could use:

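# `file` is an already-open file object; iterating it yields one line at a time
for line in file:
    # split() with no argument handles the runs of spaces between columns
    # and drops the trailing newline; split(" ") would produce empty strings
    data = dict(zip(("id", "t0", "score", "date", "time"), line.split()))
    instructions()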
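
The same idea can attach a type conversion to each field. A minimal sketch, assuming the column layout from the question; FIELDS and parse_record are hypothetical names chosen for illustration. It calls split() without an argument because the sample data separates columns with several spaces:

```python
# Field names paired with the conversion to apply to each column (illustrative choice)
FIELDS = (("id", int), ("t0", float), ("score", float), ("date", str), ("time", str))


def parse_record(line):
    """Return one record as a dict keyed by field name (hypothetical helper)."""
    values = line.split()  # no argument: split on any run of whitespace
    return {name: convert(value) for (name, convert), value in zip(FIELDS, values)}


# Two lines copied from the sample file in the question
lines = [
    "42017     360940084.621356  21.00  09/06/2015  13:08:04\n",
    "42049     361133954.539315  31.00  11/06/2015  18:59:14\n",
]
records = [parse_record(line) for line in lines]
print(records[1]["id"], records[1]["score"], records[1]["time"])
```

Keeping the names and converters in one tuple means a change to the file layout only has to be made in one place.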

Answer 3

Depending on what you do with the data, pandas may be worth considering:

import pandas as pd

with open(file_name) as infile:
    df = pd.read_fwf(infile, header=None, parse_dates=[[3, 4]], 
        date_parser=lambda x: pd.to_datetime(x, format='%d/%m/%Y %H:%M:%S'))

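The nested list [[3, 4]], together with the date_parser argument, reads columns 3 and 4 (0-indexed) as a single datetime column. You can then access individual parts of that column with: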

>>> df['3_4'].dt.hour
0    13
1    13
2    15
3    18
4    15
5     5
dtype: int64

(If you don't like the '3_4' key, use the parse_dates argument above like this:

parse_dates={'timestamp': [3, 4]}

)

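read_fwf is meant for reading fixed-width columns, which your data appears to follow. Alternatively, there are functions such as read_csv, read_table and a lot more.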

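(This answer is almost a duplicate of this SO answer, but since the question here is more general, I am posting it as another answer rather than as a comment.)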
