Question

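I have a data file with epoch (UNIX) timestamps, and I am trying to split the data day by day into separate files. For example, if the data covers 90 days, it should be split into 90 files. I don't know how to start. Sometimes I know the number of days and sometimes I don't, so I am looking for a better way to simply split the data by date. The columns are Data[0], Data[1], Timeepoch[2], Timeepoch[3]. Time_1 and Time_2 are the start time and the stop time.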

Data (these are only a few of the lines):

Data_1  Data_2  Time_1  Time_2
3436    1174    1756908 1759291
3436    3031    1756908 1759291
3436    1349    1756908 1759291
5372    937     1756913 1756983
4821    937     1756913 1756983
4376    937     1756913 1756983
2684    937     1756913 1756983
3826    896     1756961 1756971
3826    896     1756980 1756997
5372    937     1756983 1757045
4821    937     1756983 1757045
4376    937     1756983 1757045
2684    937     1756983 1757045
3826    896     1757003 1757053
4944    3715    1757009 1757491
4944    4391    1757009 1757491
2539    1431    1757014 1757337
5372    937     1757045 1757104
4821    937     1757045 1757104
4376    937     1757045 1757104
2684    937     1757045 1757104
896     606     1757053 1757064
3826    896     1757064 1757074
5045    4901    1757074 1757085
4921    4901    1757074 1757085
4901    3545    1757074 1757085
4901    3140    1757074 1757085
4901    4243    1757074 1757085
896     606     1757074 1757084

Answer 1

To find the UTC date from a POSIX timestamp, just add it to the Epoch, for example:

>>> from datetime import date, timedelta
>>> date(1970, 1, 1) + timedelta(seconds=1756908)
datetime.date(1970, 1, 21)
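The arithmetic behind that result can be checked by hand. The following is only a small sketch, assuming a day is exactly 86400 seconds (which is how POSIX time counts days); the variable names are illustrative.

```python
from datetime import date, timedelta

SECONDS_PER_DAY = 86400  # POSIX time treats every day as exactly 86400 seconds

# Split the timestamp into whole days since the Epoch and the leftover seconds.
days, seconds = divmod(1756908, SECONDS_PER_DAY)
print(days, seconds)  # 20 28908

# 20 whole days after 1970-01-01 is 1970-01-21, matching the result above.
print(date(1970, 1, 1) + timedelta(days=days))  # 1970-01-21
```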

Then create a mapping date -> file and use it to split the input file:

#!/usr/bin/env python
import fileinput
from datetime import date, timedelta

def get_date(line, epoch=date(1970, 1, 1)):
    try:
        timestamp = int(line.split()[2]) # timestamp from 3rd column
        return epoch + timedelta(seconds=timestamp) # UTC date
    except Exception:
        return None # can't parse timestamp

daily_files = {} # date -> file
input_file = fileinput.input()
next(input_file) # skip header
for line in input_file:
    d = get_date(line)
    file = daily_files.get(d)
    if file is None: # file for the given date is not found
        file = daily_files[d] = open(str(d), 'w') # open a new one
    file.write(line)

# close all files
for f in daily_files.values():
    try:
        f.close()
    except EnvironmentError:
        pass # ignore errors
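As a possible variation on the script above, the same date -> file mapping can be wrapped in a function that uses `contextlib.ExitStack` so every opened file is closed even if an error occurs partway through. This is a minimal sketch, not the answer's own code: the function name `split_by_utc_date` and the `out_dir` parameter are hypothetical, and it assumes that lines with an unparsable third column (such as the header) should simply be skipped.

```python
import os
from contextlib import ExitStack
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def split_by_utc_date(lines, out_dir="."):
    """Write each data line to out_dir/<YYYY-MM-DD>, keyed on the 3rd column.

    Returns the sorted list of dates for which a file was written.
    """
    daily_files = {}  # date -> open file
    with ExitStack() as stack:  # closes every registered file on exit
        for line in lines:
            try:
                day = EPOCH + timedelta(seconds=int(line.split()[2]))
            except (IndexError, ValueError):
                continue  # header or malformed line: skip it (assumption)
            if day not in daily_files:
                path = os.path.join(out_dir, str(day))
                daily_files[day] = stack.enter_context(open(path, "w"))
            daily_files[day].write(line)
    return sorted(daily_files)
```

It could be called as `with open('in.txt') as f: split_by_utc_date(f)`. Like the original script, it keeps one file open per day, so a very large number of distinct days could run into the operating system's open-file limit.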

Answer 2

import itertools
import datetime

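# Extract the date from the timestamp that is the third item in a line
# (Will be grouping by start timestamp)
# Note: date.fromtimestamp() interprets the timestamp in the local time zone
def key(s):
    return datetime.date.fromtimestamp(int(s.split()[2]))

with open('in.txt') as in_f:
    next(in_f)  # skip the header line, whose third field is not a number
    for date, group in itertools.groupby(in_f, key=key):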
        # Output to file that is named like "1970-01-01.txt"
        with open('{:%Y-%m-%d}.txt'.format(date), 'w') as out_f:
            out_f.writelines(group)
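One thing to keep in mind with this approach is that `itertools.groupby` only groups consecutive items with the same key, so it relies on the input already being ordered by start time (as the sample data is). If that is not guaranteed, a sketch like the following sorts first. The helper names `utc_day` and `group_lines_by_day` are made up for illustration, UTC is used instead of local time as an assumption, and the sort loads all lines into memory.

```python
import itertools
from datetime import datetime, timezone

def utc_day(line):
    """UTC date of the start timestamp in the third column."""
    return datetime.fromtimestamp(int(line.split()[2]), tz=timezone.utc).date()

def group_lines_by_day(lines):
    """Return [(date, [lines...]), ...] with exactly one entry per date."""
    ordered = sorted(lines, key=utc_day)  # makes each date one consecutive run
    return [(day, list(group))
            for day, group in itertools.groupby(ordered, key=utc_day)]

# Deliberately out of order: the first and third rows fall on the same day.
rows = ["a b 86400 0\n", "c d 10 0\n", "e f 86500 0\n"]
for day, group in group_lines_by_day(rows):
    print(day, len(group))
# 1970-01-01 1
# 1970-01-02 2
```

Without the sort, `groupby` would yield 1970-01-02 twice for these rows, and opening the output file with `'w'` the second time would overwrite the first group.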

Answer 3

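datetime.fromtimestamp(timestamp)

gives you a datetime object from the timestamp, and

datetime.fromtimestamp(timestamp).replace(second=0, minute=0, hour=0)

gives you a datetime object that keeps only the date components (the time of day is set to zero).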
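To make this concrete, here is a small sketch of both calls. It passes `tz=timezone.utc` so the result does not depend on the machine's local time zone, which is an assumption on my part (the calls above use local time), and it also zeroes `microsecond` in case the timestamp is a float. The function names are illustrative.

```python
from datetime import datetime, timezone

def utc_midnight(timestamp):
    """Timezone-aware UTC datetime for the start of the timestamp's day."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)

def utc_date(timestamp):
    """Plain date object (UTC) for the timestamp."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date()

print(utc_midnight(1756908))  # 1970-01-21 00:00:00+00:00
print(utc_date(1756908))      # 1970-01-21
```

If only the date is needed as a grouping key or a file name, calling `.date()` is usually simpler than zeroing the time fields.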

Answer 4

The following code writes each line to a file named output-YYYY-MM-DD, where YYYY-MM-DD is extracted from the Time_2 column.

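from datetime import date
with open('infile.txt', 'r') as f:
    next(f)  # skip the header line, whose fields are not numbers
    for line in f:
        fields = line.split()
        # date.fromtimestamp() interprets the timestamp in the local time zone
        with open('output-' + str(date.fromtimestamp(float(fields[3]))), 'a') as outf:
            outf.write(line)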

This code is not efficient: it opens a file for every line. If you can ensure that the input data is sorted by end_time, it can be improved.
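One way that improvement might look, assuming the input really is sorted by end time, is to keep the current output file open and switch only when the date changes. This is a sketch rather than a definitive implementation: the function name `split_sorted_by_end_time` and the `prefix` parameter are hypothetical, it uses UTC dates instead of local ones, and it skips lines whose fourth column is not a number (such as the header).

```python
from datetime import datetime, timezone

def split_sorted_by_end_time(lines, prefix="output-"):
    """Append each line to <prefix>YYYY-MM-DD, reopening only on a date change."""
    current_date = None
    out = None
    try:
        for line in lines:
            fields = line.split()
            try:
                ts = float(fields[3])  # end time, 4th column
            except (IndexError, ValueError):
                continue  # header or malformed line: skip it (assumption)
            day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
            if day != current_date:
                if out is not None:
                    out.close()  # done with the previous day's file
                out = open(prefix + str(day), "a")
                current_date = day
            out.write(line)
    finally:
        if out is not None:
            out.close()
```

It could be used as `with open('infile.txt') as f: split_sorted_by_end_time(f)`. Because the files are opened in append mode, unsorted input still produces correct output; it just reopens files more often.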

Split data by day
