Question

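I have a .csv file with continuous timestamped data (one row per minute), but in some cases a few rows (minutes) may be missing. I need to write a script that processes the file and inserts the missing rows, filling them with the average of the adjacent rows.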

Example of the current data (the rows for 00:01, 00:10, 00:12 and 00:13 are missing); I want to add these rows in their correct positions:

_yyyy,_mm,_dd,_HH,_MM,_SS,T
2015,01,01,00,00,00,-5.0
2015,01,01,00,02,00,-5.2
2015,01,01,00,03,00,-5.3
2015,01,01,00,04,00,-5.3
2015,01,01,00,05,00,-5.3
2015,01,01,00,06,00,-5.3
2015,01,01,00,07,00,-5.3
2015,01,01,00,08,00,-5.3
2015,01,01,00,09,00,-5.3
2015,01,01,00,11,00,-5.3
2015,01,01,00,14,00,-5.3

Thanks a lot for your answers.
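The missing minutes can be located by turning the six timestamp columns of each row into a `datetime` and comparing consecutive rows. Below is a minimal standard-library sketch, assuming the comma-separated layout and header shown above; `find_missing_minutes` is a hypothetical helper name, not part of either answer.

```python
import csv
import io
from datetime import datetime, timedelta

def find_missing_minutes(lines):
    """Hypothetical helper: return the datetime of every minute missing between consecutive rows."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    missing = []
    prev = None
    for row in reader:
        # the first six columns are year, month, day, hour, minute, second
        cur = datetime(*map(int, row[:6]))
        if prev is not None:
            step = prev + timedelta(minutes=1)
            while step < cur:  # every minute strictly between prev and cur is missing
                missing.append(step)
                step += timedelta(minutes=1)
        prev = cur
    return missing

sample = """_yyyy,_mm,_dd,_HH,_MM,_SS,T
2015,01,01,00,00,00,-5.0
2015,01,01,00,02,00,-5.2
2015,01,01,00,03,00,-5.3
2015,01,01,00,04,00,-5.3
2015,01,01,00,05,00,-5.3
2015,01,01,00,06,00,-5.3
2015,01,01,00,07,00,-5.3
2015,01,01,00,08,00,-5.3
2015,01,01,00,09,00,-5.3
2015,01,01,00,11,00,-5.3
2015,01,01,00,14,00,-5.3
"""

print([d.strftime('%H:%M') for d in find_missing_minutes(io.StringIO(sample))])
# → ['00:01', '00:10', '00:12', '00:13']
```

With a real file, an open file object can be passed instead, e.g. `with open('data.csv', newline='') as f: find_missing_minutes(f)`.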

Answer 1

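You can try the code below. It does what you want, but in a somewhat different way. First, it reads the csv file line by line with the readlines function, which creates a list of strings, one per row. Then it strips the \n newline characters and splits each string into a list of individual cell values. After removing the header, it appends the missing rows (entered by hand), sorts this nested list, and writes the list back to a csv file.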

with open("data.csv") as f:
    content = f.readlines()
# removes '\n' character
content = [x.strip() for x in content]
# split each row to individual cell values
content = [x.split(',') for x in content]

# for sorting purpose, we need to remove the csv header. Before removing let's store the header to a new variable
csv_header = ','.join(content[0])
csv_header += '\n'

del content[0] # removes csv header from the list

# the values must be enclosed in single or double quotes. 
content.append(['2015','01','01','00','01','00','-5.1'])
content.append(['2015','01','01','00','10','00','-5.3'])
content.append(['2015','01','01','00','12','00','-5.3'])
content.append(['2015','01','01','00','13','00','-5.3'])

# before sorting
print(content)

# sorting using the timestamp value
# sort by the six timestamp fields; they are zero-padded strings,
# so comparing them as lists gives chronological order
content.sort(key=lambda x: x[:6])

# after sorting
print(content)

# open a new file for writing; using the same filename would overwrite the original csv file
with open('output.csv', 'w') as f:
    f.write(csv_header)  # write the csv header
    for i in content:
        # join the cell values back into one comma-separated line
        f.write(','.join(i) + '\n')

data.csv file:

_yyyy,_mm,_dd,_HH,_MM,_SS,T
2015,01,01,00,00,00,-5.0
2015,01,01,00,02,00,-5.2
2015,01,01,00,03,00,-5.3
2015,01,01,00,04,00,-5.3
2015,01,01,00,05,00,-5.3
2015,01,01,00,06,00,-5.3
2015,01,01,00,07,00,-5.3
2015,01,01,00,08,00,-5.3
2015,01,01,00,09,00,-5.3
2015,01,01,00,11,00,-5.3
2015,01,01,00,14,00,-5.3

output.csv file:

_yyyy,_mm,_dd,_HH,_MM,_SS,T
2015,01,01,00,00,00,-5.0
2015,01,01,00,01,00,-5.1
2015,01,01,00,02,00,-5.2
2015,01,01,00,03,00,-5.3
2015,01,01,00,04,00,-5.3
2015,01,01,00,05,00,-5.3
2015,01,01,00,06,00,-5.3
2015,01,01,00,07,00,-5.3
2015,01,01,00,08,00,-5.3
2015,01,01,00,09,00,-5.3
2015,01,01,00,10,00,-5.3
2015,01,01,00,11,00,-5.3
2015,01,01,00,12,00,-5.3
2015,01,01,00,13,00,-5.3
2015,01,01,00,14,00,-5.3

Hope this helps.
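Answer 1 hard-codes the missing rows and their temperatures. As a sketch (not the answerer's code), the rows could instead be computed from `content` itself, giving each missing minute the mean of the two surrounding readings. The helper `fill_gaps` is hypothetical and assumes `content` is already in chronological order with the header removed.

```python
from datetime import datetime, timedelta

def fill_gaps(rows):
    """Hypothetical helper: insert one row per missing minute.

    `rows` is a list of string lists like Answer 1's `content` (header
    removed), already sorted chronologically. Each inserted row gets the
    mean T of the two rows around the gap.
    """
    filled = []
    for prev, cur in zip(rows, rows[1:]):
        filled.append(prev)
        prev_dt = datetime(*map(int, prev[:6]))
        cur_dt = datetime(*map(int, cur[:6]))
        # round to two decimals to keep float noise out of the output
        avg = round((float(prev[6]) + float(cur[6])) / 2, 2)
        missing = prev_dt + timedelta(minutes=1)
        while missing < cur_dt:
            # strftime keeps the zero-padded layout of the original file
            stamp = missing.strftime('%Y,%m,%d,%H,%M,%S').split(',')
            filled.append(stamp + [str(avg)])
            missing += timedelta(minutes=1)
    if rows:
        filled.append(rows[-1])
    return filled

content = [
    ['2015', '01', '01', '00', '00', '00', '-5.0'],
    ['2015', '01', '01', '00', '02', '00', '-5.2'],
    ['2015', '01', '01', '00', '05', '00', '-5.3'],
]
for row in fill_gaps(content):
    print(','.join(row))
```

This could take the place of the hand-written `content.append(...)` calls, after which the list is written out as in the answer. In this toy input the 00:03 and 00:04 rows both get -5.25, the mean of -5.2 and -5.3, because the question asks for the average of the adjacent rows rather than a linear interpolation.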

Answer 2

This code should be run with Python 3.

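#!/usr/bin/python3
import csv
from datetime import datetime, timedelta

# column order of the input file
FIELDS = ['_yyyy', '_mm', '_dd', '_HH', '_MM', '_SS', 'T']

def get_average(val1, val2):
    return (val1 + val2) / 2

def create_row(missed_date, prev_T, next_T):
    # zero-padded strings keep the layout of the original file
    return {
            '_yyyy': missed_date.strftime('%Y'),
            '_mm': missed_date.strftime('%m'),
            '_dd': missed_date.strftime('%d'),
            '_HH': missed_date.strftime('%H'),
            '_MM': missed_date.strftime('%M'),
            '_SS': missed_date.strftime('%S'),
            'T': round(get_average(prev_T, next_T), 2)
    }

def create_datetime(row):
    # DictReader returns strings, so convert each field to int
    return datetime(*(int(row[name]) for name in FIELDS[:6]))

def is_minute_line_missing(prev_date, cur_date):
    # a row is missing when two consecutive rows are more than one minute apart
    return prev_date is not None and cur_date - prev_date > timedelta(minutes=1)

def complete_csv(in_path, out_path):
    with open(in_path, newline='') as infile, open(out_path, 'w', newline='') as outfile:
        reader = csv.DictReader(infile, delimiter=',')
        writer = csv.DictWriter(outfile, fieldnames=FIELDS, lineterminator='\n')
        writer.writeheader()

        prev_date = None
        prev_T = None

        for row in reader:
            cur_date = create_datetime(row)
            cur_T = float(row['T'])
            if is_minute_line_missing(prev_date, cur_date):
                # write one averaged row for every missing minute
                missed_date = prev_date + timedelta(minutes=1)
                while missed_date < cur_date:
                    writer.writerow(create_row(missed_date, prev_T, cur_T))
                    missed_date += timedelta(minutes=1)
            writer.writerow(row)
            # always advance, so the next gap is measured from this row
            prev_date = cur_date
            prev_T = cur_T

# placeholder paths
complete_csv('path/to/csv/file', 'path/to/output/file')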

Add the missing rows to the file if the timestamps are not continuous
