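I have a .csv file with continuous, timestamped data (one row per minute), but in some cases a few rows (minutes) may be missing. I need to write a script that processes the file and inserts the missing rows, filling them with the average of the neighbouring rows.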
Here is an example of the current data, in which the rows for 00:01, 00:10, 00:12 and 00:13 are missing. I want to insert those rows at the correct positions:
_yyyy,_mm,_dd,_HH,_MM,_SS,T
2015,01,01,00,00,00,-5.0
2015,01,01,00,02,00,-5.2
2015,01,01,00,03,00,-5.3
2015,01,01,00,04,00,-5.3
2015,01,01,00,05,00,-5.3
2015,01,01,00,06,00,-5.3
2015,01,01,00,07,00,-5.3
2015,01,01,00,08,00,-5.3
2015,01,01,00,09,00,-5.3
2015,01,01,00,11,00,-5.3
2015,01,01,00,14,00,-5.3
Thanks a lot for your answers.
Answer 0 (score: 0)
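You can try the code below. It does what you want, although in a slightly different way. First, it reads the csv file line by line with the readlines function, which creates a list of strings, one per line. Then it strips the \n newline character from each string and splits it into a list of individual cell values. Next, it appends the missing rows (hard-coded in this example), sorts the nested list by timestamp, and writes the list back to a csv file.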
with open("data.csv") as f:
    content = f.readlines()
# removes the '\n' character
content = [x.strip() for x in content]
# splits each row into individual cell values
content = [x.split(',') for x in content]
# the header must not be sorted with the data, so store it in a separate variable first
csv_header = ','.join(content[0])
csv_header += '\n'
del content[0]  # removes the csv header from the list
# the missing rows, added by hand; the values must be strings (in single or double quotes), like the ones read from the file
content.append(['2015','01','01','00','01','00','-5.1'])
content.append(['2015','01','01','00','10','00','-5.3'])
content.append(['2015','01','01','00','12','00','-5.3'])
content.append(['2015','01','01','00','13','00','-5.3'])
# before sorting
print(content)
# sort by the timestamp fields (_yyyy to _SS); they are zero-padded strings,
# so comparing them as lists gives chronological order
content.sort(key=lambda x: x[:6])
# after sorting
print(content)
# opens a new file for writing; you may use the same filename here, in which case it overwrites the original csv file
with open('output.csv', 'w') as f:
    f.write(csv_header)  # writes the csv header
    for i in content:
        # converts the list to a string
        row = ','.join(i)
        row += '\n'  # adds a newline character
        f.write(row)  # writes the string to output.csv
data.csv file:
_yyyy,_mm,_dd,_HH,_MM,_SS,T
2015,01,01,00,00,00,-5.0
2015,01,01,00,02,00,-5.2
2015,01,01,00,03,00,-5.3
2015,01,01,00,04,00,-5.3
2015,01,01,00,05,00,-5.3
2015,01,01,00,06,00,-5.3
2015,01,01,00,07,00,-5.3
2015,01,01,00,08,00,-5.3
2015,01,01,00,09,00,-5.3
2015,01,01,00,11,00,-5.3
2015,01,01,00,14,00,-5.3
output.csv file:
_yyyy,_mm,_dd,_HH,_MM,_SS,T
2015,01,01,00,00,00,-5.0
2015,01,01,00,01,00,-5.1
2015,01,01,00,02,00,-5.2
2015,01,01,00,03,00,-5.3
2015,01,01,00,04,00,-5.3
2015,01,01,00,05,00,-5.3
2015,01,01,00,06,00,-5.3
2015,01,01,00,07,00,-5.3
2015,01,01,00,08,00,-5.3
2015,01,01,00,09,00,-5.3
2015,01,01,00,10,00,-5.3
2015,01,01,00,11,00,-5.3
2015,01,01,00,12,00,-5.3
2015,01,01,00,13,00,-5.3
2015,01,01,00,14,00,-5.3
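The answer above hard-codes the missing rows. A minimal sketch of computing them from their neighbours instead, assuming the rows are lists of strings like content (header already removed) with the six timestamp columns followed by T; fill_missing_minutes is a hypothetical helper, and rounding the averages to two decimals is an assumption:

```python
from datetime import datetime, timedelta


def fill_missing_minutes(rows):
    """Return the rows sorted by time, with an averaged row for each missing minute.

    rows: lists of strings [yyyy, mm, dd, HH, MM, SS, T] (header already removed).
    """
    # parse the six timestamp fields and the temperature, then sort by time
    parsed = sorted((datetime(*map(int, r[:6])), float(r[6])) for r in rows)
    step = timedelta(minutes=1)
    result = []
    for i, (ts, t) in enumerate(parsed):
        if i > 0:
            prev_ts, prev_t = parsed[i - 1]
            # every minute in the gap gets the average of the two neighbouring rows
            # (rounded to 2 decimals: an assumption, to keep the output tidy)
            avg = round((prev_t + t) / 2, 2)
            missing = prev_ts + step
            while missing < ts:
                result.append((missing, avg))
                missing += step
        result.append((ts, t))
    # back to zero-padded strings in the original column order
    return [ts.strftime('%Y,%m,%d,%H,%M,%S').split(',') + [str(t)]
            for ts, t in result]


# usage with the data from the question
rows = [line.split(',') for line in """\
2015,01,01,00,00,00,-5.0
2015,01,01,00,02,00,-5.2
2015,01,01,00,03,00,-5.3
2015,01,01,00,04,00,-5.3
2015,01,01,00,05,00,-5.3
2015,01,01,00,06,00,-5.3
2015,01,01,00,07,00,-5.3
2015,01,01,00,08,00,-5.3
2015,01,01,00,09,00,-5.3
2015,01,01,00,11,00,-5.3
2015,01,01,00,14,00,-5.3""".splitlines()]

for r in fill_missing_minutes(rows):
    print(','.join(r))
```

The returned rows can be written out with the same ','.join loop used in the answer. Sorting on the parsed datetime rather than on the raw strings also keeps working if a field is ever not zero-padded.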
Hope this helps.
Answer 1 (score: 0)
This code should be run with Python 3.
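#!/usr/bin/python3
import csv
from datetime import datetime, timedelta

def get_average(val1, val2):
    return (val1 + val2) / 2

def create_row(prev_date, prev_T, next_T):
    # builds the row for the minute right after prev_date; T is the average of the neighbours
    missed_date = prev_date + timedelta(minutes=1)
    row = {
        '_yyyy': f'{missed_date.year:04d}',
        '_mm': f'{missed_date.month:02d}',
        '_dd': f'{missed_date.day:02d}',
        '_HH': f'{missed_date.hour:02d}',
        '_MM': f'{missed_date.minute:02d}',
        '_SS': f'{missed_date.second:02d}',
        'T': round(get_average(prev_T, next_T), 2)
    }
    return row

def create_datetime(row):
    # the csv module returns strings, so convert each timestamp field to int
    return datetime(int(row['_yyyy']), int(row['_mm']), int(row['_dd']),
                    int(row['_HH']), int(row['_MM']), int(row['_SS']))

def is_minute_line_missing(prev_date, cur_date):
    # True when more than one minute separates two consecutive rows
    if prev_date is None:
        return False
    return cur_date - prev_date > timedelta(minutes=1)

def complete_csv(in_path, out_path):
    with open(in_path, newline='') as csvfile, open(out_path, 'w', newline='') as outfile:
        reader = csv.DictReader(csvfile, delimiter=',')  # the file is comma-separated
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        prev_date = None
        prev_T = None
        for row in reader:
            cur_date = create_datetime(row)
            cur_T = float(row['T'])
            # writes one averaged row for every missing minute before the current row
            gap_date = prev_date
            while is_minute_line_missing(gap_date, cur_date):
                missed_row = create_row(gap_date, prev_T, cur_T)
                writer.writerow(missed_row)
                gap_date += timedelta(minutes=1)
            writer.writerow(row)
            # always advance, whether or not a gap was filled
            prev_date = cur_date
            prev_T = cur_T

if __name__ == '__main__':
    # placeholder paths: replace them with the real input and output files
    complete_csv('path/to/csv/file', 'path/to/output/file')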
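When several consecutive minutes are missing (like 00:12 and 00:13 between 00:11 and 00:14), the neighbour average gives them all the same value. If evenly stepped values are preferred, one option is linear interpolation; note this is a swapped-in technique, not what the question or either answer does. For a single missing minute it produces the same value as the neighbour average. A minimal sketch with only the standard library, assuming the column names from the question; interpolate_gaps and TIME_FIELDS are hypothetical names, and rounding to two decimals is an assumption:

```python
import csv
import io
from datetime import datetime, timedelta

TIME_FIELDS = ('_yyyy', '_mm', '_dd', '_HH', '_MM', '_SS')


def interpolate_gaps(lines):
    """Yield the CSV rows as dicts, adding linearly interpolated rows for missing minutes."""
    step = timedelta(minutes=1)
    prev_ts = prev_t = None
    for row in csv.DictReader(lines):
        ts = datetime(*(int(row[f]) for f in TIME_FIELDS))
        t = float(row['T'])
        if prev_ts is not None and ts - prev_ts > step:
            gap = (ts - prev_ts) // step  # number of one-minute steps between known rows
            for k in range(1, gap):
                missing = prev_ts + k * step
                # point on the straight line between the two known values
                value = prev_t + (t - prev_t) * k / gap
                stamp = missing.strftime('%Y,%m,%d,%H,%M,%S').split(',')
                yield dict(zip(TIME_FIELDS, stamp), T=str(round(value, 2)))
        yield row
        prev_ts, prev_t = ts, t


# usage; the -5.6 reading is a made-up value, chosen so the stepping is visible
sample = """\
_yyyy,_mm,_dd,_HH,_MM,_SS,T
2015,01,01,00,11,00,-5.3
2015,01,01,00,14,00,-5.6
"""
for r in interpolate_gaps(io.StringIO(sample)):
    print(','.join(r.values()))
```

Because it is a generator, the rows can be passed straight to csv.DictWriter.writerows without holding the whole file in memory.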