My CSV comes from Bloomberg, and its format is as follows:
Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume
05SEP2012,,,,,,,
09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155
09:30 - 09:45,97.899,0.177,98.164,98.164,97.281,102,101725
09:45 - 10:00,97.722,-0.177,97.899,97.899,97.193,32,39874
06SEP2012,,,,,,,
09:15 - 09:30,98.076,0.883,98.076,98.076,98.076,1,22429
09:30 - 09:45,97.193,-0.883,97.634,97.987,97.104,72,67741
09:45 - 10:00,96.928,-0.265,97.193,97.193,96.751,80,148963...
If I want to normalize the format so that [date XX/XX/201X + time XX:XX-XX:XX] becomes the key used for matching, it might look like this:
Date,Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume
05SEP2012,,,,,,,,
05SEP2012,09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155
05SEP2012,09:30 - 09:45,97.899,0.177,98.164,98.164,97.281,102,101725
05SEP2012,09:45 - 10:00,97.722,-0.177,97.899,97.899,97.193,32,39874
06SEP2012,,,,,,,,
06SEP2012,09:15 - 09:30,98.076,0.883,98.076,98.076,98.076,1,22429
06SEP2012,09:30 - 09:45,97.193,-0.883,97.634,97.987,97.104,72,67741
06SEP2012,09:45 - 10:00,96.928,-0.265,97.193,97.193,96.751,80,148963...
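Could anyone tell me what code I should write? I am very new to programming and am trying to write a Python program on pairs trading for a school project. This article is my main reference, but it cannot read in the CSV data we collected.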
Answer 0 (score: 0)
For Python 3:
import csv
with open('data.csv', 'r', newline='') as f, open('data_out.csv', 'w', newline='') as f_out:
    reader = csv.reader(f, quotechar='"')
    # read headers
    headers = next(reader)
    # insert new column name
    headers.insert(0, "Date")
    w = csv.writer(f_out, delimiter=',')
    # write headers
    w.writerow(headers)
    for line in f:
        if ',,,' in line:
            # date-only row: remember the date and copy the row unchanged
            newcolumn = line.strip()
            newcolumn = newcolumn.replace(',', '')
            f_out.write(line)
        else:
            # data row: prepend the remembered date
            line = newcolumn + ',' + line.strip()
            line = line.split(',')
            w.writerow(line)
For Python 2.7:
import csv
with open('data.csv', 'rb') as f, open('data_out.csv', 'wb') as f_out:
    reader = csv.reader(f, quotechar='"')
    # read headers
    headers = next(reader)
    # insert new column name
    headers.insert(0, "Date")
    w = csv.writer(f_out, delimiter=',')
    # write headers
    w.writerow(headers)
    for line in f:
        if ',,,' in line:
            # date-only row: remember the date and copy the row unchanged
            newcolumn = line.strip()
            newcolumn = newcolumn.replace(',', '')
            f_out.write(line)
        else:
            # data row: prepend the remembered date
            line = newcolumn + ',' + line.strip()
            line = line.split(',')
            w.writerow(line)
Output:
Date,Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume
05SEP2012,,,,,,,
05SEP2012,09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155
05SEP2012,09:30 - 09:45,97.899,0.177,98.164,98.164,97.281,102,101725
05SEP2012,09:45 - 10:00,97.722,-0.177,97.899,97.899,97.193,32,39874
06SEP2012,,,,,,,
06SEP2012,09:15 - 09:30,98.076,0.883,98.076,98.076,98.076,1,22429
06SEP2012,09:30 - 09:45,97.193,-0.883,97.634,97.987,97.104,72,67741
06SEP2012,09:45 - 10:00,96.928,-0.265,97.193,97.193,96.751,80,148963
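The output above still contains the date-only rows (for example `05SEP2012,,,,,,,`). If those are not wanted, a variant along the following lines could carry the date forward and drop them. This is only a sketch under assumptions that are not part of the answer: the function name `add_date_column` is hypothetical, and a "date row" is taken to be any row whose fields after the first are all empty. The sample input is a shortened copy of the data in the question.

```python
import csv
import io


def add_date_column(lines):
    """Yield rows with the most recent date prepended; date-only rows are dropped.

    Assumption: a date row has a value in its first field only.
    """
    reader = csv.reader(lines)
    header = next(reader)
    yield ["Date"] + header
    current_date = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        if not any(field.strip() for field in row[1:]):
            current_date = row[0].strip()  # remember the date, emit nothing
        else:
            yield [current_date] + row


# Shortened sample taken from the question's data
sample = io.StringIO(
    "Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume\n"
    "05SEP2012,,,,,,,\n"
    "09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155\n"
    "06SEP2012,,,,,,,\n"
    "09:15 - 09:30,98.076,0.883,98.076,98.076,98.076,1,22429\n"
)
result = list(add_date_column(sample))
for row in result:
    print(",".join(row))
```

To use it on a real file, an open file object (opened with `newline=''`) could be passed in place of `sample`, and the yielded rows written out with `csv.writer`.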
Answer 1 (score: 0)
# First open your file (path_to_file is the path to your CSV):
with open(path_to_file, 'r') as csv_file:
    # Initialize list to hold the rows
    rows = []
    # For each line in your file, split values into a list and add to the rows list
    for line in csv_file:
        # strip the trailing newline so the last cell is clean
        rows.append(line.rstrip('\n').split(','))
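Now each element of rows is a list with the same structure. You can compare corresponding "cells", for example the first column of the first and second rows after the header:
rows[1][0] vs rows[2][0]. Keep in mind that list indices are zero-based, so rows[0] is the header.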
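Building on the `rows` list, the (date, time interval) key the question asks for could be assembled into a dictionary. The following is a minimal sketch, not the answer's own method: the helper name `index_by_key` is hypothetical, and it assumes the layout shown in the question, where `rows[0]` is the header, a date row has only its first cell filled, and the second cell of a data row is the close price. The sample lines are copied from the question.

```python
def index_by_key(rows):
    """Map (date, time interval) -> close price.

    Assumes rows[0] is the header and a date row has only its first cell filled.
    """
    closes = {}
    current_date = None
    for row in rows[1:]:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # blank line
        if not any(cells[1:]):
            current_date = cells[0]  # date-only row
        else:
            closes[(current_date, cells[0])] = float(cells[1])
    return closes


# Sample lines copied from the question
lines = [
    "Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume\n",
    "05SEP2012,,,,,,,\n",
    "09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155\n",
    "09:30 - 09:45,97.899,0.177,98.164,98.164,97.281,102,101725\n",
    "06SEP2012,,,,,,,\n",
    "09:15 - 09:30,98.076,0.883,98.076,98.076,98.076,1,22429\n",
]
rows = [line.split(",") for line in lines]
closes = index_by_key(rows)
print(closes[("05SEP2012", "09:30 - 09:45")])
```

With one such dictionary per instrument (say `closes_a` and `closes_b`, both hypothetical names), the intervals present in both could be found with `closes_a.keys() & closes_b.keys()`.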
Hope this gets you on your way,
Cheers