How to unify the format of CSV data within the same column

Time: 2015-10-29 10:06:02

Tags: python csv pandas

My CSV comes from Bloomberg and is formatted as follows:

Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume
05SEP2012,,,,,,,
09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155
09:30 - 09:45,97.899,0.177,98.164,98.164,97.281,102,101725
09:45 - 10:00,97.722,-0.177,97.899,97.899,97.193,32,39874
06SEP2012,,,,,,,
09:15 - 09:30,98.076,0.883,98.076,98.076,98.076,1,22429
09:30 - 09:45,97.193,-0.883,97.634,97.987,97.104,72,67741
09:45 - 10:00,96.928,-0.265,97.193,97.193,96.751,80,148963...

If I want to unify the format so that [date XX/XX/201X + time XX:XX-XX:XX] becomes the key for matching, the result might look like this:

Date,Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume
05SEP2012,,,,,,,,
05SEP2012,09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155
05SEP2012,09:30 - 09:45,97.899,0.177,98.164,98.164,97.281,102,101725
05SEP2012,09:45 - 10:00,97.722,-0.177,97.899,97.899,97.193,32,39874
06SEP2012,,,,,,,,
06SEP2012,09:15 - 09:30,98.076,0.883,98.076,98.076,98.076,1,22429
06SEP2012,09:30 - 09:45,97.193,-0.883,97.634,97.987,97.104,72,67741
06SEP2012,09:45 - 10:00,96.928,-0.265,97.193,97.193,96.751,80,148963...
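The [date + time] matching key described above can be built with the standard datetime module. This is a minimal sketch, assuming dates always look like 05SEP2012, intervals always look like 09:15 - 09:30, and an English/C locale for month abbreviations. make_key is a hypothetical helper, and using the start of each interval as the timestamp is an assumption.

```python
from datetime import datetime


def make_key(date_str, interval):
    # Take the start time of the interval, e.g. '09:15' from '09:15 - 09:30'
    start = interval.split(' - ')[0].strip()
    # %d%b%Y parses dates like 05SEP2012; strptime matches %b case-insensitively
    return datetime.strptime(date_str + ' ' + start, '%d%b%Y %H:%M')


key = make_key('05SEP2012', '09:15 - 09:30')
print(key)  # 2012-09-05 09:15:00
```

Because datetime objects are hashable and sortable, keys built this way can serve as dictionary keys when lining up rows from two files.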

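Could anyone tell me what code I should write? I am very new to programming and am trying to write a Python program for pairs trading as a school project. This article is my main reference, but when it gets to reading in the data, it cannot read the CSV data we collected.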

2 answers:

Answer 0 (score: 0)

For Python 3:

import csv

with open('data.csv', 'r', newline='') as f, open('data_out.csv', 'w', newline='') as f_out:
    reader = csv.reader(f)
    writer = csv.writer(f_out)

    # Read the header row and insert the new "Date" column name
    headers = next(reader)
    headers.insert(0, "Date")
    writer.writerow(headers)

    current_date = ''
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0] and not any(row[1:]):
            # Date row such as "05SEP2012,,,,,,,": remember it and copy it through
            current_date = row[0]
            writer.writerow(row)
        else:
            # Prefix every interval row with the most recent date
            writer.writerow([current_date] + row)

For Python 2.7:

import csv

with open('data.csv', 'rb') as f, open('data_out.csv', 'wb') as f_out:
    reader = csv.reader(f)
    writer = csv.writer(f_out)

    # Read the header row and insert the new "Date" column name
    headers = next(reader)
    headers.insert(0, "Date")
    writer.writerow(headers)

    current_date = ''
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0] and not any(row[1:]):
            # Date row such as "05SEP2012,,,,,,,": remember it and copy it through
            current_date = row[0]
            writer.writerow(row)
        else:
            # Prefix every interval row with the most recent date
            writer.writerow([current_date] + row)

Output (data_out.csv):

    Date,Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume
    05SEP2012,,,,,,,
    05SEP2012,09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155
    05SEP2012,09:30 - 09:45,97.899,0.177,98.164,98.164,97.281,102,101725
    05SEP2012,09:45 - 10:00,97.722,-0.177,97.899,97.899,97.193,32,39874
    06SEP2012,,,,,,,
    06SEP2012,09:15 - 09:30,98.076,0.883,98.076,98.076,98.076,1,22429
    06SEP2012,09:30 - 09:45,97.193,-0.883,97.634,97.987,97.104,72,67741
    06SEP2012,09:45 - 10:00,96.928,-0.265,97.193,97.193,96.751,80,148963
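To try answer 0's logic without files on disk, it can be wrapped in a function that reads from and writes to any file-like object, such as io.StringIO. This is a sketch: add_date_column and its keep_date_rows flag are hypothetical names, and dropping the date-only rows is an optional variation that answer 0 does not do.

```python
import csv
import io


def add_date_column(src, dst, keep_date_rows=True):
    """Copy CSV rows from src to dst, prefixing each interval row with its date.

    src and dst can be any file-like objects, e.g. open files or io.StringIO.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator='\n')

    # Header: insert the new "Date" column name in front
    headers = next(reader)
    writer.writerow(['Date'] + headers)

    current_date = ''
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0] and not any(row[1:]):
            # Date row such as "05SEP2012,,,,,,,": remember the date
            current_date = row[0]
            if keep_date_rows:
                writer.writerow(row)
        else:
            # Interval row: prefix it with the most recent date
            writer.writerow([current_date] + row)


# A few lines of the question's data, read from a string instead of a file
sample = """Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume
05SEP2012,,,,,,,
09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155
06SEP2012,,,,,,,
09:15 - 09:30,98.076,0.883,98.076,98.076,98.076,1,22429
"""

out = io.StringIO()
add_date_column(io.StringIO(sample), out, keep_date_rows=False)
print(out.getvalue())
```

With keep_date_rows=False every output line starts with its date and the date-only lines are dropped, which leaves one uniform row per interval.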

Answer 1 (score: 0)

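import csv

path_to_file = 'data.csv'  # hypothetical name: replace with the path to your CSV file

# Initialize list to hold the rows
rows = []

# Open the file; csv.reader splits each line into a list of values
# (and, unlike line.split(','), also drops the trailing newline)
with open(path_to_file, 'r', newline='') as csv_file:
    for row in csv.reader(csv_file):
        rows.append(row)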

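Now each element of rows is a list with the same structure, so you can compare corresponding "cells", for example the first column of the first and second rows after the header:

rows[1][0] vs rows[2][0], keeping in mind that list indices start at zero (rows[0] is the header row).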
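To make the cell comparison concrete, here is a sketch that loads the question's first sample rows with csv.reader (from an in-memory string rather than a file) and compares the Close cell of two consecutive interval rows. In the sample rows, Net Chg appears to equal the change in Close from the previous interval, so the difference is checked against it; that relationship is an observation from the sample data, not something the answer states.

```python
import csv
import io

# The first few lines of the question's data, read from a string instead of a file
sample = """Time Interval,Close,Net Chg,Open,High,Low,Tick Count,Volume
05SEP2012,,,,,,,
09:15 - 09:30,97.722,0,98.34,98.34,97.722,2,37155
09:30 - 09:45,97.899,0.177,98.164,98.164,97.281,102,101725
"""

rows = list(csv.reader(io.StringIO(sample)))

# rows[0] is the header, rows[1] the date row, rows[2] and rows[3] interval rows.
# Column 1 is Close and column 2 is Net Chg (see the header in rows[0]).
close_prev = float(rows[2][1])
close_curr = float(rows[3][1])

# Round to the 3 decimal places used in the file to hide floating-point noise
change = round(close_curr - close_prev, 3)
print(change)                       # 0.177
print(change == float(rows[3][2]))  # True
```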

Hope this gets you started,

Cheers