Rearranging .csv data with Python

Time: 2018-07-19 05:11:51

Tags: python python-3.x csv format

I have a csv file in which each day's data sits in its own column.

'Time', 'Sun 01', 'Mon 02', 'Tue 03', 'Wed 04', 'Thu 05', 'Fri 06', 'Sat 07', 'Sun 08', 'Mon 09', 'Tue 10', 'Wed 11', 'Thu 12', 'Fri 13', 'Sat 14', 'Sun 15', 'Mon 16', 'Tue 17', 'Wed 18', 'Thu 19', 'Fri 20', 'Sat 21', 'Sun 22', 'Mon 23', 'Tue 24', 'Wed 25', 'Thu 26', 'Fri 27', 'Sat 28', 'Sun 29', 'Mon 30'
'00:00-00:05', '0.30', '0.30', '0.30', '0.30', '0.30', '0.40', '0.10', '0.20', '0.20', '0.20', '0.10', '0.20', '0.20', '0.30', '0.30', '0.10', '0.20', '0.20', '0.10', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.10', '0.10'
'00:05-00:10', '0.30', '0.30', '0.30', '0.30', '0.30', '0.50', '0.20', '0.10', '0.10', '0.20', '0.10', '0.30', '0.10', '0.20', '0.30', '0.10', '0.20', '0.10', '0.20', '0.20', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.10', '0.10', '0.10'
'00:10-00:15', '0.30', '0.30', '0.30', '0.30', '0.30', '0.40', '0.20', '0.20', '0.20', '0.20', '0.20', '0.30', '0.10', '0.30', '0.30', '0.20', '0.10', '0.20', '0.10', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.20', '0.10'
'00:15-00:20', '0.30', '0.30', '0.30', '0.30', '0.40', '0.50', '0.10', '0.10', '0.10', '0.20', '0.10', '0.30', '0.20', '0.30', '0.30', '0.10', '0.20', '0.20', '0.20', '0.20', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.10', '0.10', '0.00'
'00:20-00:25', '0.30', '0.30', '0.40', '0.40', '0.30', '0.40', '0.20', '0.20', '0.20', '0.20', '0.10', '0.30', '0.10', '0.30', '0.30', '0.10', '0.20', '0.10', '0.20', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.10', '0.20'

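Using Python, is there a way to rearrange the data so that each day's values are appended to the end of the previous day's values, giving one long column?

Example: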

Date, Time, Value,
01-01-2000, 00:00, 0.01
01-01-2000, 00:00, 0.01
01-01-2000, 00:05, 0.01
01-01-2000, 00:10, 0.01
02-01-2000, 00:00, 0.01
02-01-2000, 00:05, 0.01
02-01-2000, 00:10, 0.01

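I get stuck when trying to iterate over the data. If I assign the data from the csv to a variable, I lose the separate lists and am not sure how to split the data apart again so that I can append it day by day to the bottom of a new csv. Is there a way to store the csv data in a variable that keeps a separate list for each row?

So far I have: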

import csv
month_year = "01-2000"
filename = 'test.csv'

converted_data = "converted_" + filename
cols = ['Time', 'Date(dd-mm-yyyy', 'kWh']

interval_count = 0
day = 1

with open(converted_data, 'w') as csvfile:
    csvwriter = csv.writer(csvfile)

    csvwriter.writerow(cols)

    with open(filename, 'r') as csvfile:
        data = csv.reader(csvfile)
        next(data)

        for line in data:
            total_count = len(line[1:]) * 288       # 288 = amount of 5 min intervals in 24 hours

            time_full = line[0]
            time_clean = (time_full[:5])
            if day <= 9:
                date = "0{0}{1}".format(day, month_year)
            else:
                date = "{0}{1}".format(day, month_year)
            # print(line)
            row = [time_clean, date, line[day]]
            print(row)
            csvwriter.writerow(row)
            interval_count += 1
            if interval_count % 288 == 0:
                day += 1
                interval_count = 0

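Any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 1)

I have added comments inside my code. Essentially, you use zip() to get a column-wise view of your data. Then you apply some logic to flesh out the rows for each day. Finally, you write all the data to your output file: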


# First we create your data file to your specification:

with open("d.txt","w") as w:
    w.write("""'Time', 'Sun 01', 'Mon 02', 'Tue 03', 'Wed 04', 'Thu 05', 'Fri 06', 'Sat 07', 'Sun 08', 'Mon 09', 'Tue 10', 'Wed 11', 'Thu 12', 'Fri 13', 'Sat 14', 'Sun 15', 'Mon 16', 'Tue 17', 'Wed 18', 'Thu 19', 'Fri 20', 'Sat 21', 'Sun 22', 'Mon 23', 'Tue 24', 'Wed 25', 'Thu 26', 'Fri 27', 'Sat 28', 'Sun 29', 'Mon 30'
'00:00-00:05', '0.30', '0.30', '0.30', '0.30', '0.30', '0.40', '0.10', '0.20', '0.20', '0.20', '0.10', '0.20', '0.20', '0.30', '0.30', '0.10', '0.20', '0.20', '0.10', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.10', '0.10'
'00:05-00:10', '0.30', '0.30', '0.30', '0.30', '0.30', '0.50', '0.20', '0.10', '0.10', '0.20', '0.10', '0.30', '0.10', '0.20', '0.30', '0.10', '0.20', '0.10', '0.20', '0.20', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.10', '0.10', '0.10'
'00:10-00:15', '0.30', '0.30', '0.30', '0.30', '0.30', '0.40', '0.20', '0.20', '0.20', '0.20', '0.20', '0.30', '0.10', '0.30', '0.30', '0.20', '0.10', '0.20', '0.10', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.20', '0.10'
'00:15-00:20', '0.30', '0.30', '0.30', '0.30', '0.40', '0.50', '0.10', '0.10', '0.10', '0.20', '0.10', '0.30', '0.20', '0.30', '0.30', '0.10', '0.20', '0.20', '0.20', '0.20', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.10', '0.10', '0.00'
'00:20-00:25', '0.30', '0.30', '0.40', '0.40', '0.30', '0.40', '0.20', '0.20', '0.20', '0.20', '0.10', '0.30', '0.10', '0.30', '0.30', '0.10', '0.20', '0.10', '0.20', '0.10', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.20', '0.10', '0.20'
""")

Then we read it back in and process it:

import csv

# we append each row into a list - we get lists of rows:
with open("d.txt","r",newline='') as r:
    reader = csv.reader(r, delimiter = ',', quotechar = "'", skipinitialspace = True)
    data = []
    for row in reader:
        data.append(row)

# we transpose these lists of rows into lists of columns and separate out the
# time column - we will need it multiple times, once for each day
time, *dataX = list(map(list,zip(*data)))
print(time)   # see (shortened) debug 
print(dataX)  # output below

# now we open a new csv, with the same settings as your old one:
with open("mod.txt","w",newline='') as w:
    writer = csv.writer(w,delimiter=',', quotechar="'",skipinitialspace=True, quoting=csv.QUOTE_ALL)
    # write a custom header
    writer.writerow(["date","time","value"])
    # for each day column we need to create one output row per time slot
    for r in dataX:
        # that we construct using the times we split out earlier
        for i,t in enumerate(time):
            if i==0: # this is just the text "'Time'" - we don't need it
                continue
            # here we take the day ('Sun 01', 'Mon 02', ...), add the time t and index into the data
            writer.writerow([r[0],t,r[i]])


# read created file back in and print line-wise:
with open("mod.txt","r") as r:
    for row in r:
        print(row, end="")

Output:

# the time we split off
['Time', '00:00-00:05', '00:05-00:10', '00:10-00:15', '00:15-00:20', '00:20-00:25']

# the rest of the data
[['Sun 01', '0.30', '0.30', '0.30', '0.30', '0.30'], 
 ['Mon 02', '0.30', '0.30', '0.30', '0.30', '0.30'], 
 ['Tue 03', '0.30', '0.30', '0.30', '0.30', '0.40'], 
        ** snip - you get the gist of it **
 ['Sun 29', '0.10', '0.10', '0.20', '0.10', '0.10'], 
 ['Mon 30', '0.10', '0.10', '0.10', '0.00', '0.20']]

# the created file 
'date','time','value'
'Sun 01','00:00-00:05','0.30'
'Sun 01','00:05-00:10','0.30'
'Sun 01','00:10-00:15','0.30'
'Sun 01','00:15-00:20','0.30'
'Sun 01','00:20-00:25','0.30'
'Mon 02','00:00-00:05','0.30'
'Mon 02','00:05-00:10','0.30'
'Mon 02','00:10-00:15','0.30'
'Mon 02','00:15-00:20','0.30'
'Mon 02','00:20-00:25','0.30'
'Tue 03','00:00-00:05','0.30'
'Tue 03','00:05-00:10','0.30'
'Tue 03','00:10-00:15','0.30'
'Tue 03','00:15-00:20','0.30'
'Tue 03','00:20-00:25','0.40'
 ** snip - you get the gist of it **
'Sun 29','00:00-00:05','0.10'
'Sun 29','00:05-00:10','0.10'
'Sun 29','00:10-00:15','0.20'
'Sun 29','00:15-00:20','0.10'
'Sun 29','00:20-00:25','0.10'
'Mon 30','00:00-00:05','0.10'
'Mon 30','00:05-00:10','0.10'
'Mon 30','00:10-00:15','0.10'
'Mon 30','00:15-00:20','0.00'
'Mon 30','00:20-00:25','0.20'
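
To look at the transposition step in isolation, here is a small sketch of what zip(*rows) does to a list of rows. The three sample rows are shortened, made-up stand-ins for the parsed file, and the names rows, times and days are only for illustration:

```python
# Hypothetical, shortened rows standing in for the parsed csv data.
rows = [
    ["Time", "Sun 01", "Mon 02"],
    ["00:00-00:05", "0.30", "0.30"],
    ["00:05-00:10", "0.30", "0.40"],
]

# zip(*rows) groups the i-th element of every row, so it yields columns.
columns = [list(col) for col in zip(*rows)]

# The first column is the time axis; each remaining column is one day.
times, *days = columns

print(times)  # ['Time', '00:00-00:05', '00:05-00:10']
print(days)   # [['Sun 01', '0.30', '0.30'], ['Mon 02', '0.30', '0.40']]
```

Each inner list in days starts with the day label followed by that day's values, which is why the writer loop above skips index 0 of the time list.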

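If you write something like r[0].split()[-1] + "-01-2000" instead of r[0] (in each row), you will get closer to your desired output. If you would like to use other quoting options, read up on the Quoting constants in the csv module documentation.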
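
A minimal sketch of that last suggestion, assuming the month and year are known in advance (the "01-2000" value from the question) and that only the start of each interval is wanted in the time column. The helper name to_long_rows and the shortened sample data are made up for this illustration:

```python
import csv
import io

def to_long_rows(rows, month_year="01-2000"):
    """Turn wide rows (one column per day) into [date, time, value] rows.

    Hypothetical helper; month_year is assumed to be known by the caller.
    """
    # Transpose rows into columns: first column is the time axis,
    # every other column is one day headed by a label such as 'Sun 01'.
    times, *days = [list(col) for col in zip(*rows)]
    out = []
    for day in days:
        # 'Sun 01' -> '01' -> '01-01-2000'
        date = day[0].split()[-1] + "-" + month_year
        # Index 0 is the header cell in both lists, so skip it.
        for t, value in zip(times[1:], day[1:]):
            # '00:00-00:05' -> '00:00' (start of the interval)
            out.append([date, t[:5], value])
    return out

# Shortened sample in the same quoting style as the question's file.
sample = """'Time', 'Sun 01', 'Mon 02'
'00:00-00:05', '0.30', '0.30'
'00:05-00:10', '0.30', '0.40'
"""

reader = csv.reader(io.StringIO(sample), quotechar="'", skipinitialspace=True)
long_rows = to_long_rows(list(reader))
for row in long_rows:
    print(row)
```

The resulting rows can be passed to csv.writer(...).writerows() in place of the inner writerow call above. This sketch does not check that the day number is valid for the given month.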

HTH