Question

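I'm working on a Python script that generates a CSV file by reading different columns from all the CSV files in a folder. At the moment I generate the file and sort the columns. The code that generates the CSV is: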

import csv
import glob
import os, sys

dirs = glob.glob('*.csv')

namelist = list(dirs)
timestamp = ['TimeStamp']
file1 = dirs[0]

for file in namelist:
    namelist[namelist.index(file)] = file.partition("TrendLogExtended_")[2].partition("-Ext).csv")[0]

primariga=[]
primariga.extend(timestamp)
primariga.extend(namelist)

print dirs[0]
print len(dirs)
print namelist[0]

primofile = csv.reader(open(file1, 'rb'), delimiter=";", quotechar='|')
output_rows = []

for row in primofile:
    output_rows.append([row[2]])

for file in dirs:
    data = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
    column = []
    # this loop must run once per file, so it belongs inside the loop over dirs
    for idx,row in enumerate(data):
        output_rows[idx].append(row[15])

with open("provaoutput.tmp", 'wb') as f:
    writer = csv.writer(f, delimiter=';')
    for row in output_rows:
        writer.writerow(row)

with open("provaoutput.tmp", 'r') as data_file:
    lines = data_file.readlines()
    lines[0]= ";".join(primariga) +"\n"
    with open("finale.txt", 'w') as out_data:
        for line in lines:
            out_data.write(line)

With this script I generate a CSV that looks like this:

TimeStamp;TH;AM;RHNoEB
2014/08/27 11:15:19.658;;;
2014/08/27 10:15:26.060;52.51;24.51;19.23
2014/08/27 10:15:56.050;52.51;24.24;19.18
2014/08/27 10:16:26.060;52.48;24.89;19.45
2014/08/27 10:16:56.045;52.37;25.16;19.83
....

I then sort this CSV with another script that looks like this:

import numpy as np

a = np.loadtxt('finale.txt', dtype=str, delimiter=';')
s = a[0].argsort() # produces the indexes which would sort the header
s = np.append(0, s[s!=0]) # put 0 at the front again, that's Timestamp
final = a[:,s]
np.savetxt('finale-2.txt', final, fmt='%s', delimiter=';')

I obtain:

TimeStamp;AM;RHNoEB;TH
2014/08/27 11:15:19.658;;;
2014/08/27 10:15:26.060;24.51;19.23;52.51
2014/08/27 10:15:56.050;24.24;19.18;52.51
2014/08/27 10:16:26.060;24.89;19.45;52.48
2014/08/27 10:16:56.045;25.16;19.83;52.37
....

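So far so good. Now I have two problems. Some lines (like the second one) contain only the timestamp and no measurements. I want to delete all of these "empty" lines (I mean lines like 'sometimestamp;;;;;;;;'). How can I do that?

The second problem is that I would like to generate another CSV with the average computed over every 5 or 10 rows (X minutes). I mean: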

TimeStamp;AM;RHNoEB;TH
2014/08/27 10:15;24.375;19.205;52.51
2014/08/27 10:16;25.025;19.64;52.425
....

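The TimeStamp is not a big problem; I can use the timestamp of the first measurement that goes into the average. Can you help me?

More information about the average.

On each row I have measurements from different devices. I want to compute the average of every X measurements (X rows) for each device (per column; each device has its own column). X could be every 10 rows, or something similar. The input CSV is the one I sorted and cleaned with the previous scripts.

I mean, from: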

timestamp1-1;5;4;2 
timestamp1-2;3;6;4 
timestamp2-1;4;2;1 
timestamp2-2;8;4;1

to obtain:

timestamp1-1;4;5;3 
timestamp2-1;6;3;1

Answer 1

To eliminate the invalid timestamps you can use, for example:

with open("filetoclean.stats",'r') as input, open("cleanedfile.stats", "w") as output :
    for line in input:
        if not ";\n" in line:
            output.write(line)

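This code copies every line that does not end with ";" from the input file into another file. If this condition is not sufficient for your needs, you should consider using regular expressions.

-- EDIT: adding an answer to the second part of the question --

Regarding the average of each column over every X rows, this code should work: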
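If a stricter test is needed, one alternative to a regular expression is to split each line on ";" and drop the row only when every field after the timestamp is empty. The following is a minimal Python 3 sketch of that idea, not a tested replacement for the code above; the function name drop_empty_rows is made up for illustration, and it works on a list of lines rather than on files:

```python
def drop_empty_rows(lines, delimiter=";"):
    """Return only the lines that carry at least one measurement.

    Assumption: the first field is the timestamp. A line is dropped
    when all remaining fields are empty or whitespace.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if any(field.strip() for field in fields[1:]):
            kept.append(line)
    return kept


sample = [
    "TimeStamp;AM;RHNoEB;TH\n",
    "2014/08/27 11:15:19.658;;;\n",
    "2014/08/27 10:15:26.060;24.51;19.23;52.51\n",
]
cleaned = drop_empty_rows(sample)
print("".join(cleaned), end="")
```

Unlike the check for ";\n", this keeps a row whose last column happens to be empty while other columns still hold values.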

data = ["timestamp1-1;5;4;2", "timestamp1-2;3;6;4", "timestamp2-1;4;2;1", "timestamp2-2;8;4;1"]

def add_values(tmp,values):
    if tmp == None :
        tmp = [0.0]*len(values)
    #sum values by pair of same index
    return list(map(lambda x,y : x+y, tmp, values))

def pretty_string(aList):
    return str(aList)[1:-1].replace(",",";")

def row_average(data, packsize):
    #initialisation
    row_count = 0
    tmp = None
    result = ""

    for row in data:
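        # the first field is the timestamp, the remaining fields are the measurements
        # (splitting on "-" would mix the timestamp suffix into the values)
        fields = row.split(";")
        if row_count == 0:
            index = fields[0]  # keep the timestamp of the first row of the pack
        values = [float(value) for value in fields[1:]]
        tmp = add_values(tmp,values)
        print(tmp)
        row_count += 1

        if row_count == packsize:
            #the map divides all the values of the list by packsize
            result += "{0};{1}\n".format(index,pretty_string(list(map(lambda x: x/packsize, tmp))))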
            print(result)
            tmp = None
            row_count = 0

    return result

if __name__ == "__main__":
    a = row_average(data, 2)

I used a few helper functions to make it more readable.

Hope it helps.
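For comparison, the same per-column averaging over packs of rows can be sketched more compactly with zip. This is only an illustration in Python 3 under a few assumptions: the name average_every is hypothetical, every row has the form "timestamp;v1;v2;..." with no empty fields (so the empty rows must already have been removed), and a trailing incomplete pack is simply dropped:

```python
def average_every(rows, packsize, delimiter=";"):
    """Average each column over consecutive groups of `packsize` rows.

    The timestamp of the first row in a group labels the averaged row.
    A trailing group with fewer than `packsize` rows is dropped.
    """
    result = []
    for start in range(0, len(rows) - packsize + 1, packsize):
        group = [row.strip().split(delimiter)
                 for row in rows[start:start + packsize]]
        timestamp = group[0][0]
        # zip(*...) turns the rows of the group into columns
        columns = zip(*(fields[1:] for fields in group))
        means = [sum(float(v) for v in column) / packsize for column in columns]
        result.append(delimiter.join([timestamp] + [str(m) for m in means]))
    return result


data = ["timestamp1-1;5;4;2", "timestamp1-2;3;6;4",
        "timestamp2-1;4;2;1", "timestamp2-2;8;4;1"]
for line in average_every(data, 2):
    print(line)
```

With the sample data from the question this prints timestamp1-1;4.0;5.0;3.0 and timestamp2-1;6.0;3.0;1.0.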

Answer 2

This should compute the averages and ignore the lines without measurements:

with open('your_csv', 'rb') as f:
    for i, l in enumerate(f):
        if i == 0: continue
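        # strip the line ending first, otherwise an empty last field is '\n' and passes all()
        s_l = l.rstrip('\r\n').split(';')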
        last_3 = s_l[-3:]
        if all(last_3):
            last_3_floats = map(float, last_3)
            avg = sum(last_3_floats )/len(last_3_floats)
            print s_l[0] + ';' + str(avg)

Untested.

Delete rows and print averages in a CSV using Python
