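I am trying to read data from a CSV file (A), extract values from it, and write them to a different CSV file (B). The new file B should have two rows: the first row should contain all of the predefined variables, and the second row should hold the value that belongs to each variable in the first row.
I hope someone can tell me the best way to achieve this. (I have added the .csv file I am using at the end of this post.)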
(A) Python code
Below is the code, followed by part of the output it produces:
import re
import csv
#Call for the export file
data = open('C:/Exports/Export 3.csv')
#Make a list with the predefined variables
definition = ["record_id", "abbreviation", "study_id", "step_count",
"distance", "ambulation_time", "velocity", "cadence", "norm_velocity",
"step_time_differential", "step_length_differential",
"cycle_time_differential", "step_time", "step_length", "step_extremity",
"cycle_time", "stride_length", "hh_base_support", "swing_time",
"stance_time", "single_support_time", "double_support_time", "toe_in_out"]
my_data = {}
#Show data for each row without whitespace
for line in data:
    line = line.rstrip()
    #print(line)
    values = re.findall("-?[0-9].+", line)
    print(values)
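As the output below shows, some rows contain two values, such as ['2,988;6,32']. These need to be reduced to a single value by averaging the two before they are written to a CSV file.
(B) Output of the code above: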
[]
['3;']
['292,34;']
['1,67;']
['175,1;']
['107,8;']
[]
['0,004;']
['1,051;']
['0,008;']
[]
[]
['0,558;0,554']
['96,746;97,797']
[]
['1,116;1,108']
['192,159;197,122']
['2,988;6,32']
['0,466;0,466']
['0,65;0,642']
['0,466;0,466']
['0,184;0,176']
['41,8;42,1']
['58,2;57,9']
['41,8;42,1']
['16,5;15,9']
['-1,1;4']
If you would like to experiment with the export file, you can download it here: CSV export file
Answer 0 (score: 0)
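You should open the file with the csv library, split each row on the semicolon, and then compare the first column with the items in definition. It goes more or less like this: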
import csv
from collections import defaultdict
data = defaultdict(str)
#Make a list with the predefined variables
definition = ["record_id", "abbreviation", "study_id", "step_count",
"distance", "ambulation_time", "velocity", "cadence", "norm_velocity",
"step_time_differential", "step_length_differential",
"cycle_time_differential", "step_time", "step_length", "step_extremity",
"cycle_time", "stride_length", "hh_base_support", "swing_time",
"stance_time", "single_support_time", "double_support_time", "toe_in_out"]
with open('C:/Exports/Export 3.csv', 'r') as f, \
        open('C:/Exports/result.csv', 'w') as outfile:
    reader = csv.reader(f, delimiter=';')
    next(reader, None) # skip the headers
    writer = csv.DictWriter(outfile, fieldnames=definition, lineterminator='\n')
    writer.writeheader()
    for row in reader:
        for item in definition:
            h = item.replace('_','')
            r0 = row[0].lower().replace(' ','')
            if h in r0:
                print(h, r0)
                data[item] = row[1]
    data['record_id'] = 1 # record id does not exist in input file: Export 3.csv
    writer.writerow(data)
To get the average of two values, you can use:
try:
    avg = (float(row[1].replace(',', '.')) + float(row[2].replace(',', '.')))/2
except ValueError:
    avg = 0 # for cases with empty strings or commas
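Answer 1 (score: 0)
This is almost perfect! There seem to be a few small problems, though. In result.csv I am missing the values for the following variables: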
step_time
step_length
cycle_time
stride_length
hh_base_support
swing_time
stance_time
single_supp_time
double_supp_time
toe_in_out
I used this line of code to check the result:
print(h, r0, row[1], row[2])
It gave me the following output:
stepcount stepcount 3
distance distance 292,34
ambulationtime ambulationtime 1,67
velocity velocity 175,1
cadence cadence 107,8
velocity normalizedvelocity ,
normalizedvelocity normalizedvelocity ,
steptimedifferential steptimedifferential 0,004
steptime steptimedifferential 0,004
steplengthdifferential steplengthdifferential 1,051
steplength steplengthdifferential 1,051
cycletimedifferential cycletimedifferential 0,008
cycletime cycletimedifferential 0,008
steptime steptime(sec) 0,558 0,554
steplength steplength(cm) 96,746 97,797
stepextremity stepextremity(ratio) , ,
cycletime cycletime(sec) 1,116 1,108
stridelength stridelength(cm) 192,159 197,122
hhbasesupport hhbasesupport(cm) 2,988 6,32
swingtime swingtime(sec) 0,466 0,466
stancetime stancetime(sec) 0,65 0,642
velocity stridevelocity 172,185 177,908
steptime steptimestddev , 0,006
stridelength stridelengthstddev , ,
swingtime swingtimestddev , ,
stancetime stancetimestddev , ,
velocity stridevelocitystddev , ,
singlesupptime singlesupptimestddev , ,
doublesupptime doublesupptimestddev , ,
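From the output above you can see two problems: some names match more than one row label (velocity, for example), and some do not match at all (toe_in_out, for example). When a name matches several rows, the value stored for it is the one from the last matching row, which is often an empty "Std Dev" row. I don't know how to solve this.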
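One possible way around both problems is to compare names for equality instead of using a substring test. The sketch below is only an illustration: `normalize` and `lookup` are hypothetical helpers, and the row labels are guesses modelled on the printed output (the exact spelling in Export 3.csv, in particular "Toe In / Out(deg)", is an assumption).

```python
import re

def normalize(label):
    """Reduce a label to lowercase letters and digits, dropping units such as '(sec)'."""
    label = re.sub(r"\(.*?\)", "", label.lower())  # remove a unit in parentheses
    return re.sub(r"[^a-z0-9]", "", label)          # remove spaces, underscores, slashes

definition = ["step_time", "step_time_differential", "velocity", "toe_in_out"]

# Map each normalized name back to the name used in the output header
lookup = {normalize(name): name for name in definition}

# Assumed row labels, modelled on the printed output above
labels = ["Step Time(sec)", "Step Time Differential", "Step Time Std Dev",
          "Stride Velocity", "Velocity", "Toe In / Out(deg)"]

# An exact lookup returns None for rows that are not in the definition list
matches = {label: lookup.get(normalize(label)) for label in labels}
for label, name in matches.items():
    print(label, "->", name)
```

With an exact comparison, the "Std Dev" and "Stride Velocity" rows no longer overwrite step_time or velocity. It only works when the normalized names agree, so a label that the file abbreviates differently would need its own entry in `lookup`.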
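In addition, I tried to calculate the average whenever a row has two values, but this gives me the error: ValueError: could not convert string to float. I think the decimal comma is the cause. I tried the following code inside the for loop to calculate the average: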
float(row[1]+float(row[2])) / 2
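The expression above has two separate problems. Python's `float` does not accept a decimal comma such as '0,554', which is what raises the ValueError, and the closing parenthesis of the first `float(` is misplaced, so once the comma is fixed a string would be added to a float. A minimal sketch of one way to handle both follows; `to_float` and `average` are hypothetical helper names, not part of the csv module.

```python
def to_float(text):
    """Convert a decimal-comma string such as '0,558' to a float; None if it holds no number."""
    text = text.strip().replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None  # blank cell or a lone comma

def average(cells):
    """Average the cells that hold a number; return None when none of them do."""
    numbers = [n for n in map(to_float, cells) if n is not None]
    return sum(numbers) / len(numbers) if numbers else None

print(average(["2,988", "6,32"]))  # two values: their mean
print(average(["3", ""]))          # one value: that value
print(average([",", ","]))         # no values: None
```

Inside the loop this could be called as `average(row[1:3])`. Returning None rather than 0 for empty rows is a design choice that keeps a missing measurement distinguishable from a measured zero; the None would still have to be turned into an empty string before writing the row.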