I currently have a script that I want to use to combine CSV data files. For example, I have files named process.csv and file.csv, but when I try to append one of them to a new file called 'all_files.csv', the data goes into the correct column but is not appended from the top of the file.
What currently happens:
process/sec
08/03/16 11:19 0
08/03/16 11:34 0.1
08/03/16 11:49 0
08/03/16 12:03 0
08/03/16 12:13 0
08/03/16 12:23 0
file/sec
0
43.3
0
0
0
0
0
What I want:
process/sec file/sec
08/03/16 11:19 0 0
08/03/16 11:34 0.1 43.3
08/03/16 11:49 0 0
08/03/16 12:03 0 0
08/03/16 12:13 0 0
08/03/16 12:23 0 0
Here is my code (note that I have removed all of the superfluous code related to the algorithm I use for the per_second values, and use a static value in this example):
import csv

def all_data(data_name, input_file_name, idx):
    # Create the file for the first set of data
    if data_name == 'first_set_of_data':
        all_per_second_file = open("all_data.csv", 'w')
    # Append to the file for all other data
    else:
        all_per_second_file = open("all_data.csv", 'a')
    # Build a comma prefix from the index number so rows are positioned
    # after one another and new data is not written over the same
    # columns in all_data.csv
    row_position = ''
    for number in range(0, idx):
        row_position = row_position + ','
    with open(input_file_name, 'r') as csvfile:
        # Get the number of columns
        for line in csvfile.readlines():
            array = line.split(',')
            first_item = array[0]
        num_columns = len(array)
        csvfile.seek(0)
        reader = csv.reader(csvfile, delimiter=',')
        # Columns to include: the date and the desired data
        included_cols = [0, 3]
        count = 0
        # Test value for example purposes
        per_second = 12
        for row in reader:
            # Create the header
            if count == 1:
                all_per_second_file.write(row_position + ',' + data_name + "\n")
            # Initialise the date column with the first set of data;
            # the first entry rate must be 0
            if count == 2:
                if data_name == 'first_set_of_data':
                    all_per_second_file.write(row_position + row[0] + ",0\n")
                else:
                    all_per_second_file.write(row_position + ",0\n")
            # If data after the first row is 0, the value should reset,
            # so data/sec should be 0, not a negative number
            if count > 2 and row[3] == '0':
                if data_name == 'first_set_of_data':
                    all_per_second_file.write(row_position + row[0] + ",0\n")
                else:
                    all_per_second_file.write(row_position + ",0\n")
            # Otherwise calculate the rate
            elif count >= 3:
                if data_name == 'first_set_of_data':
                    all_per_second_file.write(row_position + row[0] + "," + str("%.1f" % per_second) + "\n")
                else:
                    all_per_second_file.write(row_position + "," + str("%.1f" % per_second) + "\n")
            count = count + 1
    all_per_second_file.close()
Code update:
I have changed the script to the following, which appears to work correctly:
import pandas as pd

def all_data(input_file_name):
    # per_second_address is the directory prefix where the CSV files live
    a = pd.read_csv(per_second_address + input_file_name[0])
    b = pd.read_csv(per_second_address + input_file_name[1])
    c = pd.read_csv(per_second_address + input_file_name[2])
    d = pd.read_csv(per_second_address + input_file_name[3])
    # Drop empty columns before merging
    b = b.dropna(axis=1)
    c = c.dropna(axis=1)
    d = d.dropna(axis=1)
    # Merge all four frames on the shared Date column
    merged = a.merge(b, on='Date')
    merged = merged.merge(c, on='Date')
    merged = merged.merge(d, on='Date')
    merged.to_csv(per_second_address + "all_event_per_second.csv", index=False)
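If the number of input files varies, the same pairwise merge can be folded over a list instead of naming a, b, c and d explicitly. A minimal sketch, assuming every file shares a 'Date' column and reusing per_second_address from the script above (the function name all_data_any is only illustrative):

import functools
import pandas as pd

def all_data_any(file_names):
    # Read every file and drop empty columns, as in the script above
    frames = [pd.read_csv(per_second_address + name).dropna(axis=1)
              for name in file_names]
    # Fold the list into one frame, merging pairwise on the Date column
    merged = functools.reduce(
        lambda left, right: left.merge(right, on='Date'), frames)
    merged.to_csv(per_second_address + "all_event_per_second.csv", index=False)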
Answer 0 (score: 1):
CSV file read/write operations are row-based, so a new column cannot simply be appended to an existing file; each output row has to be assembled from both inputs.
Check the following code, which uses only the basic modules Python provides:
process.csv contains:
time,process/sec
8/3/2016 11:19,0
8/3/2016 11:34,0
8/3/2016 11:49,1
8/3/2016 12:03,1
8/3/2016 12:13,0
8/3/2016 12:23,0
files.csv contains:
time,files/sec
8/3/2016 11:19,0
8/3/2016 11:34,2
8/3/2016 11:49,3
8/3/2016 12:03,4
8/3/2016 12:13,1
8/3/2016 12:23,0
This Python code will create "combine.csv":
import csv

# Read both files into lists of rows
with open('process.csv', 'r', newline='') as a:
    reader = csv.reader(a, delimiter=",")
    process_csv = list(reader)
with open('files.csv', 'r', newline='') as b:
    reader = csv.reader(b, delimiter=",")
    data_csv = list(reader)

# Write into combine.csv
if len(process_csv) == len(data_csv):
    with open('combine.csv', 'a', newline='') as f:
        writer = csv.writer(f, delimiter=",")
        for i in range(0, len(process_csv)):
            # Take the whole row from process.csv and append the
            # value column from the matching row of files.csv
            temp_list = []
            temp_list.extend(process_csv[i])
            temp_list.append(data_csv[i][1])
            writer.writerow(temp_list)
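The length check guards against misaligned inputs, and note that opening combine.csv in append mode means repeated runs keep adding rows. A slightly more compact variant of the same loop, using zip and write mode (a sketch under the same two-file assumption):

import csv

with open('process.csv', 'r', newline='') as a, \
     open('files.csv', 'r', newline='') as b:
    process_csv = list(csv.reader(a))
    data_csv = list(csv.reader(b))

if len(process_csv) == len(data_csv):
    with open('combine.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        # Pair corresponding rows and append the second file's value column
        for p_row, d_row in zip(process_csv, data_csv):
            writer.writerow(p_row + [d_row[1]])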
combine.csv then has:
time,process/sec,files/sec
8/3/2016 11:19,0,0
8/3/2016 11:34,0,2
8/3/2016 11:49,1,3
8/3/2016 12:03,1,4
8/3/2016 12:13,0,1
8/3/2016 12:23,0,0
The same result with the pandas module:
import pandas as pd
a = pd.read_csv("process.csv")
b = pd.read_csv("files.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='time')
merged.to_csv("combine2.csv", index=False)
For more information on the pandas module, see the pandas documentation.