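So my CSV file has over 1 million records: (https://i.imgur.com/rhIhy5u.png). I need to rearrange the data so that the repeated "parameters" themselves become columns/rows, e.g. category1, category2, category3 (there are 20+ categories, each appearing only once), while all the data keeps its relationships.
I tried using "pandas" and "csv" in Python, but I'm completely new to this and have never worked with this kind of data before.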
import csv
with open('./data.csv', 'r') as _filehandler:
    csv_file_reader = csv.DictReader(_filehandler)
    param = []
    # Collect each distinct 'Param' value once, in order of first appearance
    for row in csv_file_reader:
        if row['Param'] not in param:
            param.append(row['Param'])
col = ""
for p in param:
    col += str(p) + '; '
print(col)
import numpy as np
# 'parameters' was never defined; 'param' is presumably what was meant
np.savetxt('./SortedWexdord.csv', param, delimiter=';', fmt='%s')
I've tried to think it through, but data isn't my specialty either. Any ideas?
Answer 0 (score: 1)
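This should work. If each row has several values that need to be normalized this way, you can edit line 9 (the one starting with category) to take a list of values instead of just row[1].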
import csv

data = {}

with open('data.csv', 'r') as file:
    reader = csv.reader(file)
    next(reader)  # Skip header row
    for row in reader:
        category, value = row[0], row[1]  # Assumes category is in column 0 and target value is in column 1
        if category in data:
            data[category].append(value)
        else:
            data[category] = [value]  # New entry only for each unique category

# On Python 3, csv.writer needs text mode; newline='' avoids double newlines on Windows
with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Category', 'Value'])
    for category in data:
        print([category] + data[category])
        writer.writerow([category] + data[category])  # Make a list starting with category and then listing each value
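For the multi-value case mentioned above, here is a rough sketch of what the change could look like. The helper name group_rows, the value_columns parameter, and the sample columns (Value, Unit) are made up for illustration, since the real column layout is only visible in the screenshot. Adjust the indexes to match your file.

```python
import csv
import io
from collections import defaultdict


def group_rows(lines, value_columns=(1, 2)):
    """Group the chosen columns of every row under the category in column 0.

    `lines` is any iterable of CSV text lines (an open file works too).
    `value_columns` is a hypothetical parameter: the column indexes to keep.
    """
    reader = csv.reader(lines)
    next(reader)  # Skip header row
    data = defaultdict(list)
    for row in reader:
        category = row[0]
        # Take a list of values instead of just row[1]
        values = [row[i] for i in value_columns]
        data[category].extend(values)
    return dict(data)


# Made-up sample standing in for data.csv (assumed layout: Param, Value, Unit)
sample = io.StringIO(
    "Param,Value,Unit\n"
    "category1,10,mm\n"
    "category2,20,kg\n"
    "category1,30,mm\n"
)

grouped = group_rows(sample)
for category, values in grouped.items():
    print([category] + values)
```

To run it on the real file, pass an open file handle instead of the sample, e.g. group_rows(open('data.csv', newline='')), and write each [category] + values list out with csv.writer as in the code above.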