I have a CSV file that looks like this:
Date Name Wage
5/1/19 Joe $100
5/1/19 Sam $120
5/1/19 Kate $30
5/2/19 Joe $120
5/2/19 Sam $134
5/2/19 Kate $56
5/3/19 Joe $89
5/3/19 Sam $90
5/3/19 Kate $231
I'd like to restructure it into this:
Date Joe Sam Kate
5/1/19 $100 $120 $30
5/2/19 $120 $134 $56
5/3/19 $89 $90 $231
I'm not sure how to go about this. Here's what I've started with:
import csv

with open('myfile.csv', 'rb') as filein, open('restructured.csv', 'wb') as fileout:
    rows = list(csv.DictReader(filein, skipinitialspace=True))
    names = ...  # NOT SURE HOW TO GET THIS
    fieldnames = ['Date'] + ['{}'.format(i) for i in names]
    csvout = csv.DictWriter(fileout, fieldnames=fieldnames, extrasaction='ignore', restval='NA')
    csvout.writeheader()
    for row in rows:
        row['{}'.format(row['Name'].strip())] = row['Wage']
        csvout.writerow(row)
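One way to fill in the missing names line is sketched below, assuming myfile.csv is really comma-separated with a space after each comma (which the use of csv.DictReader with skipinitialspace=True suggests). Collecting the names is only half of it: the rows also have to be grouped by date, because writing one output row per input row gives each line a single wage and NA everywhere else. The restructure helper and the in-memory SAMPLE data are hypothetical; Kate's 5/2/19 row is left out on purpose so restval='NA' shows up. On Python 3, open the real files in text mode with newline='' instead of 'rb'/'wb'.

```python
import csv
import io

# Hypothetical sample standing in for myfile.csv. Kate's 5/2/19 row is left
# out on purpose so the restval='NA' fill shows up in the output.
SAMPLE = """Date, Name, Wage
5/1/19, Joe, $100
5/1/19, Sam, $120
5/1/19, Kate, $30
5/2/19, Joe, $120
5/2/19, Sam, $134
"""


def restructure(filein, fileout):
    rows = list(csv.DictReader(filein, skipinitialspace=True))
    # The missing piece: unique names in first-seen order
    # (dict.fromkeys keeps insertion order on Python 3.7+).
    names = list(dict.fromkeys(row['Name'].strip() for row in rows))
    fieldnames = ['Date'] + names
    csvout = csv.DictWriter(fileout, fieldnames=fieldnames,
                            extrasaction='ignore', restval='NA')
    csvout.writeheader()
    # Merge all rows that share a date into one output row; writing each
    # input row directly would give one wage per line and NA elsewhere.
    by_date = {}
    for row in rows:
        merged = by_date.setdefault(row['Date'], {'Date': row['Date']})
        merged[row['Name'].strip()] = row['Wage']
    for merged in by_date.values():
        csvout.writerow(merged)


out = io.StringIO()
restructure(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With real files this would be called as restructure(filein, fileout) inside the same with block as in the question.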
Answer 0 (score: 2)
Using only the pandas library:
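import pandas as pd

df = pd.read_csv("test.csv", sep=r"\s+")
# Wage holds strings such as "$100", so take the single value in each cell
# rather than the default mean aggregation.
p_table = pd.pivot_table(df, values='Wage', columns=['Name'], index='Date',
                         aggfunc='first')
p_table = p_table.reset_index()
p_table.columns.name = None
print(p_table)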
Output:
Date Joe Kate Sam
0 5/1/19 $100 $30 $120
1 5/2/19 $120 $56 $134
2 5/3/19 $89 $231 $90
References:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
Answer 1 (score: 2)
This can be done with the csv module. Here is a Python 3 version:
import csv
import collections

with open('myfile.csv', 'r') as filein, open('restructured.csv', 'w', newline='') as fileout:
    data = collections.defaultdict(dict)
    names = set()
    for row in csv.DictReader(filein, skipinitialspace=True):
        data[row['Date']][row['Name']] = row['Wage']
        names.add(row['Name'])

    csvout = csv.DictWriter(fileout, fieldnames=['Date'] + list(names))
    csvout.writeheader()
    for dat in sorted(data.keys()):
        row = data[dat]
        row['Date'] = dat
        csvout.writerow(row)
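The resulting CSV should look like this (the name columns come from a set, so their order may differ between runs):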
Date,Kate,Joe,Sam
5/1/19,$30,$100,$120
5/2/19,$56,$120,$134
5/3/19,$231,$89,$90
The Python 2 version is the same, except that the line opening the files should be:
with open ('myfile.csv', 'rb') as filein, open ('restructured.csv', 'wb') as fileout:
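Two details of the csv-module version above can bite on larger files: sorted(data.keys()) compares the dates as strings, so "5/10/19" would land before "5/2/19", and the set makes the column order unpredictable. A minimal Python 3 sketch of both tweaks follows, assuming dates are always M/D/YY; the restructure helper and SAMPLE data are hypothetical.

```python
import collections
import csv
import io
from datetime import datetime

# Hypothetical sample: "5/10/19" would sort before "5/2/19" as a plain string.
SAMPLE = """Date,Name,Wage
5/10/19,Joe,$95
5/2/19,Joe,$120
5/2/19,Sam,$134
5/10/19,Sam,$88
"""


def restructure(filein, fileout):
    data = collections.defaultdict(dict)
    names = []  # a list keeps first-seen column order, unlike a set
    for row in csv.DictReader(filein, skipinitialspace=True):
        data[row['Date']][row['Name']] = row['Wage']
        if row['Name'] not in names:
            names.append(row['Name'])
    csvout = csv.DictWriter(fileout, fieldnames=['Date'] + names,
                            lineterminator='\n')
    csvout.writeheader()
    # Sort by the parsed M/D/YY date rather than by the raw string.
    for dat in sorted(data, key=lambda d: datetime.strptime(d, '%m/%d/%y')):
        csvout.writerow({'Date': dat, **data[dat]})


out = io.StringIO()
restructure(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

Building a fresh dict per output row, rather than adding 'Date' into data[dat], also leaves the parsed data unmodified.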
Answer 2 (score: 1)
What you want to do is also known as reshaping from long to wide format. With pandas, you can do it like this:
import pandas as pd
df = pd.read_csv("myfile.csv", sep=',')
# Restructure the dataframe
tdf = df.pivot(index='Date', columns='Name', values='Wage')
tdf.to_csv("restructured.csv", sep=',')
print(tdf)
Output:
Name Joe Kate Sam
Date
5/1/19 $100 $30 $120
5/2/19 $120 $56 $134
5/3/19 $89 $231 $90
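To make "long to wide" concrete without pandas, here is a rough pure-Python imitation of what df.pivot does. The long_to_wide helper is hypothetical and only approximates pivot: each (index, column) pair may appear once, missing combinations become NaN, and labels come out sorted (matching the Joe, Kate, Sam order above). One practical consequence: pivot raises a ValueError on duplicate (Date, Name) pairs, so data with the same person twice on one date needs pivot_table with an aggfunc instead.

```python
import math


def long_to_wide(records, index, columns, values):
    """Pivot a list of dicts from long to wide format.

    Roughly imitates DataFrame.pivot: a repeated (index, column) pair is an
    error, missing combinations become NaN, and labels come out sorted.
    """
    cells = {}
    for rec in records:
        key = (rec[index], rec[columns])
        if key in cells:
            raise ValueError(f"duplicate entry for {key!r}, cannot reshape")
        cells[key] = rec[values]
    row_labels = sorted({r for r, _ in cells})
    col_labels = sorted({c for _, c in cells})
    # One list per row label, one slot per column label.
    wide = {r: [cells.get((r, c), math.nan) for c in col_labels]
            for r in row_labels}
    return col_labels, wide


records = [
    {'Date': '5/1/19', 'Name': 'Joe', 'Wage': '$100'},
    {'Date': '5/1/19', 'Name': 'Kate', 'Wage': '$30'},
    {'Date': '5/2/19', 'Name': 'Joe', 'Wage': '$120'},
]
cols, wide = long_to_wide(records, 'Date', 'Name', 'Wage')
print(cols)
for date, row in wide.items():
    print(date, row)
```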
Answer 3 (score: 0)
This should put you on the right track.
data.csv
5/1/19,Joe,$100
5/1/19,Sam,$120
5/1/19,Kate,$30
5/2/19,Joe,$120
5/2/19,Sam,$134
5/2/19,Kate,$56
5/3/19,Joe,$89
5/3/19,Sam,$90
5/3/19,Kate,$231
data = {}
people = set()
with open('data.csv', 'r') as f:
    for line in f.read().splitlines():
        values = line.split(',')
        if values[0] not in data:
            data[values[0]] = {}
        data[values[0]][values[1]] = values[2]
        people.add(values[1])

print('Date,' + ','.join(people))
for date in data:
    print(f"{date},{','.join([data[date][per] for per in people])}")
Output:
Date,Sam,Kate,Joe
5/1/19,$120,$30,$100
5/2/19,$134,$56,$120
5/3/19,$90,$231,$89
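Splitting each line on ',' by hand works for this data, but it breaks as soon as a field contains a comma, for example a wage written as "$1,000", which CSV has to quote. A sketch of the same approach using csv.reader and csv.writer, which handle the quoting, is below. The restructure helper and SAMPLE data are hypothetical, and the wages.get(p, '') default for a missing person is an assumption. For a real file, pass open('data.csv', newline='') in place of the StringIO.

```python
import csv
import io

# Hypothetical input: "$1,000" is quoted, so line.split(',') would cut it in two.
SAMPLE = '5/1/19,Joe,"$1,000"\n5/1/19,Sam,$120\n5/2/19,Joe,$120\n5/2/19,Sam,$134\n'


def restructure(lines):
    data = {}
    people = []  # first-seen order, so the columns are stable between runs
    for date, name, wage in csv.reader(lines):
        data.setdefault(date, {})[name] = wage
        if name not in people:
            people.append(name)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['Date'] + people)
    for date, wages in data.items():
        # csv.writer re-quotes any field that contains a comma.
        writer.writerow([date] + [wages.get(p, '') for p in people])
    return out.getvalue()


result = restructure(io.StringIO(SAMPLE))
print(result)
```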