How to split .csv data in Python?

Posted: 2015-05-31 13:51:14

Tags: python csv

When I split a CSV file using this method:

with open(fname) as f:
    for line in f:
        a = line.strip().split()

I get output like this:

['Chicago', 'White', 'Sox,"Valentin,', 'Jose","5,000,000",Outfielder,,,,']
['Detroit', 'Tigers,"Bernero,', 'Adam","314,000",Pitcher,,,,']

and so on...

How can I split this data into the proper parts (team, player, salary, position)?

The dataset (in xls) is here:

American League Baseball Salaries (2003)            

Team                 Player          Salary     Position

New York Yankees    Acevedo, Juan   9,00,000    Pitcher
New York Yankees    Anderson, Jason 3,00,000    Pitcher
New York Yankees    Clemens, Roger  1,01,00,000 Pitcher
New York Yankees    Contreras, Jose 55,00,000   Pitcher

3 answers:

Answer 0 (score: 0)

You can use the zip function to get the columns of the file, and the csv module to read the csv file (Python 2):

import csv
with open('file_.csv','rb') as f:
    csvreader = csv.reader(f, delimiter=' ')
    print zip(*csvreader)

For large files, use itertools.izip:

import csv
from itertools import izip
with open('file_.csv','rb') as f:
    csvreader = csv.reader(f, delimiter=' ')
    print list(izip(*csvreader))

izip returns an iterator, so if you only want to loop over it you don't need list (list is only there to print the contents).
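A minimal Python 3 sketch of the same zip-based idea, assuming the file is the comma-separated CSV shown in the question (so the delimiter is `,`, not a space). In Python 3 `itertools.izip` no longer exists and the built-in `zip` is already lazy. To keep the example self-contained, two rows reconstructed from the question's output are read from an in-memory `io.StringIO` instead of a file; the name `SAMPLE` is only for illustration.

```python
import csv
import io

# Illustrative sample: two rows reconstructed from the question's output
SAMPLE = (
    'Chicago White Sox,"Valentin, Jose","5,000,000",Outfielder,,,,\n'
    'Detroit Tigers,"Bernero, Adam","314,000",Pitcher,,,,\n'
)

# csv.reader honours the quotes, so "Valentin, Jose" stays a single field
rows = list(csv.reader(io.StringIO(SAMPLE)))

# zip(*rows) transposes rows into columns; list() materializes the lazy zip object
columns = list(zip(*rows))

print(columns[0])  # ('Chicago White Sox', 'Detroit Tigers')
print(columns[2])  # ('5,000,000', '314,000')
```

With a real file, replace the `io.StringIO(...)` argument with a file object opened via `open('file_.csv', newline='')`.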

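Also note that you need to use the correct delimiter. I used a space; change it to whatever delimiter your file actually uses!

You can also put the result in a dictionary:

import csv
from itertools import izip
with open('file_.csv','rb') as f:
    csvreader = csv.reader(f, delimiter='\t')
    keys = next(csvreader)      # first row holds the column names
    a = izip(*csvreader)        # remaining rows transposed into columns
    d = dict(zip(keys, a))      # column name -> tuple of values

print d
print d['Salary']

This prints the whole dictionary (column name -> tuple of values) and then the tuple of values in the Salary column.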
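A Python 3 sketch of the dictionary-of-columns idea, assuming comma-separated input with a header row. The sample rows are copied from the CSV layout given in Answer 2 and read from an in-memory `io.StringIO`; the name `SAMPLE` is only for illustration.

```python
import csv
import io

# Sample rows in the quoted CSV layout from Answer 2
SAMPLE = (
    'Team,Player,Salary,Position\n'
    '"New York Yankees","Acevedo, Juan","9,00,000","Pitcher"\n'
    '"New York Yankees","Anderson, Jason","3,00,000","Pitcher"\n'
)

reader = csv.reader(io.StringIO(SAMPLE))
keys = next(reader)            # header row: ['Team', 'Player', 'Salary', 'Position']
columns = zip(*reader)         # remaining rows transposed into columns (lazy in Python 3)
d = dict(zip(keys, columns))   # column name -> tuple of values

print(d['Salary'])  # ('9,00,000', '3,00,000')
```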

Answer 1 (score: 0)

split uses whitespace as its default separator. To split on a different string, pass it as the argument. In this case, to split on commas:

a = line.strip().split(',')
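A short sketch of why splitting on commas alone is not enough for this particular file: the player names and salaries are quoted and themselves contain commas. The sample line is reconstructed from the question's output; the `csv` module, by contrast, understands the quoting.

```python
import csv

# One line from the question's file; quoted fields contain commas
line = 'Chicago White Sox,"Valentin, Jose","5,000,000",Outfielder,,,,\n'

# Plain str.split(',') also splits inside the quotes and keeps the quote characters
naive = line.strip().split(',')
print(naive[:4])  # ['Chicago White Sox', '"Valentin', ' Jose"', '"5']

# csv.reader accepts any iterable of lines, so a one-element list works here
parsed = next(csv.reader([line]))
print(parsed[:4])  # ['Chicago White Sox', 'Valentin, Jose', '5,000,000', 'Outfielder']
```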

Answer 2 (score: 0)

Format your CSV as follows:

Team,Player,Salary,Position
"New York Yankees","Acevedo, Juan","9,00,000","Pitcher"
"New York Yankees","Anderson, Jason","3,00,000","Pitcher"
"New York Yankees","Clemens, Roger","1,01,00,000","Pitcher"
"New York Yankees","Contreras, Jose","55,00,000","Pitcher"

Then use the following Python code to get the values as a list of dictionaries, ready for further processing:

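import csv

# newline='' is what the csv docs recommend when opening files for csv.reader (Python 3)
with open('file.csv', newline='') as f:
    datareader = csv.reader(f, delimiter=',', quotechar='"')
    headers = next(datareader)  # first row: Team, Player, Salary, Position

    datalist = []
    for row in datareader:
        # pair each column name with the value in the same position
        data = dict(zip(headers, row))
        datalist.append(data)

for data in datalist:
    print(data)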
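Building one dictionary per row, as the loop above does, is what the standard library's `csv.DictReader` already provides. A minimal Python 3 sketch, using rows from the CSV shown above (read from an in-memory `io.StringIO` so it is self-contained); converting the salary strings to integers is an illustrative extra step, not part of the original answer.

```python
import csv
import io

# Rows copied from the CSV layout shown above
SAMPLE = (
    'Team,Player,Salary,Position\n'
    '"New York Yankees","Acevedo, Juan","9,00,000","Pitcher"\n'
    '"New York Yankees","Clemens, Roger","1,01,00,000","Pitcher"\n'
)

# DictReader takes the field names from the first row and yields one dict per data row
datalist = list(csv.DictReader(io.StringIO(SAMPLE)))

for data in datalist:
    print(data['Player'], data['Salary'])

# Salaries are strings; remove the grouping commas before converting to int
salaries = [int(row['Salary'].replace(',', '')) for row in datalist]
print(salaries)  # [900000, 10100000]
```

For a real file, pass `open('file.csv', newline='')` to `csv.DictReader` instead of the `io.StringIO` object.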