How to split a quoted csv field into two fields?

Date: 2016-05-19 11:18:42

Tags: python csv

Final edit: it works! Thanks to everyone for the help, and special thanks to Padraic for sticking with me until I got it working.

First, apologies if this has been asked before. I searched fairly extensively, but maybe it was phrased in a way I didn't think of.

So I'm working with a csv file like this:

0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S

I have to parse this file and then write part of it to another csv, which I do with this code:

import csv
infile = open('data/data.csv', 'r')  
incsv = csv.reader(infile, delimiter = ',')
outfile = open('data/output.csv', 'w', newline = '')
outcsv = csv.writer(outfile, delimiter = ',')

The problem is that the 'name' field has the format "Lastname, othernames", and I need to split it into two fields: 'lastname' and 'othernames'.

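I can't find a way to make it ignore the quotes and split the name on the delimiter (','). Each row is a list, so .strip() doesn't work, and I can't work out whether csv.QUOTE_NONE simply doesn't do this or whether I just have the syntax wrong.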
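To see why csv.QUOTE_NONE does not help here, this small sketch (an illustration, not code from the original post) parses a shortened version of the sample row both ways. With the default settings the reader already removes the quotes and returns the name as one field; with csv.QUOTE_NONE it does split on the inner comma, but the quote characters stay in the values.

```python
import csv

line = '0,3,"Braund, Mr. Owen Harris",male,22'

# Default dialect: the quoted field is read as a single value, quotes removed.
default_row = next(csv.reader([line]))

# QUOTE_NONE: quote characters get no special treatment, so the embedded
# comma splits the field and the '"' characters remain in the data.
raw_row = next(csv.reader([line], quoting=csv.QUOTE_NONE))

print(default_row)  # ['0', '3', 'Braund, Mr. Owen Harris', 'male', '22']
print(raw_row)      # ['0', '3', '"Braund', ' Mr. Owen Harris"', 'male', '22']
```

So the default reader is the better starting point: split the already-unquoted name string afterwards.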

This probably goes without saying, but I'm very new to all of this.

Edit: I ran into errors with these solutions, so I'm including the rest of my code in the hope that it shows where things go wrong.

import csv

infile = open('data/titanic.csv', 'r', newline = '')
incsv = csv.reader(infile, delimiter = ',')
outfile = open('data/survivors.csv', 'w', newline = '')
outcsv = csv.writer(outfile, delimiter = ',')

survivors = {}  # renamed from 'dict' so the built-in dict type is not shadowed

for row in incsv:
    survived, pclass, name, sex, age, sibsp, parch, ticket, fare, cabin, embarked = row
    # keep only survivors; the header row ("survived") also fails this test
    if survived == "1":
        if name not in survivors:
            survivors[name] = name, pclass, sex, age

names = survivors.keys()
sorted_names = sorted(names)

for name in sorted_names:
    (name, pclass, sex, age) = survivors[name]
    rowOutput = (name, pclass, sex, age)
    outcsv.writerow(rowOutput)  # inside the loop, so every survivor is written

outfile.close()
infile.close()

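So this parses the original csv, keeps the rows where survived == '1', adds the names to a dictionary (I know I'll need to adjust this once the name field is split), and sorts the names alphabetically.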
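Because the file starts with a header row, one alternative is csv.DictReader, which maps each row onto the header names so columns can be selected by name. This is a sketch, not the code from the question: the function name survivors_with_split_names is hypothetical, and the sample lines are copied from the excerpt of the file shown in the next edit.

```python
import csv
import io

def survivors_with_split_names(lines):
    """Return sorted [lastname, othernames, pclass, sex, age] rows for survivors."""
    result = []
    for rec in csv.DictReader(lines):  # header row supplies the dict keys
        if rec["survived"] != "1":
            continue
        # split on the first comma only; partition never raises if it is missing
        last, _, others = rec["name"].partition(",")
        result.append([last.strip(), others.strip(), rec["pclass"], rec["sex"], rec["age"]])
    result.sort()  # alphabetical by last name, then other names
    return result

sample = io.StringIO(
    'survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n'
    '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C\n'
    '1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S\n'
)

out = io.StringIO()
csv.writer(out).writerows(survivors_with_split_names(sample))
print(out.getvalue())
```

With the real file, pass open('data/titanic.csv', newline='') in place of the StringIO and write to a file opened with newline=''.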

Edit: here is more of the original file. Sorry for not including more at first.

survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S

These are 10 of the 892 lines (891 if you don't count the header).

4 answers:

Answer 0 (score: 3)

You can modify each row in place while iterating:

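for row in incsv:
    # replace the single name field (index 2) with its two stripped parts
    row[2:3] = [part.strip() for part in row[2].split(',', 1)]
    outcsv.writerow(row)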
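The work here is done by slice assignment on the row list. This sketch shows the difference between the two slices: assigning to row[2:2] (an empty slice) inserts the new items before index 2 and keeps the original name, while row[2:3] replaces the name field itself. The strip() call is an addition that removes the space following the comma.

```python
row = ['0', '3', 'Braund, Mr. Owen Harris', 'male', '22']

inserted = row.copy()
inserted[2:2] = inserted[2].split(',')  # empty slice: pure insertion, original name kept
print(inserted)  # ['0', '3', 'Braund', ' Mr. Owen Harris', 'Braund, Mr. Owen Harris', 'male', '22']

replaced = row.copy()
replaced[2:3] = [part.strip() for part in replaced[2].split(',', 1)]  # one element -> two
print(replaced)  # ['0', '3', 'Braund', 'Mr. Owen Harris', 'male', '22']
```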

Answer 1 (score: 1)

If the name is always in the same column, you can split it:

In [20]: s = '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S'

In [21]: import csv

In [22]: row = next(csv.reader([s]))

In [23]: row
Out[23]: ['0', '3', 'Braund, Mr. Owen Harris', 'male', '22', '1', '0', 'A/5', '21171', '7.25', 'S']

In [24]: last,first = row[2].split(",")

In [25]: last, first.strip()
Out[25]: ('Braund', 'Mr. Owen Harris')

Assuming you want to use the last name as the key:

import csv
from operator import itemgetter

dct = {}

with open('data/titanic.csv', newline='') as infile, open('data/survivors.csv', 'w', newline='') as outfile:
    incsv = csv.reader(infile)
    outcsv = csv.writer(outfile)
    for survived, pclass, name, sex, age in map(itemgetter(0, 1, 2, 3, 4), incsv):
        if survived == "1":
            # split on the first comma only and drop the space that follows it
            last, first = name.split(",", 1)
            dct[last] = [first.strip(), pclass, sex, age]

    sorted_names = sorted(dct)
    for last_name in sorted_names:
        outcsv.writerow([last_name] + dct[last_name])

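itemgetter(0, 1, 2, 3, 4) extracts only the first five columns we are interested in; we unpack those five values in the for loop, split the name, and use the last name as the key.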
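One caveat with keying the dictionary on the last name: several passengers can share a surname, and each later row overwrites the earlier one, so some survivors silently disappear from the output. A sketch that keeps every row by collecting them in a list and sorting with itemgetter as the key (the function name sorted_survivors and the two "Doe" rows are made up for illustration):

```python
import csv
from operator import itemgetter

def sorted_survivors(rows):
    """Return [last, others, pclass, sex, age] for every survivor, duplicates kept."""
    out = []
    for survived, pclass, name, sex, age in map(itemgetter(0, 1, 2, 3, 4), rows):
        if survived == "1":
            last, _, others = name.partition(",")
            out.append([last.strip(), others.strip(), pclass, sex, age])
    out.sort(key=itemgetter(0, 1))  # by last name, then other names
    return out

# Two hypothetical survivors sharing a surname; a dict keyed on "Doe" keeps only one.
rows = csv.reader([
    '1,3,"Doe, Mrs. Jane",female,30',
    '1,3,"Doe, Mr. John",male,32',
])
result = sorted_survivors(rows)
for r in result:
    print(r)
```

With the real file, pass csv.reader(infile) and write the result with outcsv.writerows(result).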

If the other names might be missing (no comma in the field), you can use str.partition:

        last, _, first = name.partition(",")
        dct[last] = [first.strip(), pclass, sex, age]  # a list, so [last_name] + dct[last_name] still works
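The difference matters when a name has no comma or more than one. This sketch compares the two on made-up names (the helper names split_two and split_partition are illustrative): unpacking name.split(",") into two variables raises ValueError unless there are exactly two pieces, while partition always returns three parts and splits on the first comma only.

```python
def split_two(name):
    """Unpack name.split(',') into (last, others); None if there aren't exactly two pieces."""
    try:
        last, others = name.split(",")
    except ValueError:  # raised when split() yields 1 piece or 3+ pieces
        return None
    return last, others.strip()

def split_partition(name):
    """partition() always yields a 3-tuple, so this unpacking never fails."""
    last, _, others = name.partition(",")
    return last, others.strip()

for name in ("Braund, Mr. Owen Harris", "Braund", "Braund, Jr., Mr. Owen"):
    print(repr(name), split_two(name), split_partition(name))
```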

The final output has the format:

last_name, other_names, pclass, sex, age

Output on some sample rows:

In [2]: cat test.csv
1,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S
0,3,"Braund1, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S
1,3,"Braund3, Mr. Owen2 Harris2",male,22,1,0,A/5,21171,7.25,S
0,3,"Braund2, Mr. Owen2 Harris2",male,22,1,0,A/5,21171,7.25,S
In [3]: cat survivors.csv

In [4]: paste
from operator import itemgetter
import csv
dct = {}
with open('test.csv') as infile, open('survivors.csv', 'w', newline='') as outfile:
    incsv = csv.reader(infile)
    outcsv = csv.writer(outfile)
    for survived, pclass, name, sex, age in map(itemgetter(0, 1, 2, 3, 4), incsv):
        if survived == "1":
            last, first = name.split(",", 1)
            dct[last] = [first.strip(), pclass, sex, age]
    sorted_names = sorted(dct)
    for last_name in sorted_names:
        outcsv.writerow([last_name] + dct[last_name])

## -- End pasted text --

In [5]: cat survivors.csv
Braund,Mr. Owen Harris,3,male,22
Braund3,Mr. Owen2 Harris2,3,male,22

Answer 2 (score: 1)

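You can write a simple transform function that strips the quotes from each line before passing it to the CSV reader, so the comma inside the name becomes an ordinary delimiter. Note that this removes every quote on the line, so it is only safe if no other field contains a comma: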

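import csv

def transform(f):
    # remove all double quotes so "Lastname, othernames" splits on its comma
    for line in f:
        yield line.replace('"', '')

with open('C:/in.csv', 'r', newline='') as infile, open('C:/out.csv', 'w', newline='') as outfile:
    # skipinitialspace drops the space that followed the comma inside the name
    incsv = csv.reader(transform(infile), delimiter = ',', skipinitialspace=True)
    outcsv = csv.writer(outfile, delimiter = ',')
    outcsv.writerows(incsv)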
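A quick in-memory check of this approach (a sketch: a list of lines stands in for the file, and the line is the first data row of the question's file). Removing the quotes turns the name into two fields, and the reader's skipinitialspace option drops the space left in front of the second part. Every row gains one field, which is worth checking against the 11-column header.

```python
import csv

def transform(f):
    for line in f:
        yield line.replace('"', '')

lines = ['0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n']

plain = next(csv.reader(transform(lines)))
trimmed = next(csv.reader(transform(lines), skipinitialspace=True))

print(plain[2:4])    # ['Braund', ' Mr. Owen Harris']
print(trimmed[2:4])  # ['Braund', 'Mr. Owen Harris']
print(len(trimmed))  # 12: one more field than the 11-column header
```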

Answer 3 (score: 1)

Don't try to fight the csv module: a field enclosed in quotes is meant to be read as a single field.

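But once you have it, you can easily split it yourself (the quotes are already gone at that point) and write it out as two separate fields in the output csv file: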

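for row in incsv:
    survived, pclass, name, sex, age, sibsp, parch, ticket, fare, cabin, embarked = row
    try:
        lastname, othernames = name.split(',', 1)
        othernames = othernames.strip()
    except ValueError:
        # no comma in the name: keep it whole as the last name
        lastname, othernames = (name, '')
    if survived == "1":
        # ok, you can use lastname and othernames...
        outcsv.writerow([lastname, othernames, pclass, sex, age])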
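On the writing side, csv.writer only adds quotes where a value needs them (the default csv.QUOTE_MINIMAL), so once the name is split the two parts are written unquoted. A small sketch writing to an in-memory buffer instead of the output file:

```python
import csv
import io

name = "Braund, Mr. Owen Harris"
lastname, othernames = (part.strip() for part in name.split(",", 1))

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([name, "male"])                  # contains a comma, so it is quoted
writer.writerow([lastname, othernames, "male"])  # no commas left, so no quotes
print(repr(buf.getvalue()))
# '"Braund, Mr. Owen Harris",male\r\nBraund,Mr. Owen Harris,male\r\n'
```

The default line terminator is '\r\n', which is why the real output file should be opened with newline=''.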