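Final edit: It works! Thanks to everyone for the help, and special thanks to Padraic for sticking with me until I got it working.
First, apologies if this has been asked before. I searched fairly extensively, but perhaps it was phrased in a way I didn't think of.
I'm working with a csv file like this: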
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S
I have to parse this file and then write part of it to another csv, which I'm doing with this code:
import csv
infile = open('data/data.csv', 'r')
incsv = csv.reader(infile, delimiter = ',')
outfile = open('data/output.csv', 'w', newline = '')
outcsv = csv.writer(outfile, delimiter = ',')
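The problem is that the 'name' field has the format "Lastname, othernames", and I need to split it into two fields: 'lastname' and 'othernames'.
I can't seem to find a way to make it ignore the quotes and split the name on the delimiter (','). Each row is a list, so .strip() doesn't work, and I can't figure out whether QUOTE_NONE doesn't work or whether I just don't have the syntax right.
It probably goes without saying, but I'm very new to all of this.
Edit: I'm getting errors with these solutions, so I'm including the rest of my code in the hope that it highlights what is going wrong.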
import csv

infile = open('data/titanic.csv', 'r')
incsv = csv.reader(infile, delimiter = ',')
outfile = open('data/survivors.csv', 'w', newline = '')
outcsv = csv.writer(outfile, delimiter = ',')

dict ={}

for row in incsv:
    survived, pclass, name, sex, age, sibsp, parch, ticket, fare, cabin, embarked = row
    if survived == "1":
        if name not in dict:
            dict[name] = name, pclass, sex, age

names = dict.keys()
sorted_names = sorted(names)

for name in sorted_names:
    (name, pclass, sex, age) = dict[name]
    rowOutput = (name, pclass, sex, age)
    outcsv.writerow(rowOutput)

outfile.close()
infile.close()
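So this parses the original csv, filters for survived == '1', adds the names to a dictionary (I know I'll need to adjust this once the name field is split), and sorts that dictionary alphabetically.
Edit: Here is more of the original file. Sorry for not including more at first.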
survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
These are 10 of the 892 lines (891 if you don't count the header).
Answer 0 (score: 3)
You can modify the list while iterating:
for row in incsv:
    # Replace the single name field (index 2) with its two parts.
    row[2:3] = row[2].split(',', 1)
    outcsv.writerow(row)
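As a quick check of how slice assignment behaves here, the sketch below parses one row from the question out of an in-memory string. `io.StringIO` stands in for the real file, and the helper name `split_name_field` is made up for illustration. Assigning to `row[2:3]` replaces the name field with its two parts, whereas assigning to `row[2:2]` would insert the parts and leave the original field in place.

```python
import csv
import io

SAMPLE = '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'

def split_name_field(row, index=2):
    """Return a copy of row with the name field split into two fields."""
    new_row = list(row)
    # Split on the first comma only and trim the space that follows it.
    parts = [part.strip() for part in new_row[index].split(',', 1)]
    # Slice assignment over [index:index + 1] replaces exactly one field.
    new_row[index:index + 1] = parts
    return new_row

row = next(csv.reader(io.StringIO(SAMPLE)))
result = split_name_field(row)
print(result)
```

The row grows by one field, so any header row written to the output would need a matching extra column.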
Answer 1 (score: 1)
If the data is always in the same column, you can split it:
In [20]: s = '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S'
In [21]: import csv
In [22]: row = (next(csv.reader([s])))
In [23]: row
Out[23]: ['0', '3', 'Braund, Mr. Owen Harris', 'male', '22', '1', '0', 'A/5', '21171', '7.25', 'S']
In [24]: last,first = row[2].split(",")
In [25]: last, first.strip()
Out[25]: ('Braund', 'Mr. Owen Harris')
Assuming you want to use the last name as the key:
import csv
from operator import itemgetter

dct = {}
with open('data/titanic.csv') as infile, open('data/survivors.csv', 'w', newline='') as outfile:
    incsv = csv.reader(infile)
    outcsv = csv.writer(outfile)
    for survived, pclass, name, sex, age in map(itemgetter(0, 1, 2, 3, 4), incsv):
        if survived == "1":
            last, first = name.split(",")
            # Strip the space that follows the comma in "Lastname, othernames".
            dct[last] = [first.strip(), pclass, sex, age]
    sorted_names = sorted(dct)
    for last_name in sorted_names:
        outcsv.writerow([last_name] + dct[last_name])
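itemgetter(0, 1, 2, 3, 4) extracts only the first five columns, which are the ones we are interested in. We unpack those five values in the for loop, split the name, and use the last name as the key.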
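One thing to watch for: keying the dictionary on the last name alone means that survivors who share a last name overwrite one another. A possible variation, sketched here under the assumption that the combination of last name and other names is unique, keys on the `(last, first)` tuple instead. The in-memory sample rows (in the style of the made-up test rows used in this answer) and the `collect_survivors` name are only for illustration.

```python
import csv
import io
from operator import itemgetter

# Illustrative rows: two survivors share the last name "Braund".
SAMPLE = (
    '1,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '1,3,"Braund, Mr. Owen2 Harris2",male,22,1,0,A/5 21171,7.25,,S\n'
    '0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S\n'
)

def collect_survivors(lines):
    """Map (last, first) -> [pclass, sex, age] for rows where survived == "1"."""
    survivors = {}
    first_five = itemgetter(0, 1, 2, 3, 4)
    for survived, pclass, name, sex, age in map(first_five, csv.reader(lines)):
        if survived == "1":
            last, _, first = name.partition(",")
            survivors[(last.strip(), first.strip())] = [pclass, sex, age]
    return survivors

survivors = collect_survivors(io.StringIO(SAMPLE))
out = io.StringIO()
writer = csv.writer(out)
for key in sorted(survivors):
    # key is the (last, first) tuple, so the two names land in separate columns.
    writer.writerow(list(key) + survivors[key])
print(out.getvalue())
```

Sorting the tuple keys orders rows by last name first and by the other names second, so both "Braund" rows survive and stay adjacent.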
If the first name may be missing, you can use str.partition:
last, _, first = name.partition(",")
dct[last] = first.strip(), pclass, sex, age
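To illustrate the difference, here is a small sketch comparing the two approaches on a name with and without a comma. The `split_name` helper and the comma-less sample value are hypothetical and only there to show the behaviour.

```python
def split_name(name):
    """Split "Lastname, othernames" into (lastname, othernames).

    str.partition always returns three parts, so a name without a comma
    yields an empty othernames instead of raising ValueError.
    """
    last, _, first = name.partition(",")
    return last.strip(), first.strip()

with_comma = split_name("Braund, Mr. Owen Harris")
without_comma = split_name("Braund")  # hypothetical value with no comma
print(with_comma)
print(without_comma)

# By contrast, tuple-unpacking the result of str.split fails on the same input.
split_error = None
try:
    last, first = "Braund".split(",")
except ValueError as exc:
    split_error = exc
    print("split failed:", exc)
```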
The final output has the format:
last_name, other_names, pclass, sex, age
Output on some sample rows:
In [2]: cat test.csv
1,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S
0,3,"Braund1, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S
1,3,"Braund3, Mr. Owen2 Harris2",male,22,1,0,A/5,21171,7.25,S
0,3,"Braund2, Mr. Owen2 Harris2",male,22,1,0,A/5,21171,7.25,S
In [3]: cat survivors.csv
In [4]: paste
from operator import itemgetter
import csv
dct = {}
with open('test.csv') as infile, open('survivors.csv', 'w', newline='') as outfile:
    incsv = csv.reader(infile)
    outcsv = csv.writer(outfile)
    for survived, pclass, name, sex, age in map(itemgetter(0, 1, 2, 3, 4), incsv):
        if survived == "1":
            last, first = name.split(",")
            dct[last] = [first.strip(), pclass, sex, age]
    sorted_names = sorted(dct)
    for last_name in sorted_names:
        outcsv.writerow([last_name] + dct[last_name])
## -- End pasted text --
In [5]: cat survivors.csv
Braund,Mr. Owen Harris,3,male,22
Braund3,Mr. Owen2 Harris2,3,male,22
Answer 2 (score: 1)
You can write a simple transform function that modifies the lines before they are passed to the CSV reader:
import csv

def transform(f):
    # Strip every double quote so the reader splits on all commas.
    for line in f:
        yield line.replace('"', '')

infile = open('C:/in.csv', 'r')
incsv = csv.reader(transform(infile), delimiter = ',')
outfile = open('C:/out.csv', 'w', newline = '')
outcsv = csv.writer(outfile, delimiter = ',')
outcsv.writerows(incsv)
outfile.close()
infile.close()
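Keep in mind that removing the quotes before parsing makes every comma a field separator, so each data row gains one column and the header no longer lines up with the data. A rough sketch of that effect, using the header and one row from the question, with `io.StringIO` standing in for the input file:

```python
import csv
import io

SAMPLE = (
    'survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n'
    '1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S\n'
)

def transform(f):
    # Same idea as above: drop the double quotes from each raw line.
    for line in f:
        yield line.replace('"', '')

header, row = list(csv.reader(transform(io.StringIO(SAMPLE))))
print(len(header), len(row))
print(repr(row[2]), repr(row[3]))
```

The second name part also keeps its leading space, so it would still need a `.strip()` before being written out.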
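Answer 3 (score: 1)
Don't try to fight the csv module: the field is enclosed in quotes, so it is read as a single field.
But... once you have it, you can easily split it (the quotes are already gone by that point) and write it as two separate fields in the output csv file: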
for row in incsv:
    survived, pclass, name, sex, age, sibsp, parch, ticket, fare, cabin, embarked = row
    try:
        # Split on the first comma only.
        lastname, othernames = name.split(',', 1)
    except ValueError:
        # No comma in the name: treat the whole value as the last name.
        lastname, othernames = (name, '')
    if survived == "1":
        # ok, you can use lastname and othernames...
        pass
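Putting this together, a minimal end-to-end sketch might look like the following. It reads a few rows from the question out of an in-memory string, keeps the survivors, and writes lastname and othernames as separate fields, sorted by name. The `io.StringIO` objects stand in for the real input and output files, and the output column order is an assumption.

```python
import csv
import io

SAMPLE = (
    'survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n'
    '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S\n'
    '1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S\n'
)

incsv = csv.reader(io.StringIO(SAMPLE))
out = io.StringIO()
outcsv = csv.writer(out)

survivors = []
for row in incsv:
    survived, pclass, name, sex, age = row[:5]
    if survived != "1":
        continue  # this also skips the header row
    try:
        # The reader has already removed the quotes; split on the first comma only.
        lastname, othernames = name.split(',', 1)
    except ValueError:
        lastname, othernames = name, ''
    survivors.append((lastname.strip(), othernames.strip(), pclass, sex, age))

# Tuples sort by lastname first, then by othernames.
outcsv.writerows(sorted(survivors))
print(out.getvalue())
```

Collecting tuples in a list rather than a dictionary keeps passengers who share a name from overwriting each other.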