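How to parse a complex CSV file

Time: 2019-07-21 15:39:30

Tags: python-3.x csv escaping

I received a CSV file whose fields are a mix of plain strings and tuple-like elements, and I cannot find a way to parse it correctly. Am I missing something obvious?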

csvfile

"presentation_id","presentation_name","sectionId","sectionNumber","courseId","courseIdentifier","courseName","activity_id","activity_prompt","activity_content","solution","event_timestamp","answer_id","answer","isCorrect","userid","firstname","lastname","email","role"
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","62d059e8-9ab4-41d4-9eb8-00ba67d9fac9","A blow to which side of the knee might tear the medial collateral ligament?","{"choices":["medial","lateral"],"type":"MultipleChoice"}","{"solution":[1],"selectAll":false,"type":"MultipleChoice"}","2018-09-30 23:54:16.000","7b5048e5-7460-49f8-a64a-763b7f62d771","{"solution":[1],"type":"MultipleChoice"}","1","57ba970d-d02b-4a10-a64d-56f02336ee08","Student","One","student1@example.com","Student"
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2","What is the name of this movement?","{"choices":["right rotation","left rotation","right lateral rotation","left lateral rotation"],"type":"MultipleChoice"}","{"solution":[1],"selectAll":false,"type":"MultipleChoice"}","2018-09-30 23:20:33.000","d6cce4d9-37ae-409e-afc5-54ad79f86226","{"solution":[3],"type":"MultipleChoice"}","0","921d1b9b-f550-4289-89f1-2a805b27eeb3","Student","Two","student2@example.com","Student"

The first line is the header; the data starts on the second line.

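import csv

# filepathcsv is the path to the CSV file shown above
with open(filepathcsv, newline="") as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        numcolumns = len(row)  # number of fields the reader produced for this row
        print(numcolumns, ": ", row)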

Output:

20 :  ['presentation_id', 'presentation_name', 'sectionId', 'sectionNumber', 'courseId', 'courseIdentifier', 'courseName', 'activity_id', 'activity_prompt', 'activity_content', 'solution', 'event_timestamp', 'answer_id', 'answer', 'isCorrect', 'userid', 'firstname', 'lastname', 'email', 'role']
25 :  ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', '62d059e8-9ab4-41d4-9eb8-00ba67d9fac9', 'A blow to which side of the knee might tear the medial collateral ligament?', '{choices":["medial"', 'lateral]', 'type:"MultipleChoice"}"', '{solution":[1]', 'selectAll:false', 'type:"MultipleChoice"}"', '2018-09-30 23:54:16.000', '7b5048e5-7460-49f8-a64a-763b7f62d771', '{solution":[1]', 'type:"MultipleChoice"}"', '1', '57ba970d-d02b-4a10-a64d-56f02336ee08', 'Student', 'One', 'student1@example.com', 'Student']
27 :  ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', 'f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2', 'What is the name of this movement?', '{choices":["right rotation"', 'left rotation', 'right lateral rotation', 'left lateral rotation]', 'type:"MultipleChoice"}"', '{solution":[1]', 'selectAll:false', 'type:"MultipleChoice"}"', '2018-09-30 23:20:33.000', 'd6cce4d9-37ae-409e-afc5-54ad79f86226', '{solution":[3]', 'type:"MultipleChoice"}"', '0', '921d1b9b-f550-4289-89f1-2a805b27eeb3', 'Student', 'Two', 'student2@example.com', 'Student']

csv.reader splits each row into a different number of fields because of the embedded curly-brace (JSON) elements.

...but I expect 20 elements in each row.

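2 Answers:

Answer 0: (score: 0)

The problem is in the records, not in the code. Your code works fine. To solve this you need to fix the CSV file, because the fields containing JSON were not serialized correctly: the double quotes inside them are not escaped.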
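To see why the reader over-splits, here is a minimal sketch on a made-up line (the line and the variable names are illustrative, not taken from the file). Three fields are intended, but the first unescaped quote inside the JSON field ends the quoted section, so the commas inside the JSON are treated as field separators.

```python
import csv
import io

# Hypothetical line: three fields intended, the middle one holding unescaped JSON.
broken_line = '"a","{"k":["x","y"]}","b"\n'

row = next(csv.reader(io.StringIO(broken_line)))
print(len(row), row)  # more fields than the 3 that were intended
```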

Inside a quoted field, every embedded double quote " must be written as two double quotes "".
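For comparison, Python's own csv.writer applies this doubling automatically. The sketch below uses made-up field values to show what a correctly serialized line looks like and that it reads back unchanged.

```python
import csv
import io
import json

# Hypothetical record: the middle field is a compact JSON document.
payload = json.dumps({"choices": ["medial", "lateral"]}, separators=(",", ":"))
record = ["a", payload, "b"]

buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(record)
serialized = buffer.getvalue()
print(serialized)  # embedded quotes appear doubled inside the quoted field

# Reading the line back yields the original three fields.
parsed = next(csv.reader(io.StringIO(serialized)))
```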

Here is an example of a corrected CSV row.

"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2","What is the name of this movement?","{""choices"":[""right rotation"",""left rotation"",""right lateral rotation"",""left lateral rotation""],""type"":""MultipleChoice""}","{""solution"":[1],""selectAll"":false,""type"":""MultipleChoice""}","2018-09-30 23:20:33.000","d6cce4d9-37ae-409e-afc5-54ad79f86226","{""solution"":[3],""type"":""MultipleChoice""}","0","921d1b9b-f550-4289-89f1-2a805b27eeb3","Student","Two","student2@example.com","Student"

Output of the code on the corrected row:

20 :  ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', 'f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2', 'What is the name of this movement?', '{"choices":["right rotation","left rotation","right lateral rotation","left lateral rotation"],"type":"MultipleChoice"}', '{"solution":[1],"selectAll":false,"type":"MultipleChoice"}', '2018-09-30 23:20:33.000', 'd6cce4d9-37ae-409e-afc5-54ad79f86226', '{"solution":[3],"type":"MultipleChoice"}', '0', '921d1b9b-f550-4289-89f1-2a805b27eeb3', 'Student', 'Two', 'student2@example.com', 'Student']
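If the export cannot be regenerated, the same doubling can be applied in code before parsing. This is only a sketch: the helper name escape_json_fields and the regular expression are my own, and it assumes every JSON field is a flat object wrapped as "{...}", with no nested braces and no braces inside string values.

```python
import csv
import io
import json
import re

# Assumption: a JSON field looks like "{...}" and contains no nested braces.
JSON_FIELD = re.compile(r'"(\{[^{}]*\})"')

def escape_json_fields(line):
    """Double the quotes inside every "{...}" field of one raw CSV line."""
    return JSON_FIELD.sub(
        lambda m: '"' + m.group(1).replace('"', '""') + '"', line
    )

raw = '"What is the name of this movement?","{"solution":[3],"type":"MultipleChoice"}","0"\n'
fixed = escape_json_fields(raw)
row = next(csv.reader(io.StringIO(fixed)))
answer = json.loads(row[1])  # the JSON field can now be decoded
```

Because csv.reader accepts any iterable of strings, a whole file could be handled with csv.reader(escape_json_fields(line) for line in csvfile), under the same assumptions.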

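Answer 1: (score: 0)

Thank you all for the suggestions!

I also apologize for not including the original CSV file I was trying to parse (a sample is shown here):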

"b5ae18d3-b6dd-4d0a-84fe-7c43df472571"|"Climate_Rapid_Change_W18.pdf"|"18563b1e-a467-44b3-aed7-3607a1acd712"|"001"|"c86c8c8d-dca6-41cd-a010-a83…40"|"CLIMATE 102"|"Extreme Weather"|"278c4561-c834-4343-a770-3f544966f633"|"Which European city is at the same latitude as Ann Arbor?"|"{"choices":["Stockholm, Sweden","Berlin, Germany","London, England","Paris, France","Madrid, Spain"],"type":"MultipleChoice"}"|"{"solution":[4],"selectAll":false,"type":"MultipleChoice"}"|"2019-01-31 22:11:08.000"|"81392cd3-28e9-4e2e-8a33-018104b1f4d1"|"{"solution":[3,4],"type":"MultipleChoice"}"|"0"|"2db10c95-b507-4211-8244-394361148b22"|"Student"|"One"|"student1@umich.edu"|"Student"
"ee73fdaf-a926-4899-b0f7-9b942f1b44ad"|"6-Elbow, Wrist, Hand W19"|"48539109-529e-4359-83b9-2ae81be0532c"|"001"|"3b5b5e49-1798-4eab-86d7-186cf59149b4"|"MOVESCI 230"|"Human Musculoskeletal Anatomy"|"fcd7c673-d944-48c3-8a09-f458e03f8c44"|"What is the name of this movement?"|"{"choices":["first phalangeal joint","first proximal interphalangeal joint","first distal interphalangeal joint","first interphalangeal joint"],"type":"MultipleChoice"}"|"{"solution":[3],"selectAll":false,"type":"MultipleChoice"}"|"2019-01-31 22:07:32.000"|"9016f36c-41f5-4e14-84a9-78eea682c802"|"{"solution":[3],"type":"MultipleChoice"}"|"1"|"7184708d-4dc7-42e0-b1ea-4aca51f00fcd"|"Student"|"Two"|"student2@umich.edu"|"Student"

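You were right that the problem lies in the form of the CSV file.

  1. I changed readCSV = csv.reader(csvfile) to readCSV = csv.reader(csvfile, delimiter="|", quotechar='|')
  2. I then took the resulting list and stripped the extra quotes from each element.

The rest of the program now runs correctly.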
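The two steps above can be sketched roughly as follows. This is a variant, not the exact call from the answer: it turns quote handling off with csv.QUOTE_NONE instead of reusing | as quotechar, since a dialect that gives one character two roles may not be accepted by every Python version. The helper names and the sample line are hypothetical, and the sketch assumes no field value contains a | character.

```python
import csv
import io
import json

def strip_outer_quotes(value):
    """Remove one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value

def read_pipe_rows(stream):
    """Split each line on '|' only, then strip the quotes around each field."""
    reader = csv.reader(stream, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield [strip_outer_quotes(field) for field in row]

# Shortened, made-up line in the same shape as the sample above.
sample = io.StringIO('"001"|"{"solution":[3,4],"type":"MultipleChoice"}"|"0"\n')
rows = list(read_pipe_rows(sample))
answer = json.loads(rows[0][1])  # the JSON field decodes without further repair
```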