Question

I have CSV data like this:

1, 2, 3, bla bla bla, 4, 5;
"1, 2, 3, ""bla, bla, bla"", 4, 5";
"6, 7, 8, ""more, bla, bla"", 9, 10";
6, 7, 8, more bla bla, 9, 10;

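In essence: in some rows, one column holds a string that contains the delimiter. That string is wrapped in doubled double quotes, and the entire row is wrapped in quotes as well.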

I have tried this with pandas:

df = pd.read_csv("data.csv", sep=',', skipinitialspace=True, quotechar='"', doublequote=True)

But because some rows are quoted as a whole, pandas puts each of those rows entirely into the first column:

column1                        column12    column13    column14    column15    column16
1                              2           3         bla bla bla   4           5
1,2,3,"bla, bla, bla", 4, 5    nan         nan         nan         nan         nan
6,7,8,"more, bla, bla",9,10    nan         nan         nan         nan         nan
6                              7           8         more bla bla  9           10

How can I get these quoted rows split into columns like the others?

Answer 1

One approach is to preprocess the file before loading it into pandas:

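import csv
import pandas as pd
import io

data = []

with open('input.csv') as f_input:
    for line in f_input:
        # Strip the outer quotes, the trailing ';' and the newline,
        # then turn the doubled quotes back into ordinary quotes.
        line = line.strip('";\n').replace('""', '"')
        # Parse the cleaned line as a single CSV row.
        row = next(csv.reader(io.StringIO(line, newline=''), skipinitialspace=True))
        data.append(row)

df = pd.DataFrame(data)
print(df)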

This gives:

   0  1  2               3  4   5
0  1  2  3     bla bla bla  4   5
1  1  2  3   bla, bla, bla  4   5
2  6  7  8  more, bla, bla  9  10
3  6  7  8    more bla bla  9  10
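
To see what the per-line cleanup does without touching a file, here is a minimal standalone sketch that runs on the sample data from the question. `unwrap_line` and `parse` are hypothetical helper names, not part of the answer above. Unlike `strip('";\n')`, this variant only removes the outer quotes when the whole line is quoted. It assumes each line is either quoted as a whole or does not both start and end with a quote.

```python
import csv

# Sample data from the question, kept in memory for illustration.
RAW = '''1, 2, 3, bla bla bla, 4, 5;
"1, 2, 3, ""bla, bla, bla"", 4, 5";
"6, 7, 8, ""more, bla, bla"", 9, 10";
6, 7, 8, more bla bla, 9, 10;
'''


def unwrap_line(line):
    """Hypothetical helper: drop the trailing ';' and, if the whole line
    is quoted, remove the outer quotes and un-double the inner ones."""
    line = line.rstrip('\n').rstrip(';')
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        line = line[1:-1].replace('""', '"')
    return line


def parse(text):
    """Hypothetical helper: clean each non-empty line and parse it as one CSV row."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cleaned = unwrap_line(line)
        rows.append(next(csv.reader([cleaned], skipinitialspace=True)))
    return rows


rows = parse(RAW)
for row in rows:
    print(row)
```

Each resulting row has six fields, so the list can be passed to `pd.DataFrame(rows)` in the same way as `data` above.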

Alternatively, you can write out the fixed version for later use:

with open('output.csv', 'w', newline='') as f_output:
    csv.writer(f_output).writerows(data)
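
As a quick check that the rewritten data is in standard CSV form, the sketch below writes two of the cleaned rows to an in-memory buffer and reads them back. The buffer stands in for `output.csv` only so that the example is self-contained; that substitution is an assumption of this sketch, not part of the answer.

```python
import csv
import io

# Two cleaned rows, as produced by the preprocessing step.
data = [
    ['1', '2', '3', 'bla bla bla', '4', '5'],
    ['1', '2', '3', 'bla, bla, bla', '4', '5'],
]

# Write to an in-memory buffer instead of a real file (illustration only).
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows(data)
written = buffer.getvalue()
print(written)

# Read the text back: the field containing commas was quoted by the
# writer, so it comes back as a single field.
round_trip = list(csv.reader(io.StringIO(written, newline='')))
print(round_trip == data)
```

With the default writer settings, only the field that contains commas gets quoted, so a file written this way should load with a plain `pd.read_csv('output.csv', header=None)` (`header=None` because the data has no header row).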
