Question

I work with large CSV files. I was able to write code that splits a file into smaller chunks.

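The problem I am facing is that the first CSV produced by the split has double quotes at the beginning and end of every row. The remaining CSV files do not have this problem. Also, the original file does not contain any double quotes.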

For example, the first CSV file looks like this:
  "ABC,ghhh,123,fgfg"
  "hjfhj,12312,ADFA,6765"

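This is a problem because I have to run further tests on these files, and the first file causes errors while the rest are fine. It would be very helpful if someone could help me modify my code to fix this.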

Answer 1

A quick look at the csv module documentation answers your question.

https://docs.python.org/3/library/csv.html#csv.QUOTE_NONE
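
As a rough illustration of what that link points at, here is a minimal sketch that splits rows into chunks and writes each chunk with `csv.QUOTE_NONE`, so the writer never adds quotes. It is not the asker's code. The helper names `split_rows` and `write_chunk`, the chunk size of 2, and the in-memory sample data are assumptions made for the example. It also assumes each row is parsed into separate fields before writing. Passing a whole line as a single field is one common way to end up with quoted rows.

```python
import csv
import io
from itertools import islice


def split_rows(reader, chunk_size):
    """Yield lists of at most chunk_size rows from a csv.reader (hypothetical helper)."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def write_chunk(rows, out):
    """Write rows to a file-like object without ever adding quotes (hypothetical helper)."""
    # QUOTE_NONE tells the writer never to quote fields. An escapechar is
    # needed in case a field itself contains the delimiter.
    writer = csv.writer(out, quoting=csv.QUOTE_NONE, escapechar="\\",
                        lineterminator="\n")
    writer.writerows(rows)


# In-memory stand-in for the large input file (sample data from the question).
source = io.StringIO("ABC,ghhh,123,fgfg\nhjfhj,12312,ADFA,6765\nx,y,z,w\n")
reader = csv.reader(source, quoting=csv.QUOTE_NONE)

outputs = []
for chunk in split_rows(reader, 2):
    buf = io.StringIO()
    write_chunk(chunk, buf)
    outputs.append(buf.getvalue())
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, as the csv module documentation recommends.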

Answer 2

You can use Pandas to fix the input and make the logic simpler.

import csv
import pandas as pd

filename = 'big-'
# Read the file in chunks of 50,000 rows. QUOTE_NONE makes pandas treat
# double quotes as ordinary characters rather than as field quoting.
reader = pd.read_csv(filename, delimiter=",", quoting=csv.QUOTE_NONE,
                     encoding='utf-8', iterator=True, chunksize=50000)
for count, chunk in enumerate(reader):
    # Fix the first and last columns to remove the double-quote character
    chunk[chunk.columns[0]] = chunk[chunk.columns[0]].str[1:]
    chunk[chunk.columns[-1]] = chunk[chunk.columns[-1]].str[:-1]
    # Change these columns' data types if necessary/useful
    # Put the rest of your logic here (saving files, etc.)
    chunk.to_csv(filename + '{}'.format(count))

*Warning: I have not tested the whole solution, so your mileage may vary.

Thanks to @code-mocker for the QUOTE_NONE snippet.

Remove double quotes from the beginning and end of rows when splitting a CSV file
