Question

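The question "_csv.Error: field larger than field limit (131072)" does not solve my problem.

I have a script that processes CSV files into Excel reports. The script worked fine until some of the CSV files grew large (currently > 12 MB).

The script normally runs on Windows 7 64-bit, because the team uses Windows clients. Python versions range from 3.6 to 3.7.2, all 64-bit. Every version produces the error.

The first error I got was

_csv.Error: field larger than field limit (131072)

which, going by the search function, looked easy to fix. But when I added

csv.field_size_limit(sys.maxsize)

it only made things worse: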

Traceback (most recent call last):
  File "CSV-to-Excel.py", line 123, in <module>
    report = process_csv_report(infile)
  File "CSV-to-Excel.py", line 30, in process_csv_report
    csv.field_size_limit(sys.maxsize)
OverflowError: Python int too large to convert to C long

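According to my research, this bug should have been fixed long ago.

My current workaround is to use Linux, where the code runs fine. However, the team that is supposed to run the script cannot use Linux and is locked to Windows.

The code of the script is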

#!c:\python37\python.exe

import csv
import sys


def process_csv_report(CSV_report_file):
    files = []
    files.append(CSV_report_file+"_low.csv")
    files.append(CSV_report_file+"_med.csv")
    files.append(CSV_report_file+"_high.csv")
    first = True
    try:
        report = []
        for f in files:
            if first == True:
                with open(f, "r", newline='', encoding='utf-8') as csvfile:
                    original = csv.reader(csvfile, delimiter=',', quotechar='"')
                    for row in original:
                        report.append(row)
                first = False
            else:
                with open(f, "r", newline='', encoding='utf-8') as csvfile:
                    original = csv.reader(csvfile, delimiter=',', quotechar='"')
                    # for the second and third file skip the header line
                    next(original, None)
                    for row in original:
                        report.append(row)
    except Exception as e:
        print("File I/O error! File: {}; Error: {}".format(f, str(e)))
        exit(1)
    return report


if __name__ == "__main__":
    report = process_csv_report(infile)

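Simple as the script looks, I am lost as to how to fix this, because the solutions that worked for other people fail here.

Has anyone seen this happen with recent Python versions?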

Answer 1

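You can replace sys.maxsize with the maximum value of a C integer, which is 2147483647.

I know sys.maxsize should take care of this, but I think any value below that ceiling, for example 1,000,000, should solve your problem.

A nicer way would be min(sys.maxsize, 2147483646)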
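A minimal sketch of the min() suggestion, to be run once before any csv.reader is created. The constant C_LONG_MAX_32 and the helper set_max_field_size are names made up for this example, and the sketch assumes the limit only has to fit into a 32-bit signed C long:

```python
import csv
import sys

# Largest value of a 32-bit signed C long. The name is hypothetical,
# chosen for this example only.
C_LONG_MAX_32 = 2**31 - 1  # 2147483647


def set_max_field_size():
    """Raise the csv field size limit as far as a 32-bit C long allows.

    Hypothetical helper: returns the limit that was applied.
    """
    limit = min(sys.maxsize, C_LONG_MAX_32)
    csv.field_size_limit(limit)
    return limit


applied = set_max_field_size()
print("csv field size limit is now", applied)
```

In the script from the question, the call would go at the top of process_csv_report, in place of csv.field_size_limit(sys.maxsize).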

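The _csv library is a compiled extension, so it uses C variables. On Windows a C long is 32 bits wide even in a 64-bit build, so sys.maxsize (2**63 - 1 on 64-bit Python) does not fit into it and raises the OverflowError. On 64-bit Linux a C long is 64 bits wide, which is why the same call works there.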
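To check the size of a C long on the machine at hand, something like the following could be used. It is only a sketch: it assumes that the limit is stored in a C long (as the OverflowError message suggests), and the variable names are made up for illustration.

```python
import csv
import ctypes
import sys

# Width of the platform's C long in bits: 32 on Windows, 64 on 64-bit Linux.
long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Largest value a signed C long of that width can hold.
c_long_max = 2 ** (long_bits - 1) - 1

# True where sys.maxsize can be passed directly, False where it would overflow.
maxsize_fits = sys.maxsize <= c_long_max

print("C long:", long_bits, "bits, max", c_long_max)
print("sys.maxsize fits into a C long:", maxsize_fits)

# Use the largest limit that is safe on this platform.
csv.field_size_limit(min(sys.maxsize, c_long_max))
```

On 64-bit Windows this is expected to report a 32-bit C long, so the limit falls back to 2147483647 instead of failing.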

Python 3.7 64-bit on Windows 7 (64-bit): CSV - field larger than field limit (131072)
