Premature EOF when loading a large CSV file with the univocity parser

Time: 2018-11-07 01:49:58

Tags: csv univocity

Caused by: java.io.IOException: Premature EOF
    at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:565)
    at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
    at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3393)
    at org.glassfish.jersey.client.internal.HttpUrlConnector$2.read(HttpUrlConnector.java:228)
    at org.glassfish.jersey.message.internal.EntityInputStream.read(EntityInputStream.java:102)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.read1(BufferedReader.java:210)
    at java.io.BufferedReader.read(BufferedReader.java:286)
    at com.univocity.parsers.common.input.concurrent.CharBucket.fill(CharBucket.java:70)
    at com.univocity.parsers.common.input.concurrent.ConcurrentCharLoader.readBucket(ConcurrentCharLoader.java:71)
    at com.univocity.parsers.common.input.concurrent.ConcurrentCharLoader.run(ConcurrentCharLoader.java:88)
    at java.lang.Thread.run(Thread.java:748)

The parser configuration is as follows:

com.univocity.parsers.common.TextParsingException: java.io.IOException - Premature EOF
Parser Configuration: CsvParserSettings:
        Auto configuration enabled=true
        Autodetect column delimiter=true
        Autodetect quotes=true
        Column reordering enabled=true
        Delimiters for detection=[]
        Empty value=null
        Escape unquoted values=false
        Header extraction enabled=null
        Headers=null
        Ignore leading whitespaces=true
        Ignore leading whitespaces in quotes=false
        Ignore trailing whitespaces=true
        Ignore trailing whitespaces in quotes=false
        Input buffer size=8388608
        Input reading on separate thread=true
        Keep escape sequences=false
        Keep quotes=false
        Length of content displayed on error=-1
        Line separator detection enabled=true
        Maximum number of characters per column=4096
        Maximum number of columns=512
        Normalize escaped line separators=true
        Null value=null
        Number of records to read=all
        Processor=none
        Restricting data in exceptions=false
        RowProcessor error handler=null
        Selected fields=none
        Skip bits as whitespace=true
        Skip empty lines=true
        Unescaped quote handling=null
Format configuration:
        CsvFormat:
                Comment character=#
                Field delimiter=,
                Line separator (normalized)=\n
                Line separator sequence=\n
                Quote character="
                Quote escape character="
                Quote escape escape character=null

Internal state when the error was thrown:

Internal state when error was thrown: line=1171815, column=4, record=1171815, charIndex=134217728, headers=[Counter,FirstName,LastName,IdNumber,StartDate,Salary,SecurityCleared,ManagerFName,ManagerLName,ManagerId,ProfileId,DateEvaluated,FriendFname,FriendLname,Friend], content parsed=201
    at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:369)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:595)

1 answer:

Answer 0 (score: 0)

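Author of the library here.

The server appears to be sending invalid chunked data or closing the connection prematurely. The exception originates in `sun.net.www.http.ChunkedInputStream`, below the parser's reader thread, so this does not look like the parser's fault.

Can you save the file locally first, using something like apache-commons-io `FileUtils.copyURLToFile`?

Also, if you can, avoid handing the parser a `BufferedReader`, since the parser has its own internal buffer.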
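A minimal sketch of the "save it locally first" idea, using only the JDK as a stand-in for `FileUtils.copyURLToFile`. The class name `LocalCopy` and the method `download` are made up for illustration, and the URL is assumed to be reachable with a plain `openStream()` call, which may not match a Jersey client setup that needs headers or authentication.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of univocity or commons-io.
final class LocalCopy {

    // Copies everything the URL returns into target and returns the byte count.
    // If the server drops the connection mid-transfer, the IOException is raised
    // here, before any parsing has started.
    static long download(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        URL source = URI.create(args[0]).toURL();
        // createTempFile already creates the file, hence REPLACE_EXISTING above.
        Path target = Files.createTempFile("download-", ".csv");
        long bytes = download(source, target);
        System.out.println("Saved " + bytes + " bytes to " + target);
    }
}
```

If the download itself fails with the same `Premature EOF`, that points at the server or the connection and not at the parser. If it succeeds, the saved file can be passed to the parser as a `File` (for example through `target.toFile()`), which also avoids wrapping the input in a `BufferedReader`.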