Duke CSVReader ArrayIndexOutOfBoundsException

Time: 2016-01-07 23:39:55

Tags: java

I am using Duke for record linkage, and in a basic test I get this exception from CSVReader: java.lang.ArrayIndexOutOfBoundsException: 1000.

This is my Java class:

    Configuration config = ConfigLoader.load("resources/dukeConfiguration.xml");
    Processor proc = new Processor(config);
    proc.addMatchListener(new PrintMatchListener(true, true, true, false,
                                                 config.getProperties(),
                                                 true));
    proc.link();
    proc.close();

And this is the configuration file:

<duke>

<schema>
    <threshold>0.7</threshold>

    <property type="id">
        <name>ID</name>
    </property>

    <property>
        <name>TITLE</name>
        <comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
        <low>0.09</low>
        <high>0.93</high>
    </property>
    <property>
        <name>ARTIST</name>
        <comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
        <low>0.04</low>
        <high>0.73</high>
    </property>
</schema>

<group>
    <jdbc>
        <param name="driver-class" value="com.mysql.jdbc.Driver" />
        <param name="connection-string" value="jdbc:mysql://localhost:3306/digitalmusic" />
        <param name="user-name" value="root" />
        <param name="password" value="root" />
        <param name="query" value="select * from inventory" />

        <column name="idsong" property="ID" />
        <column name="title" property="TITLE" />
        <column name="artist" property="ARTIST" />
    </jdbc>
</group>

<group>
    <csv>
        <param name="input-file" value="/home/mongo.csv" />
        <param name="header-line" value="false" />

        <column name="1" property="ID" />
        <column name="2" property="TITLE" />
        <column name="3" property="ARTIST" />
    </csv>
</group>

</duke>

Does anyone know where the problem is?

Stack trace:

Records: 0

Records: 40000

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1000
    at no.priv.garshol.duke.utils.CSVReader.next(CSVReader.java:70)
    at no.priv.garshol.duke.datasources.CSVDataSource$CSVRecordIterator.findNextRecord(CSVDataSource.java:170)
    at no.priv.garshol.duke.datasources.CSVDataSource$CSVRecordIterator.next(CSVDataSource.java:198)
    at no.priv.garshol.duke.datasources.CSVDataSource$CSVRecordIterator.next(CSVDataSource.java:111)
    at no.priv.garshol.duke.Processor.linkRecords(Processor.java:362)
    at no.priv.garshol.duke.Processor.link(Processor.java:319)
    at no.priv.garshol.duke.Processor.link(Processor.java:298)
    at no.priv.garshol.duke.Processor.link(Processor.java:285)
    at duke.DukeCollecting.main(DukeCollecting.java:20)

1 answer:

Answer 0 (score: 1)

OK, here is your problem.

According to the latest source posted on GitHub, this is what happens when you instantiate a new CSVReader:

public CSVReader(Reader in, int buflen, String file) throws IOException {
    this.buf = new char[buflen];
    this.pos = 0;
    this.len = in.read(buf, 0, buf.length);
    this.tmp = new String[1000];
    this.in = in;
    this.separator = ','; // default
    this.file = file;

}

According to your stack trace, the error occurs in this block:

if (escaped_quote)
    tmp[colno++] = unescape(new String(buf, prev + 1, pos - prev - 1));
else
    tmp[colno++] = new String(buf, prev + 1, pos - prev - 1);

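The problem is that colno in CSVReader grows past the capacity of the tmp array, which was allocated with 1000 elements in the constructor. Index 1000 is the first invalid index of a 1000-element array, which is why the exception reads java.lang.ArrayIndexOutOfBoundsException: 1000. In other words, the reader believes a single row has more than 1000 columns.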
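To make the failure mode concrete, here is a minimal sketch of the same pattern: fields are written into a fixed 1000-element array through a counter that is never bounds-checked. This is not Duke's code; the class FixedBufferOverflowDemo and its helpers countColumns and rowWithColumns are hypothetical names made up for illustration, and the splitting logic is deliberately naive (no quote handling).

```java
class FixedBufferOverflowDemo {
    // Same fixed capacity as the tmp array in the constructor above.
    static final int CAPACITY = 1000;

    // Hypothetical helper: stores comma-separated fields in a fixed-size
    // array with no bounds check and returns the number of fields stored.
    static int countColumns(String row) {
        String[] tmp = new String[CAPACITY];
        int colno = 0;
        int prev = -1; // index of the previous separator
        for (int pos = 0; pos <= row.length(); pos++) {
            if (pos == row.length() || row.charAt(pos) == ',') {
                // Throws ArrayIndexOutOfBoundsException once colno reaches CAPACITY.
                tmp[colno++] = row.substring(prev + 1, pos);
                prev = pos;
            }
        }
        return colno;
    }

    // Hypothetical helper: builds a row with n single-character fields (n >= 1).
    static String rowWithColumns(int n) {
        StringBuilder sb = new StringBuilder("x");
        for (int i = 1; i < n; i++) {
            sb.append(",x");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A normal three-column row fits easily.
        System.out.println(countColumns("1,Some Title,Some Artist"));
        // Exactly CAPACITY fields still fits.
        System.out.println(countColumns(rowWithColumns(CAPACITY)));
        // One more field writes to tmp[1000] and fails.
        try {
            countColumns(rowWithColumns(CAPACITY + 1));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Overflow: " + e);
        }
    }
}
```

The exact wording of the exception message depends on the JVM version, but the failing index is 1000 in every case, matching the stack trace in the question.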

These are your options, IMHO:

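  • Option 1: Get the source code (fork the project), increase the tmp buffer until the program runs properly, and recompile; or

  • Option 2: Check the GitHub project page for an open issue about this problem (or just open one yourself), and determine whether your file contains any malformed data that could cause the array overflow.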

I recommend option 2, unless you are in a hurry.
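If you want to inspect the file yourself before opening an issue, a rough sketch like the following can flag suspicious lines. It is only a heuristic and makes several assumptions: the file is UTF-8, the separator is a comma, fields are quoted with double quotes, and each record should have 3 columns (as in your configuration). The class CsvSanityCheck and its methods are hypothetical names, not part of Duke. A legitimate quoted field that spans several lines will be reported as unbalanced, so treat the output as a list of places to look, not as proof of an error.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CsvSanityCheck {
    // Counts fields on one line, ignoring commas inside double quotes.
    static int fieldCount(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (char ch : line.toCharArray()) {
            if (ch == '"') {
                inQuotes = !inQuotes; // an escaped "" toggles twice, so it cancels out
            } else if (ch == ',' && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // True when the line contains an odd number of double quotes.
    static boolean hasUnbalancedQuotes(String line) {
        int quotes = 0;
        for (char ch : line.toCharArray()) {
            if (ch == '"') {
                quotes++;
            }
        }
        return quotes % 2 != 0;
    }

    // Returns the 1-based numbers of lines with unbalanced quotes
    // or with more fields than expectedColumns.
    static List<Integer> suspiciousLines(List<String> lines, int expectedColumns) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (hasUnbalancedQuotes(line) || fieldCount(line) > expectedColumns) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the input-file path from the configuration in the question.
        String path = args.length > 0 ? args[0] : "/home/mongo.csv";
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        System.out.println("Suspicious lines: " + suspiciousLines(lines, 3));
    }
}
```

Any line numbers it prints are good candidates to include in the GitHub issue, together with the stack trace.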

Good luck!