读取.tsv文件时跳过备用行

时间:2014-01-17 07:51:26

标签: java opencsv tsv supercsv

我有一个包含39列的.tsv文件 最后一列只有数据为字符串,其长度超过100,000个字符 现在发生的事情是当我试图读取文件行1有标题然后数据跟随

正在发生的事情是它在阅读第1行后,它转到第3行,然后第5行,第7行 虽然所有行都有相同的数据 按照日志我正在

lineNo=3, rowNo=2, customer=503837-100 , last but one cell length=111275
lineNo=5, rowNo=3, customer=503837-100 , last but one cell length=111275
lineNo=7, rowNo=4, customer=503837-100 , last but one cell length=111275
lineNo=9, rowNo=5, customer=503837-100 , last but one cell length=111275
lineNo=11, rowNo=6, customer=503837-100 , last but one cell length=111275
lineNo=13, rowNo=7, customer=503837-100 , last but one cell length=111275
lineNo=15, rowNo=8, customer=503837-100 , last but one cell length=111275
lineNo=17, rowNo=9, customer=503837-100 , last but one cell length=111275
lineNo=19, rowNo=10, customer=503837-100 , last but one cell length=111275

以下是我的代码:

import java.io.FileReader;
import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class readWithCsvBeanReader {
    public static void main(String[] args) throws Exception{
        readWithCsvBeanReader();
    }


private static void readWithCsvBeanReader() throws Exception {

    ICsvBeanReader beanReader = null;

    try {

        beanReader = new CsvBeanReader(new FileReader("C:\MAP TSV\abc.tsv"), CsvPreference.TAB_PREFERENCE);
        // the header elements are used to map the values to the bean (names must match)
        final String[] header = beanReader.getHeader(true);
        final CellProcessor[] processors = getProcessors();
        TSVReaderBrandDTO tsvReaderBrandDTO = new TSVReaderBrandDTO();

        int i = 0;
        int last = 0;

        while( (tsvReaderBrandDTO = beanReader.read(TSVReaderBrandDTO.class, header, processors)) != null ) {
            if(null == tsvReaderBrandDTO.getPage_cache()){
                last = 0;
            }
            else{
                last = tsvReaderBrandDTO.getPage_cache().length();
            }
            System.out.println(String.format("lineNo=%s, rowNo=%s, customer=%s , last but one cell length=%s", beanReader.getLineNumber(),
                beanReader.getRowNumber(), tsvReaderBrandDTO.getUnique_ID(), last));
            i++;
        }

        System.out.println("Number of rows : "+i);

    }
    finally {
        if( beanReader != null ) {
            beanReader.close();
        }
    }
}

private static CellProcessor[] getProcessors() {

    final CellProcessor[] processors = new CellProcessor[] { 
         new Optional(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new Optional()};

        return processors;
    }
}

请让我知道我哪里出错了

2 个答案:

答案 0 :(得分:0)

我查了http://supercsv.sourceforge.net/examples_reading.html。仔细查看示例CSV文件输出。难道不是你的行包含一个非转义的"(双撇号)字符,所以解析器认为数据记录跨越两条物理线?

如果您不使用双撇号字符作为引号字符,则可以更改CsvPreference - 请参阅http://supercsv.sourceforge.net/apidocs/org/supercsv/prefs/CsvPreference.html - 以便双引号不被视为引号字符:

CsvPreference MY_PREFERENCES = new CsvPreference.Builder(
    SOME_NEVER_USED_CHARACTER, ',', "\r\n").build();

当然,对于制表符分隔的CSV,请使用以下内容:

CsvPreference MY_PREFERENCES = new CsvPreference.Builder(
    SOME_NEVER_USED_CHARACTER, '\t', "\r\n").build();

请参阅CsvPreference javadoc以获取Builder的签名,并相应地修改实际值。

答案 1 :(得分:0)

如果您使用CSV解析器来解析TSV输入,那么您将度过难关。 使用适当的TSV解析器。 uniVocity-parsers附带一个TSV解析器/编写器。您也可以使用带注释的java bean将文件直接解析为类的实例。

示例:

此代码将TSV解析为行。

TsvParserSettings settings = new TsvParserSettings();

// creates a TSV parser
TsvParser parser = new TsvParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(new FileReader(yourFile));

将BeanListProcessor解析用于java bean:

BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);

TsvParserSettings parserSettings = new TsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);

TsvParser parser = new TsvParser(parserSettings);
parser.parse(new FileReader(yourFile));

// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();

这是TestBean类的样子:     class TestBean {

// if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
@NullString(nulls = { "?", "-" })
// if a value resolves to null, it will be converted to the String "0".
@Parsed(defaultNullRead = "0")
private Integer quantity;


@Trim
@LowerCase
@Parsed(index = 4)
private String comments;

// you can also explicitly give the name of a column in the file.
@Parsed(field = "amount")
private BigDecimal amount;

@Trim
@LowerCase
// values "no", "n" and "null" will be converted to false; values "yes" and "y" will be converted to true
@BooleanString(falseStrings = { "no", "n", "null" }, trueStrings = { "yes", "y" })
@Parsed
private Boolean pending;

披露:我是这个图书馆的作者。它是开源和免费的(Apache V2.0许可证)。