Parsing commas inside text fields when transferring a CSV file's contents to MySQL

Time: 2013-11-04 22:52:28

Tags: java mysql csv jdbc

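I want to transfer the contents of a CSV file to MySQL. In my CSV file, the text in some columns contains commas.

I am using the code below to transfer the contents: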

`

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.Date;

import org.apache.commons.lang.StringUtils;

import au.com.bytecode.opencsv.CSVReader;




public class CSVLoader {


    static int  count;
    private static final 
        String SQL_INSERT = "INSERT INTO ${table}(${keys}) VALUES(${values})";
    private static final String TABLE_REGEX = "\\$\\{table\\}";
    private static final String KEYS_REGEX = "\\$\\{keys\\}";
    private static final String VALUES_REGEX = "\\$\\{values\\}";

    private Connection connection;
    private char seprator;

    /**
     * Public constructor to build CSVLoader object with
     * Connection details. The connection is closed on success
     * or failure.
     * @param connection
     */
    public CSVLoader(Connection connection) {
        this.connection = connection;
        //Set default separator
        this.seprator = ',';
    }

    /**
     * Parse CSV file using OpenCSV library and load in 
     * given database table. 
     * @param csvFile Input CSV file
     * @param tableName Database table name to import data
     * @param truncateBeforeLoad Truncate the table before inserting 
     *          new records.
     * @throws Exception
     */
    public void loadCSV(String csvFile, String tableName,
            boolean truncateBeforeLoad) throws Exception {

        CSVReader csvReader = null;
        if(null == this.connection) {
            throw new Exception("Not a valid connection.");
        }
        try {

            csvReader = new CSVReader(new FileReader(csvFile), this.seprator);

        } catch (Exception e) {
            e.printStackTrace();
            throw new Exception("Error occurred while executing file. "
                    + e.getMessage());
        }

        //String[] headerRow = csvReader.readNext();
        String[] headerRow = csvReader.readNext();
        count++;
        if (null == headerRow) {
            throw new FileNotFoundException(
                    "No columns defined in given CSV file. " +
                    "Please check the CSV file format.");
        }

        String questionmarks = StringUtils.repeat("?,", headerRow.length);
        System.out.println(headerRow.length);
        questionmarks = (String) questionmarks.subSequence(0, questionmarks
                .length() - 1);

        String query = SQL_INSERT.replaceFirst(TABLE_REGEX, tableName);
        query = query
                .replaceFirst(KEYS_REGEX, StringUtils.join(headerRow, ","));
        query = query.replaceFirst(VALUES_REGEX, questionmarks);

        System.out.println("Query: " + query);

        String[] nextLine;
        Connection con = null;
        PreparedStatement ps = null;
        try {
            con = this.connection;
            con.setAutoCommit(false);
            ps = con.prepareStatement(query);

            if(truncateBeforeLoad) {
                //delete data from table before loading csv
                con.createStatement().execute("DELETE FROM " + tableName);
            }

            final int batchSize = 1000;
            int count = 0;
            Date date = null;
            while ((nextLine = csvReader.readNext()) != null) {

                if (null != nextLine) {
                    int index = 1;
                    for (String string : nextLine) {
                        date = DateUtil.convertToDate(string);
                        if (null != date) {
                            ps.setDate(index++, new java.sql.Date(date
                                    .getTime()));
                        } else {
                            ps.setString(index++, string);
                        }
                    }
                    System.out.println(count);
                    ps.addBatch();
                    System.out.println(count);
                }
                if (++count % batchSize == 0) {
                    System.out.println(count);
                    ps.executeBatch();
                }
            }
            ps.executeBatch(); // insert remaining records
            con.commit();
        } catch (Exception e) {
            con.rollback();
            e.printStackTrace();
            throw new Exception(
                    "Error occurred while loading data from file to database. "
                            + e.getMessage());
        } finally {
            if (null != ps)
                ps.close();
            if (null != con)
                con.close();

            csvReader.close();
        }
    }

    public char getSeprator() {
        return seprator;
    }

    public void setSeprator(char seprator) {
        this.seprator = seprator;
    }

}

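`

When I run it, I get the error "No value specified for parameter 23". My database table has 22 columns, and the CSV file also has 22 columns. So my guess is that some text in the first row itself contains a comma that is not parsed correctly, so the loader assumes 23 columns instead of 22. Can anyone help me understand the problem and suggest a solution?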

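2 answers:

Answer 0: (score: 0)

I think the immediate problem is that you do not escape the column names when you insert them into the SQL statement. What you are building is a statement of this form: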

INSERT INTO sometable(key1,key2,key3) VALUES(?,?,?)

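Now, if there is a comma in the header row (say one key is "ke,y3"), then even if your CSV library reads it correctly, you end up building something like this: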

INSERT INTO sometable(key1,key2,ke,y3) VALUES(?,?,?)

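Now the number of values and the number of columns no longer match. Note that this can also happen with some other characters: perhaps one of your keys contains a question mark that is interpreted as a parameter placeholder?

Solution: to save yourself some headaches, avoid these characters in keys if at all possible. I am not sure how well MySQL handles them, but if it does, you need to at least escape the column names before inserting them. I am not sure how to do this properly and safely (to prevent SQL injection), but since this is apparently a one-off tool, wrapping the column names in backticks like this should be enough: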

INSERT INTO sometable(`key1`,`key2`,`ke,y3`) VALUES(?,?,?)
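The backtick wrapping described above could be sketched in Java roughly as follows. This is only a sketch, not the asker's code: the class `InsertSqlBuilder` and its methods `quoteIdentifier` and `buildInsert` are hypothetical names introduced for illustration. It assumes MySQL's default identifier quoting, where a backtick inside a quoted identifier is escaped by doubling it, and it does not check the names against a list of allowed columns, so it is only suitable for trusted input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the original CSVLoader.
public class InsertSqlBuilder {

    // Wrap an identifier in backticks, doubling any embedded backtick
    // (assumed MySQL escaping rule for quoted identifiers).
    public static String quoteIdentifier(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    // Build an INSERT statement with one "?" placeholder per column.
    public static String buildInsert(String table, String[] columns) {
        List<String> quoted = new ArrayList<>();
        for (String column : columns) {
            quoted.add(quoteIdentifier(column));
        }
        String placeholders =
                String.join(",", Collections.nCopies(columns.length, "?"));
        return "INSERT INTO " + quoteIdentifier(table)
                + "(" + String.join(",", quoted) + ")"
                + " VALUES(" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Header row containing a comma inside one column name.
        String[] header = {"key1", "key2", "ke,y3"};
        System.out.println(buildInsert("sometable", header));
    }
}
```

Because each column name is quoted before joining, the number of columns in the statement stays equal to the number of placeholders even when a name contains a comma.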

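Answer 1: (score: -1)

There are two kinds of commas in a CSV file. One kind separates fields; the other is part of the text and always appears between quotation marks. You need to parse commas outside quotes differently from commas inside quotes. Your code does not appear to do that. Perhaps something like this: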

repeat
  c <- read next character
  if (c == '"')
    parse quoted field      // May include commas.
  else
    parse non-quoted field  // Will not include commas.
  endif
until the whole file has been read.

Using separate routines to parse quoted and non-quoted fields makes it easy to handle both kinds of commas correctly.
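As a rough Java rendering of the pseudocode above, the sketch below splits a single line into fields while treating commas inside double quotes as text. The class `SimpleCsvLineParser` and its method `parseLine` are hypothetical names used only for illustration. It assumes the common CSV convention that a doubled quote (`""`) inside a quoted field stands for one literal quote, and it does not handle quoted fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the pseudocode; not a full CSV parser.
public class SimpleCsvLineParser {

    // Split one CSV line into fields. Commas inside double quotes are kept
    // as text; a doubled quote inside a quoted field is one literal quote.
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside the field
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // any character, including commas
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // separator ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        // Four fields; the second contains a comma, the third escaped quotes.
        String line = "1,\"Smith, John\",\"say \"\"hi\"\"\",x";
        for (String field : parseLine(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

In practice a CSV library such as OpenCSV, which the question already uses, is meant to handle quoted fields for you; the sketch only illustrates the idea of tracking whether the parser is currently inside quotes.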