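How do you parse a difficult .txt file?

Asked: 2014-02-25 23:28:32

Tags: java mysql sql jdbc

I am very new to Java and have been trying to read a very difficult .txt file and load it into my MySQL database.

To me, the file has some very strange delimiting rules. The delimiter always seems to be a comma, but the rest makes no sense. Here are a few examples: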

" "," "," "," "," "

" ",,,,,,," "

" ",0.00," "

" ",," ",," ",," "

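What I do know is that all fields containing letters use the normal "text", format.

All columns that contain only numbers use this format: ,0.00, except the first column, which follows the normal format "123456789",

Anything with no data then alternates between ,, and " ",

I have been able to get the program to read the file correctly using java.sql.Statement, but I need it to work with java.sql.PreparedStatement.

I can only get it to work when I select a few columns, but I need this to work with 100+ columns, and some fields contain commas, e.g. "Some Company, LLC".

Here is my current code, but I don't know what to do next.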

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.*;


public class AccountTest {

    public static void main(String[] args) throws Exception {

        // Declare DB settings
        String dbName = "jdbc:mysql://localhost:3306/local";
        String userName = "root";
        String password = "";
        String fileName = "file.txt";
        String psQuery = "insert into accounttest"
                       + "(account,account_name,address_1,address_2,address_3) values"
                       + "(?,?,?,?,?)";
        Connection connect = null;
        PreparedStatement statement = null;
        String account = null;
        String accountName = null;
        String address1 = null;
        String address2 = null;
        String address3 = null;

        // Load JDBC driver
        try {
            Class.forName("com.mysql.jdbc.Driver");
        }
        catch (ClassNotFoundException e) {
            System.out.println("JDBC driver not found.");
            e.printStackTrace();
            return;
        }

        // Attempt connection
        try {
            connect = DriverManager.getConnection(dbName, userName, password);
        }
        catch (SQLException e) {
            System.out.println("E1: Connection Failed.");
            e.printStackTrace();
            return;
        }

        // Verify connection
        if (connect != null) {
            System.out.println("Connection successful.");
        }
        else {
            System.out.println("E2: Connection Failed.");
        }

        BufferedReader bReader = new BufferedReader(new FileReader(fileName));
        String line;

        // Import file into MySQL DB
        try {

            // Looping the read block until all lines in the file are read.
            while ((line = bReader.readLine()) != null) {

                // Splitting the content of comma delimited file
                String data[] = line.split("\",\"");

                // Renaming array items for ease of use
                account = data[0];
                accountName = data[1];
                address1 = data[2];
                address2 = data[3];
                address3 = data[4];

                // Removing double quotes so they do not get put into the db
                account = account.replaceAll("\"", "");
                accountName = accountName.replaceAll("\"", "");
                address1 = address1.replaceAll("\"", "");
                address2 = address2.replaceAll("\"", "");
                address3 = address3.replaceAll("\"", "");

                // Putting data into database
                statement = connect.prepareStatement(psQuery);
                statement.setString(1, account);
                statement.setString(2, accountName);
                statement.setString(3, address1);
                statement.setString(4, address2);
                statement.setString(5, address3);
                statement.executeUpdate();
            }
        }
        catch (Exception e) {
            e.printStackTrace();
            statement = null;
        }
        finally {
            bReader.close();
        }
    }
}

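Sorry if the formatting is off. I am still learning, and after several frustrating days of trying to figure this out I didn't bother making it look nice.

My question: is this even possible with a file this messy? If so, how do I go about it? Also, I'm not entirely familiar with prepared statements. Do I have to declare every column, or is there an easier way?

Thanks in advance for your help.

EDIT: To clarify what I need: I need to upload a txt file to a MySQL database, and I need a way to read and split the data (unless there is a better way) based on ",", ,,,, and ,0.00, while still keeping the commas inside fields such as Some Company, LLC. I need to do this with 100+ columns, and the file varies from 3000 to 6000 lines. This needs to be done as a prepared statement. I'm not sure whether this is possible, but I appreciate any input anyone might have on the matter.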

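EDIT2: Thanks to rpc1, I was able to figure out how to sort out the messy file. Instead of String data[] = line.split("\",\""); I used String data[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"); I still had to write out each variable to link it to data[], then write out statement.setString for each column as well as the replaceAll("\"", "") call for each column, but I got it working and could not find another way to use prepared statements. Thanks for your help!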

2 Answers:

Answer 0: (score: 0)

You can use a loop, for example:

    String psQuery = "insert into accounttest"
                   + "(account,account_name,address_1,address_2,address_3,..,address_n) values"
                   + "(?,?,?,?,?,?,..,?)";  // you have to put m=n+2 values

.....

    // you can change the separator
    String data[] = line.replace("\",\"", ";").replace("\"", "").split(";");

    for (int i = 0; i < m; i++) {
        if (i < data.length) // index is still inside the array
            statement.setString(i + 1, data[i]);
        else
            statement.setString(i + 1, ""); // no data for this column: bind an empty string
    }
    statement.executeUpdate();
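
Typing 100+ question marks by hand is easy to get wrong. A minimal sketch, assuming the column names are kept in a list, is to generate the statement text so the placeholder count always matches the column count. `InsertSqlBuilder` and `buildInsert` are made-up names for illustration, not part of JDBC.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of JDBC: builds the INSERT text from a column list.
class InsertSqlBuilder {

    static String buildInsert(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        // One "?" per column, joined with commas.
        String placeholders = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "insert into " + table
                + " (" + String.join(",", columns) + ") values (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList(
                "account", "account_name", "address_1", "address_2", "address_3");
        System.out.println(buildInsert("accounttest", columns));
    }
}
```

The table and column names are concatenated into the SQL text, so they should come from your own code and never from the file being imported. Only the values go through the `?` placeholders.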

P.S. If your CSV file is large, use batch inserts (addBatch()), and use Pattern to split the strings:

Pattern p = Pattern.compile(";");
String[] data = p.split(st);
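
The batch-insert suggestion could look roughly like the sketch below. It assumes the lines have already been split into `String[]` rows. The class name, `padRow`, `insertAll` and the `batchSize` parameter are illustrative choices, not something taken from the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch only: class, method and parameter names are illustrative.
class BatchInsertSketch {

    // Returns exactly columnCount values: missing trailing fields become "",
    // and any extra fields beyond columnCount are ignored.
    static String[] padRow(String[] fields, int columnCount) {
        String[] row = new String[columnCount];
        for (int i = 0; i < columnCount; i++) {
            row[i] = i < fields.length ? fields[i] : "";
        }
        return row;
    }

    // Queues every row with addBatch() and sends them in chunks of batchSize.
    static void insertAll(Connection connect, String sql, List<String[]> rows,
                          int columnCount, int batchSize) throws SQLException {
        try (PreparedStatement statement = connect.prepareStatement(sql)) {
            int pending = 0;
            for (String[] fields : rows) {
                String[] row = padRow(fields, columnCount);
                for (int i = 0; i < columnCount; i++) {
                    statement.setString(i + 1, row[i]); // JDBC parameters are 1-based
                }
                statement.addBatch();
                if (++pending == batchSize) {
                    statement.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                statement.executeBatch(); // send the final partial batch
            }
        }
    }
}
```

The statement is prepared once and closed by the try-with-resources block, so nothing is re-prepared per line.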

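Edit: try this split function

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compiled once as static fields. Note: empty fields (,,) produce no match and are skipped.
private static Pattern pSplit = Pattern.compile("[^,\"']+|\"([^\"]*)\"|'([^']*)'");
private static Pattern pReplace = Pattern.compile("\"");

public static String[] split(String st)
{
    List<String> list = new ArrayList<String>();
    Matcher m = pSplit.matcher(st);
    while (m.find()) {
        // Each match is one field; strip the double quotes around quoted fields.
        list.add(pReplace.matcher(m.group(0)).replaceAll(""));
    }
    return list.toArray(new String[0]);
}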
For example,

Input string: st="\"1212\",\"LL C ,DDD \",\"CA, SPRINGFIELD\",232.11,3232.00"; it is split into an array of 5 items:

1212
LL C ,DDD
CA, SPRINGFIELD
232.11
3232.00

EDIT2

This example solves all your problems (even empty values):


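import java.util.regex.Pattern;

// Split on commas that are followed by an even number of double quotes,
// i.e. commas that are not inside a quoted field.
private static Pattern pSplit = Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

public static String[] split2(String st)
{
    return pSplit.split(st);
}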
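
One caveat with `split2`: `Pattern.split(input)` drops trailing empty strings, so a line that ends in `,,,` yields fewer items than the table has columns. Passing a negative limit keeps them. The sketch below uses the same pattern, and the class and method names are invented for the demonstration.

```java
import java.util.regex.Pattern;

class KeepTrailingEmpties {

    // Same lookahead pattern as split2: a comma followed by an even number of quotes.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // A negative limit tells split() to keep trailing empty strings.
    static String[] splitKeepingEmpties(String line) {
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "\"A\",0.00,,,";
        // Default split: the three trailing empty fields are dropped.
        System.out.println(COMMA_OUTSIDE_QUOTES.split(line).length);
        // Negative limit: all five fields are kept.
        System.out.println(splitKeepingEmpties(line).length);
    }
}
```

With every field present, `data.length` matches the number of `?` placeholders, so each parameter gets bound on every line.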

Answer 1: (score: 0)

I was able to solve both of the problems I was having with this bit of code. Thanks again for your help!

// Prepare the statement once and reuse it for every line.
statement = connect.prepareStatement(psQuery);

for (String line = bReader.readLine(); line != null; line = bReader.readLine()) {

    // Splitting the content of the comma delimited file
    String data[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Binding each field of the line and updating the table.
    statement.clearParameters(); // do not carry values over from the previous line
    for (int i = 0; i < data.length; i++) {
        String temp = data[i].replaceAll("\"", ""); // strip the surrounding quotes
        statement.setString(i + 1, temp);
    }
    statement.executeUpdate();
}
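
The lookahead regex scans the rest of the line at every comma, which is fine for a few thousand lines but is hard to read. As an alternative, here is a sketch of a single-pass scanner that does the splitting and the quote removal in one go. It assumes fields never contain an escaped double quote, and `QuotedLineScanner` and `parse` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a hand-written alternative to the regex; the names are illustrative.
class QuotedLineScanner {

    // Splits on commas outside double quotes and drops the quotes themselves.
    // Assumption: quotes are never escaped inside a field.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;           // entering or leaving a quoted field
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());         // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"123456789\",\"Some Company, LLC\",,0.00,\" \""));
    }
}
```

Because the final field is always added, trailing empty columns are kept, and the result can be bound with the same `setString(i + 1, ...)` loop.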