Regular expression for the CSV values after the commas

Time: 2016-03-17 10:28:07

Tags: java regex csv

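I'm no regex expert, and I'm looking for a regular expression that returns the values following the successive commas on the same line.

The data looks like this: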

2016/02/19 04:25:56,User,0.9,0,0,0,0,0.0
2016/02/19 04:25:58,User,0.0,0,0,0,0,0.0
2016/02/19 04:25:58,User,0.0,0,0,0,0,0.0
2016/02/19 04:25:57,User,0.0,0,0,0,0,0.0

I have only managed to split it after the first comma (that is, after the date and time). This is the regex I'm using:

\.*User.*(?:\d*\.)?\d+

2 answers:

Answer 0: (score: 0)

If the number of columns never changes, the following regular expression should do what you need.

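   // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
   // One capturing group per column: timestamp, user name, then six numeric fields.
   Pattern pattern = Pattern.compile("(\\d{4}/\\d{2}/\\d{2} \\d{2}:\\d{2}:\\d{2}),([^,]*),([\\d\\.]*),([\\d\\.]*),([\\d\\.]*),([\\d\\.]*),([\\d\\.]*),([\\d\\.]*)");
   Matcher matcher = pattern.matcher("2016/02/19 04:25:56,User,0.9,0,0,0,0,0.0");

   if(matcher.matches()){
       int groupCount = matcher.groupCount();
       // Group 0 is the whole match, so the columns are groups 1..groupCount inclusive.
       for (int i = 1; i <= groupCount; i++) {
           System.out.println(matcher.group(i));
       }
   }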
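
If the number of columns is not fixed, one possible alternative is to match each field on its own and loop with `Matcher.find()`. The following is only a sketch under assumptions: the class name `FieldFinder` and the method `fields` are made up for illustration, empty fields are skipped, and quoted fields containing commas are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
public class FieldFinder {
    // Matches one run of characters that contains no comma, i.e. one non-empty field.
    private static final Pattern FIELD = Pattern.compile("[^,]+");

    // Collects every non-empty comma-separated field on a single line.
    public static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = fields("2016/02/19 04:25:56,User,0.9,0,0,0,0,0.0");
        // Skip the timestamp and the user name to keep only the numeric values.
        System.out.println(all.subList(2, all.size())); // prints [0.9, 0, 0, 0, 0, 0.0]
    }
}
```

Because each field is matched independently, the same code works for lines with more or fewer columns.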

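Answer 1: (score: 0)

Your question is very vague about how the data is stored, so I'm assuming the data is currently one string, or several. Why not append each data string to one giant string, with each user separated by a semicolon, split that into a list with ArrayList<String> dataList = new ArrayList<String>(Arrays.asList(dataStr.split("\\;"))); and then split the contents of that list into their own sub-lists, splitting on the colon? For example: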

"user1:data1:data2;user2:data1:data2;user3:data1:data2;"
**split by semicolon**
= [user1:data1:data2]
  [user2:data1:data2]
  [user3:data1:data2]
**split again by colon**
= [[user1][data1][data2]]
  [[user2][data1][data2]]
  [[user3][data1][data2]]
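
The two-stage split illustrated above could be sketched roughly as follows. The class name `TwoLevelSplit` and the method `parse` are hypothetical, and the sketch assumes that neither `;` nor `:` ever appears inside a value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
public class TwoLevelSplit {
    // Splits records on ';' and then each record's fields on ':'.
    public static List<List<String>> parse(String data) {
        List<List<String>> records = new ArrayList<>();
        for (String record : data.split(";")) {
            // Ignore empty pieces, e.g. from a leading or doubled semicolon.
            if (record.isEmpty()) {
                continue;
            }
            records.add(new ArrayList<>(Arrays.asList(record.split(":"))));
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "user1:data1:data2;user2:data1:data2;user3:data1:data2;";
        // prints [[user1, data1, data2], [user2, data1, data2], [user3, data1, data2]]
        System.out.println(parse(data));
    }
}
```

`String.split` already drops trailing empty strings, so the final semicolon does not produce an extra record.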

That way you can refer to a user by index, and then to the data it contains by sub-index. In practice, it would look like this:

import java.util.ArrayList;
import java.util.Arrays;

// The class name is arbitrary; it only makes the snippet compilable.
public class Main {

    public static void main(String[] args){

        String dataStr = "2016/02/19 04:25:56,User,0.9,0,0,0,0,0.0\n"
                      +"2016/02/19 04:25:58,User,0.0,0,0,0,0,0.0\n"
                      +"2016/02/19 04:25:58,User,0.0,0,0,0,0,0.0\n"
                      +"2016/02/19 04:25:57,User,0.0,0,0,0,0,0.0\n";

        //ArrayList in which each element is one line of the data
        ArrayList<String> dataList = new ArrayList<String>(Arrays.asList(dataStr.split("\\n")));
        //ArrayList of ArrayLists, each holding the comma-separated fields of one line
        ArrayList<ArrayList<String>> listOfDataLists = new ArrayList<ArrayList<String>>();

        //Split every line on commas and store its fields as a sub-list
        for(int i = 0; i < dataList.size(); i++)
        {
            listOfDataLists.add(new ArrayList<String>(Arrays.asList(dataList.get(i).split(","))));
        }

        //DEBUG PRINTS---------------------------------------------------------------------
        System.out.println("------------DEBUG-------------");
        for(int i = 0; i < listOfDataLists.size(); i++){
            for(int j = 0; j < listOfDataLists.get(i).size(); j++){
                System.out.println("[" + listOfDataLists.get(i).get(j) + "]");
            }
            System.out.println("------------------------------");
        }
    }
}

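Here I've broken the data string across separate lines for readability, and then split on the newline regex rather than the semicolon I suggested earlier, but it's basically the same thing. Processing:

2016/02/19 04:25:56,User,0.9,0,0,0,0,0.0;2016/02/19 04:25:58,User,0.0,0,0,0,0,0.0;2016/02/19 04:25:58,User etc....

would just require splitting on a semicolon instead of ("\n"), or really whatever your preference is. If the data is already stored as a list of users (which it may well be, given that you listed them vertically), then hell, you're already halfway there.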

The output in its current state is:

------------DEBUG-------------
[2016/02/19 04:25:56]
[User]
[0.9]
[0]
[0]
[0]
[0]
[0.0]
------------------------------
[2016/02/19 04:25:58]
[User]
[0.0]
[0]
[0]
[0]
[0]
[0.0]
------------------------------
[2016/02/19 04:25:58]
[User]
[0.0]
[0]
[0]
[0]
[0]
[0.0]
------------------------------
[2016/02/19 04:25:57]
[User]
[0.0]
[0]
[0]
[0]
[0]
[0.0]
------------------------------

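I hope this is the answer you were hoping for, because it keeps each user's data in its own list while still preserving the data itself: you can fetch a user by index in the parent list, and then that user's data by index in the sub-list.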
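
As a rough sketch of that parent-index and child-index lookup (the class name `IndexLookup` and the method `parse` are hypothetical, and it assumes every row has at least three fields with a numeric third field):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
public class IndexLookup {
    // Splits the text into rows on newlines, then each row into fields on commas.
    public static List<List<String>> parse(String data) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : data.split("\n")) {
            rows.add(new ArrayList<>(Arrays.asList(line.split(","))));
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "2016/02/19 04:25:56,User,0.9,0,0,0,0,0.0\n"
                    + "2016/02/19 04:25:58,User,0.0,0,0,0,0,0.0\n";
        List<List<String>> rows = parse(data);
        // Parent index 0 selects the first row; child index 2 selects its third field.
        double value = Double.parseDouble(rows.get(0).get(2));
        System.out.println(rows.get(0).get(0) + " -> " + value); // prints 2016/02/19 04:25:56 -> 0.9
    }
}
```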

Disclaimer: I am no Jedi master programmer, so I can't guarantee it's the best way. However, it is certainly *a* way.