Question

How can I parse my CSV file without parsing the first line?

This class works, but I don't want to parse my CSV header.

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

      def parse = new File(filepath)

      // split and populate GeneInfo
      parse.splitEachLine(',') {fields ->

        CSVList.add(
                Module : fields[0],
                Function : fields[1],
                Systematic_Name : fields[2],
                Common_Name : fields[3],
              )

         return CSVList
      }

    }
}

I changed my class, so now I have:

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

    def parse = new File(filepath).readLines()[1..-1]

    parse.each {line ->

      // split and populate GeneInfo
      line.splitEachLine(',') {fields ->

        CSVList.add(
                Module : fields[0],
                Function : fields[1],
                Systematic_Name : fields[2],
                Common_Name : fields[3],
              )

         return CSVList
      }
     }
    }
}

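This works fine until it reaches this part of my CSV:
"Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA."

When my parser gets to this part, it splits it into 3 pieces (it should be 1):
- Homo sapiens interleukin 4 receptor (IL4R)
- transcript variant 1
- mRNA.

How can I solve this? Thanks for your help.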

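- New comment - Here is a copy of one of my CSV lines (line 2):
"M6.6",NA,"ILMN_1652185",NA,NA,"IL4RA; CD124",NA,"NM_000418.2","16","16p12.1a","Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA.",3566,...

As you can see, my problem is with the field "Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA."; I don't want to split the text between the opening and closing quotes. My parser should split only on the commas outside quotes, not on the commas between quotes. For example, given "part1","part2","part3", I want to get only part1, part2 and part3, and if part2 contains commas, I don't want to split on those commas.

In short, I just want to ignore commas inside quoted elements.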

Answer 1

You can read every line of the file except the first into a List:

List<String> allLinesExceptHeader = new File(filepath).readLines()[1..-1]

You can then parse each line of the file (each element of allLinesExceptHeader) with code similar to what you showed above:

allLinesExceptHeader.each {line ->    
    // Code to parse each line goes here
}
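As a small illustration of the header-skipping step, here is a sketch that honours the `header` flag from the question instead of always dropping the first line. The helper name `dataLines` and the sample rows are made up for the example. It uses `drop(1)`, which simply returns an empty list when the file has no data lines.

```groovy
// Hypothetical helper (not part of the original class): returns the lines
// of a CSV file, skipping the first one only when the file has a header.
List<String> dataLines(File csv, boolean header) {
    List<String> lines = csv.readLines()
    // drop(1) yields an empty list for an empty or header-only file
    header ? lines.drop(1) : lines
}

// Made-up sample data written to a temporary file for the demonstration.
File tmp = File.createTempFile('modules', '.csv')
tmp.deleteOnExit()
tmp.text = 'Module,Function\nM1.1,Platelets\nM1.2,Interferon\n'

assert dataLines(tmp, true) == ['M1.1,Platelets', 'M1.2,Interferon']
assert dataLines(tmp, false).size() == 3
```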

Answer 2

OK, I have my fix!

Here is the code:

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

        // Read every line of the file except the header line
        def parse = new File(filepath).readLines()[1..-1]

        // Matches only commas followed by an even number of double quotes,
        // i.e. commas that are outside quoted fields
        def token = ',(?=([^\"]*\"[^\"]*\")*[^\"]*$)'

        parse.each { line ->

            // split and populate GeneInfo
            line.splitEachLine(token) { fields ->

                CSVList.add(
                    Module : fields[0],
                    Function : fields[1],
                    Systematic_Name : fields[2],
                    Common_Name : fields[3],
                )

                return CSVList
            }
        }
    }
}

For more details, see this post: Java: splitting a comma-separated string but ignoring commas in quotes
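To see what the lookahead pattern does in isolation, here is a minimal sketch that applies it to a shortened version of the CSV row from the question. The quote-stripping step is an addition of mine, not part of the fix above: the regex only decides where to split, so quoted fields keep their surrounding quotes unless they are removed afterwards.

```groovy
// Split on commas that are followed by an even number of double quotes,
// i.e. commas that sit outside any quoted field.
def token = ',(?=([^"]*"[^"]*")*[^"]*$)'

// Shortened version of the row shown in the question.
def line = '"M6.6",NA,"Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA.",3566'

// A plain split on ',' cuts the quoted description into three pieces.
assert line.split(',').size() == 6

// The limit -1 keeps trailing empty fields; afterwards strip one leading
// and one trailing quote from each field (assumed post-processing).
def fields = line.split(token, -1).collect { it.replaceAll('^"|"$', '') }

assert fields.size() == 4
assert fields[0] == 'M6.6'
assert fields[2] == 'Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA.'
```

This approach assumes every record fits on one line and that quotes are balanced within it. For files with quoted line breaks or other edge cases, a dedicated CSV library is likely to be more robust.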

Groovy CSV parser and export to database
