Question

I am trying to parse the following data from an IMDB TSV file:

$hutter             Battle of the Sexes (2017)  (as $hutter Boy)  [Bobby Riggs Fan]  <10>
                    NVTION: The Star Nation Rapumentary (2016)  (as $hutter Boy)  [Himself]  <1>
                    Secret in Their Eyes (2015)  (uncredited)  [2002 Dodger Fan]
                    Steve Jobs (2015)  (uncredited)  [1988 Opera House Patron]
                    Straight Outta Compton (2015)  (uncredited)  [Club Patron/Dopeman]



$lim, Bee Moe       Fatherhood 101 (2013)  (as Brandon Moore)  [Himself - President, Passages]
                    For Thy Love 2 (2009)  [Thug 1]
                    Night of the Jackals (2009) (V)  [Trooth]
                    "Idle Talk" (2013)  (as Brandon Moore)  [Himself]
                    "Idle Times" (2012) {(#1.1)}  (as Brandon Moore)  [Detective Ryan Turner]

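As you can see, some lines start with a tab and some do not. I want a map with the actor's name as the key and the list of movies as the value. Between the actor's name and the movie title there are one or more tabs.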

My code:

        while ((line = reader.readLine()) != null) {

            Matcher matcher = headerPattern.matcher(line);
            boolean headerMatchFound = matcher.matches();

            if (headerMatchFound) {
                Logger.getLogger(ActorListParser.class.getName()).log(Level.INFO, "Header for actor list found");

                String newline;

                reader.readLine();

                while ((newline = reader.readLine()) != null) {
                    String[] fullLine = null;

                    String actor;
                    String title;

                    Pattern startsWithTab = Pattern.compile("^\t.*");
                    Matcher tab = startsWithTab.matcher(newline);
                    boolean tabStartMatcher = tab.matches();

                    if (!tabStartMatcher) {

                        fullLine = newline.split("\t.*");

                        System.out.println("Actor: " + fullLine[0] +
                                "Movie: " + fullLine[1]);

                    } // this line will have code to match lines that start with tabs.
                }
            }

        }

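The way I am doing this only works for a few lines before I get an ArrayIndexOutOfBoundsException. How can I parse these lines, which contain one or more tabs, and split each one into at most 2 strings?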

Answer 1

Parsing tab- or comma-delimited data files involves some subtleties around quoting and escaping.

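To save yourself a lot of work, frustration, and headaches, you should consider using one of the existing CSV parsing libraries, such as OpenCSV or Apache Commons CSV.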

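Posting this as an answer rather than a comment because the OP has not stated a reason for reinventing the wheel, and some tasks really have been solved "once and for all."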
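If a library is not an option, the split the question asks about can be sketched in plain Java. This is only a minimal sketch under assumptions taken from the sample data: an actor line starts with the name followed by one or more tabs, continuation lines start with a tab, and blank lines separate actors. The class name `ActorListSketch` and the helper `parse` are hypothetical, not part of any library. The key points are the limit argument in `split("\t+", 2)`, which yields at most two strings, and the length check that avoids the out-of-bounds access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ActorListSketch {

    // Hypothetical helper: groups each title under the most recent actor name.
    static Map<String, List<String>> parse(Iterable<String> lines) {
        Map<String, List<String>> moviesByActor = new LinkedHashMap<>();
        String currentActor = null;

        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank separator line between actors
            }

            String title;
            if (line.startsWith("\t")) {
                // Continuation line: the title belongs to the current actor.
                title = line.trim();
            } else {
                // Actor line: split on the first run of tabs, at most 2 parts.
                String[] parts = line.split("\t+", 2);
                currentActor = parts[0];
                title = parts.length > 1 ? parts[1].trim() : "";
            }

            if (currentActor == null) {
                continue; // continuation line seen before any actor line
            }
            List<String> titles =
                    moviesByActor.computeIfAbsent(currentActor, k -> new ArrayList<>());
            if (!title.isEmpty()) {
                titles.add(title);
            }
        }
        return moviesByActor;
    }

    public static void main(String[] args) {
        // Sample lines modeled on the data in the question (tab counts assumed).
        List<String> sample = Arrays.asList(
                "$hutter\t\tBattle of the Sexes (2017)  (as $hutter Boy)  [Bobby Riggs Fan]  <10>",
                "\t\t\tSteve Jobs (2015)  (uncredited)  [1988 Opera House Patron]",
                "",
                "$lim, Bee Moe\tFatherhood 101 (2013)  (as Brandon Moore)",
                "\t\t\tFor Thy Love 2 (2009)  [Thug 1]");

        parse(sample).forEach((actor, titles) ->
                System.out.println(actor + " -> " + titles));
    }
}
```

In the loop from the question, the lines would come from `reader.readLine()` after the header has been matched, instead of from a list.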

Parsing a tab-delimited file
