I am trying to parse an IMDB TSV file:
$hutter Battle of the Sexes (2017) (as $hutter Boy) [Bobby Riggs Fan] <10>
NVTION: The Star Nation Rapumentary (2016) (as $hutter Boy) [Himself] <1>
Secret in Their Eyes (2015) (uncredited) [2002 Dodger Fan]
Steve Jobs (2015) (uncredited) [1988 Opera House Patron]
Straight Outta Compton (2015) (uncredited) [Club Patron/Dopeman]
$lim, Bee Moe Fatherhood 101 (2013) (as Brandon Moore) [Himself - President, Passages]
For Thy Love 2 (2009) [Thug 1]
Night of the Jackals (2009) (V) [Trooth]
"Idle Talk" (2013) (as Brandon Moore) [Himself]
"Idle Times" (2012) {(#1.1)} (as Brandon Moore) [Detective Ryan Turner]
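As you can see, some lines start with a tab and some do not. I want a map with the actor's name as the key and a list of movies as the value. The actor's name is followed by one or more tabs before the movie is listed.
My code: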
while ((line = reader.readLine()) != null) {
    Matcher matcher = headerPattern.matcher(line);
    boolean headerMatchFound = matcher.matches();
    if (headerMatchFound) {
        Logger.getLogger(ActorListParser.class.getName()).log(Level.INFO, "Header for actor list found");
        String newline;
        reader.readLine();
        while ((newline = reader.readLine()) != null) {
            String[] fullLine = null;
            String actor;
            String title;
            Pattern startsWithTab = Pattern.compile("^\t.*");
            Matcher tab = startsWithTab.matcher(newline);
            boolean tabStartMatcher = tab.matches();
            if (!tabStartMatcher) {
                fullLine = newline.split("\t.*");
                System.out.println("Actor: " + fullLine[0] +
                        "Movie: " + fullLine[1]);
            }//this line will have code to match lines that start with tabs.
        }
    }
}
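The way I have done this only works for a few lines before I get an ArrayIndexOutOfBoundsException. How can I parse these lines, which contain one or more tabs, and split each of them into at most 2 strings?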
Answer 0 (score: 1)
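Parsing tab- or comma-delimited data files involves some subtleties around quoting and escaping.
To save yourself a lot of work, frustration and headaches, you should consider using one of the existing CSV parsing libraries, such as OpenCSV or Apache Commons CSV.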
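Posted as an answer rather than a comment because the OP has not given a reason for reinventing the wheel, and some tasks really have been "solved once and for all".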
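If a library is not an option, here is a minimal sketch of the split the question asks about, using only `String.split`. In the question's code the regex `"\t.*"` matches the first tab and everything after it, so the split leaves a single element and `fullLine[1]` is out of bounds. Splitting on `"\t+"` with a limit of 2 gives at most two parts instead. The class name `ActorLineSplitter` and the method `parse` are hypothetical. The sketch also assumes that continuation lines start with tabs and belong to the most recently seen actor, and that blank lines can be skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original parser.
class ActorLineSplitter {

    // Groups titles under the most recently seen actor name.
    // Assumed layout: "name<TAB+>title" for the first credit of an actor,
    // "<TAB+>title" for further credits, blank lines between actors.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> moviesByActor = new LinkedHashMap<>();
        String currentActor = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // nothing to split on a blank line
            }
            // "\t+" treats a run of tabs as one separator; the limit of 2
            // yields at most two parts: [actor or "", title].
            String[] parts = line.split("\t+", 2);
            if (parts.length < 2) {
                continue; // no tab on this line: skip rather than read parts[1]
            }
            if (!parts[0].isEmpty()) {
                currentActor = parts[0]; // the line started with a name
            }
            if (currentActor != null) {
                moviesByActor
                        .computeIfAbsent(currentActor, k -> new ArrayList<>())
                        .add(parts[1]);
            }
        }
        return moviesByActor;
    }
}
```

Calling `ActorLineSplitter.parse(...)` with the lines read after the header should give a map from each actor to that actor's titles in file order. Quoting and escaping are not handled here, which is the part where a CSV library earns its keep.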