I'm putting together a small script that extracts some data from a web page I saved locally (http://payday.wikia.com/wiki/Achievements_(Payday_2)).
The script:
public static void main(String[] args) throws FileNotFoundException {
    File file = new File("C:\\Users\\Jester\\Desktop\\data scrap payday\\Achievements_(Payday_2).htm");
    int count = 0;
    int words = 0;
    Scanner scanner = new Scanner(file);
    while (scanner.hasNext()) {
        String nextToken = scanner.next();
        if (nextToken.contains("unlock") || nextToken.contains("Unlock")) {
            count++;
        }
        words++;
        System.out.println(nextToken);
    }
    scanner.close();
    System.out.println(count);
    System.out.println(words);
}
However, the while loop ends at this line:
<td style="vertical-align: top; width: 64px"> <a href="http://vignette3.wikia.nocookie.net/payday/images/d/db/From_Russia_With_Love.jpg/revision/latest?cb=20131103145029" class="image image-thumbnail" ><img src="data:image/gif;base64,R0lGODlhAQABAIABAAAAAP///yH5BAEAAAEALAAAAAABAAEAQAICTAEAOw%3D%3D" alt="From Russia With Love" class="lzy lzyPlcHld " data-image-key="From_Russia_With_Love.jpg" data-image-name="From Russia With Love.jpg" data-src="http://vignette3.wikia.nocookie.net/payday/images/d/db/From_Russia_With_Love.jpg/revision/latest?cb=20131103145029" width="64" height="64" onload="if(typeof ImgLzy==='object'){ImgLzy.load(this)}" ><noscript><img src="http://vignette3.wikia.nocookie.net/payday/images/d/db/From_Russia_With_Love.jpg/revision/latest?cb=20131103145029" alt="From Russia With Love" class="" data-image-key="From_Russia_With_Love.jpg" data-image-name="From Russia With Love.jpg" width="64" height="64" ></noscript></a>
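The last token printed is:
href="http://vignette3.wikia.nocookie.net/p
(I don't know why it cuts the token in half; there is no whitespace at that point.)
If I delete that line, there seem to be various other lines throughout the HTML that also trigger the loop's end condition, but I can't work out what the pattern might be.
Any ideas why scanner.hasNext() returns false on these?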
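One way to narrow this down, sketched under assumptions: Scanner does not throw read errors from hasNext() or next(); it stops and records the error, which can be retrieved with scanner.ioException(). The helper below (ScanDiagnostics, Result and scan are hypothetical names, not part of the script above) repeats the same loop with an explicit charset and reports whatever exception was swallowed. The "UTF-8" argument is only a guess at how the page was saved.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical diagnostic helper; not part of the original script.
class ScanDiagnostics {

    /** Token count plus the I/O error the Scanner recorded (null if none). */
    static final class Result {
        final int words;
        final IOException error;

        Result(int words, IOException error) {
            this.words = words;
            this.error = error;
        }
    }

    static Result scan(File file, String charsetName) throws FileNotFoundException {
        int words = 0;
        // Passing the charset explicitly avoids relying on the platform default.
        try (Scanner scanner = new Scanner(file, charsetName)) {
            while (scanner.hasNext()) {
                scanner.next();
                words++;
            }
            // A non-null value here means the loop ended because of a read or
            // decoding error rather than a genuine end of input.
            return new Result(words, scanner.ioException());
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Path taken from the question; "UTF-8" is an assumption about the file's encoding.
        File file = new File("C:\\Users\\Jester\\Desktop\\data scrap payday\\Achievements_(Payday_2).htm");
        Result r = scan(file, "UTF-8");
        System.out.println("tokens read: " + r.words);
        System.out.println("recorded exception: " + r.error);
    }
}
```

If the recorded exception turns out to be something like java.nio.charset.MalformedInputException, the file's bytes probably do not match the charset used to decode them, which would also be consistent with a token being cut off where no whitespace exists. If it prints null, the cause lies elsewhere and this sketch does not explain it.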