The following code tries to count the number of times "Apple" appears in an HTML file.
object Question extends App {

  def validWords(fileSentancesPart: List[String], wordList: List[String]): List[Option[String]] =
    fileSentancesPart.map(sentancePart => {
      if (isWordContained(wordList, sentancePart)) {
        Some(sentancePart)
      } else {
        None
      }
    })

  def isWordContained(wordList: List[String], sentancePart: String): Boolean = {
    for (word <- wordList) {
      if (sentancePart.contains(word)) {
        return true;
      }
    }
    false
  }

  lazy val lines = scala.io.Source.fromFile("c:\\data\\myfile.txt", "latin1").getLines.toList.map(m => m.toUpperCase.split(" ")).flatten
  val vw = validWords(lines, List("APPLE")).flatten.size
  println("size is " + vw)
}
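According to the Scala code the count is 79, but when I open the file in a text editor, the editor finds 81 occurrences of the word "Apple". The search is case-insensitive. Can anyone spot where the bug is? (I am assuming the error is in my code, not in the text editor!)
I have written a couple of tests, but the code seems to behave as expected for these simple use cases: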
import org.scalatest.FlatSpec
import org.scalatest._

class ConvertTes extends FlatSpec {

  "Valid words" should "be returned" in {
    val fileWords = List("this", "is", "apple", "applehere")
    // Named wordList so the call below resolves to the method Question.validWords
    val wordList = List("apple")
    val l: List[String] = Question.validWords(fileWords, wordList).flatten
    l.foreach(println)
  }

  "Entire line " should "be returned for matched word" in {
    val fileWords = List("this", "is", "this apple is an", "applehere")
    val wordList = List("apple")
    val l: List[String] = Question.validWords(fileWords, wordList).flatten
    l.foreach(println)
  }
}
The HTML file being parsed in the code above (referred to there as "c:\data\myfile.txt"):
https://drive.google.com/file/d/0B1TIppVWd0LSVG9Edl9OYzh4Q1U/view?usp=sharing
Any suggestions for alternatives to the code above are welcome.
I think my problem is the one described in @Jack Leow's comment. This code:
val fileWords = List("this", "is", "this appleisapple an", "applehere")
val wordList = List("apple") // named wordList so it does not shadow the validWords method
val l: List[String] = validWords(fileWords, wordList).flatten
println("size : " + l.size)
prints a size of 2, but it should be 3.
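The size is 2 because validWords yields at most one entry per list element, however many times the word occurs inside that element. A minimal sketch of counting occurrences instead, assuming non-overlapping matches are what is wanted; OccurrenceDemo and countOccurrences are hypothetical names, not part of the original code:

```scala
object OccurrenceDemo {
  // Hypothetical helper: counts non-overlapping occurrences of `word` in `text`.
  def countOccurrences(text: String, word: String): Int = {
    @annotation.tailrec
    def loop(from: Int, acc: Int): Int = {
      val idx = text.indexOf(word, from)
      // Stop when no further match exists; otherwise resume just past this match
      if (idx < 0) acc else loop(idx + word.length, acc + 1)
    }
    // An empty word would match everywhere, so treat it as zero occurrences
    if (word.isEmpty) 0 else loop(0, 0)
  }

  def main(args: Array[String]): Unit = {
    val fileWords = List("this", "is", "this appleisapple an", "applehere")
    // One hit per element that contains the word (what the question's code does)
    val perElement = fileWords.count(_.contains("apple"))
    // Every occurrence inside every element
    val perOccurrence = fileWords.map(countOccurrences(_, "apple")).sum
    println(s"per element: $perElement, per occurrence: $perOccurrence")
    // prints "per element: 2, per occurrence: 3"
  }
}
```

The same gap would explain 79 versus 81 if two of the space-separated tokens in the file each contain the word twice, though that is a guess without inspecting the file.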
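Answer 0 (score: 0)
I think you should do the following: expand each part into all of its suffixes with tails, then test each suffix with startsWith instead of contains, so that every occurrence is counted rather than every part: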
def validWords(
    fileSentancesPart: List[String],
    wordList: List[String]): List[Option[String]] =
  fileSentancesPart /* add flatMap */ .flatMap(_.tails)
    .map(sentancePart => {
      if (isWordContained(wordList, sentancePart)) {
        Some(sentancePart)
      } else {
        None
      }
    })

def isWordContained(
    wordList: List[String],
    sentancePart: String): Boolean = {
  for (word <- wordList) {
    //if (sentancePart.contains(word)) {
    if (sentancePart.startsWith(word)) { // use startsWith
      return true;
    }
  }
  false
}
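For reference, the same idea as a self-contained sketch. It builds the suffixes with substring rather than tails, an assumption made here to keep the element type a plain String regardless of how tails is typed on a given Scala version. SuffixCount and countBySuffix are hypothetical names.

```scala
object SuffixCount {
  // Hypothetical helper: counts the suffixes, across all parts, that start with a listed word.
  def countBySuffix(parts: List[String], wordList: List[String]): Int =
    parts
      .flatMap(part => part.indices.map(i => part.substring(i))) // every non-empty suffix
      .count(suffix => wordList.exists(w => suffix.startsWith(w)))

  def main(args: Array[String]): Unit = {
    val fileWords = List("this", "is", "this appleisapple an", "applehere")
    println("size : " + countBySuffix(fileWords, List("apple"))) // prints "size : 3"
  }
}
```

Because every start position is tested, overlapping matches are counted as well, and the work grows with the square of the part length, which should be acceptable for short tokens.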
Answer 1 (score: 0)
You can use a regular expression with the Source iterator:
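import scala.io.Source

// Matches "Apple" and "apple" only; other casings such as "APPLE" are not counted
val regex = "([Aa]pple)".r
val count = Source.fromFile("/test.txt").getLines.map(regex.findAllIn(_).length).sum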
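Since the question asks for a case-insensitive search, a pattern with the (?i) inline flag may be closer to what is needed. The sketch below runs on in-memory lines rather than a file so that it works anywhere; RegexCount and countMatches are hypothetical names.

```scala
import scala.util.matching.Regex

object RegexCount {
  // (?i) makes the whole pattern case-insensitive
  val apple: Regex = "(?i)apple".r

  // Hypothetical helper: sums the matches found on each line.
  def countMatches(lines: Iterator[String]): Int =
    lines.map(line => apple.findAllIn(line).length).sum

  def main(args: Array[String]): Unit = {
    val sample = Iterator("An Apple a day", "APPLE pie and appleisapple", "no fruit here")
    println(countMatches(sample)) // prints 4
  }
}
```

To run it against the file from the question, the iterator returned by Source.fromFile("c:\\data\\myfile.txt", "latin1").getLines() could be passed to countMatches in place of the sample.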