import org.apache.spark.SparkContext
import scala.util.matching.Regex

val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Captures the href target of every link whose text is "Rubik"
val regex : Regex = """<a href=\\"([^\\\r\n]*)\\">Rubik<\/a>""".r
val files = "book1"
val file = sc.textFile(files)
val rubikLines = file.filter(line => line.contains(">Rubik</a>"))
// Mutable map created on the driver
val matchedMap = scala.collection.mutable.Map[String,Int]()
for (line <- rubikLines)
{
  for (patternMatch <- regex.findAllMatchIn(line))
  {
    val ent = patternMatch.group(1)
    if (matchedMap.contains(ent))
    {
      matchedMap(ent) = (matchedMap(ent)+1)
      println(s"found ${matchedMap(ent)}")
    }
    else
    {
      matchedMap(ent) = 1
      println(s"not found ${matchedMap(ent)}")
    }
  }
  println(s"map size ${matchedMap.size}") //print1
}
println(s"map size ${matchedMap.size}") //print2
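The print statements produce [not found, found, found, not found], which is correct for the two elements [Rubik Cube (occurs 3 times), Ernő Rubik (occurs once)]. print1 outputs 2, whereas print2 outputs 0. Why is matchedMap empty once the for loop has finished?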
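A likely explanation is that the `for` loop over an RDD desugars to `rubikLines.foreach(...)`, and Spark serializes that closure and runs it inside tasks. Each task then updates its own deserialized copy of `matchedMap`, so the map held by the driver is never modified. The following is only a minimal sketch of one way to get the counts back on the driver, using RDD transformations and `collectAsMap()` instead of a shared mutable map. The local master, the app name, and the sample lines are assumptions made for illustration; the regex is the one from the code above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.matching.Regex

// Assumption: local master and a hypothetical app name, for illustration only.
val conf = new SparkConf().setAppName("rubik-link-count").setMaster("local[2]")
val sc = new SparkContext(conf)

// Same pattern as in the question.
val regex: Regex = """<a href=\\"([^\\\r\n]*)\\">Rubik<\/a>""".r

// Hypothetical sample lines standing in for the contents of "book1".
val lines = sc.parallelize(Seq(
  """see <a href=\"Rubik Cube\">Rubik</a> and <a href=\"Rubik Cube\">Rubik</a>""",
  """also <a href=\"Rubik Cube\">Rubik</a>""",
  """invented by <a href=\"Ernő Rubik\">Rubik</a>"""
))

// Count inside the cluster, then collect the (small) result to the driver.
val counts: scala.collection.Map[String, Int] = lines
  .filter(line => line.contains(">Rubik</a>"))
  .flatMap(line => regex.findAllMatchIn(line).map(_.group(1))) // one entry per match
  .map(ent => (ent, 1))
  .reduceByKey(_ + _)   // sum the occurrences of each href target
  .collectAsMap()       // bring the final counts back to the driver

println(s"map size ${counts.size}")
sc.stop()
```

Because `counts` is built from the value returned by `collectAsMap()`, it lives on the driver and keeps its contents after the job finishes. This approach assumes the number of distinct link targets is small enough to fit in driver memory.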