I have a file containing the following data:
Samples of data observed from students of bio tech class
abc xyz are the representatives
description line1
descriptive line 2
..
descriptive line3
they have 79 students in the class
Class student list :
1 abc 23 m
2 def 22 m
3 xys 23 m
I want to extract the lines starting at "abc" (line 2) through "they have 79 students in the class" into another RDD.
The output should look like this:
abc xyz are the representatives
description line1
descriptive line 2
..
descriptive line3
they have 79 students in the class
This is the code I have been trying to use:
val st_pattern1= """(abc).*?(?=\bthey have 79 students in the class\b)""".r
val test = sc.textFile("filname.txt")
.map(lines => st_pattern1.findFirstIn(lines))
.foreach(println)
It prints the following:
None
None
None
..
..
None
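The `None` results are what this code should produce: `sc.textFile` creates one RDD element per line, so `findFirstIn` only ever sees a single line, and no single line contains both "abc" and "they have 79 students in the class". On top of that, `.` does not match a newline by default, so even the whole file as one string would need the `(?s)` flag. The pure-Scala sketch below (no Spark) illustrates both points; the object name `WhyNone` and the hard-coded sample lines are assumptions for illustration only.

```scala
// Hypothetical demo object: reproduces the per-line behaviour without Spark.
object WhyNone {
  // Sample data copied from the question; sc.textFile would yield one element per line like this.
  val lines: Seq[String] = Seq(
    "Samples of data observed from students of bio tech class",
    "abc xyz are the representatives",
    "description line1",
    "descriptive line 2",
    "..",
    "descriptive line3",
    "they have 79 students in the class",
    "Class student list :",
    "1 abc 23 m",
    "2 def 22 m",
    "3 xys 23 m"
  )

  // The pattern from the question: needs "abc" and the end sentence in the SAME string.
  val perLine = """(abc).*?(?=\bthey have 79 students in the class\b)""".r

  // (?s) lets '.' cross newlines; the end sentence is consumed instead of looked ahead.
  val wholeText = """(?s)abc.*?they have 79 students in the class""".r

  // Applied line by line, as in the question: every result is None.
  def perLineResults: Seq[Option[String]] = lines.map(l => perLine.findFirstIn(l))

  // Applied to the whole text joined with newlines: one match spanning several lines.
  def wholeTextResult: Option[String] = wholeText.findFirstIn(lines.mkString("\n"))
}
```

In Spark, `sc.wholeTextFiles` returns each file as a single `(path, content)` pair, so a pattern like `wholeText` could be applied to the content; this loads each file into memory as one string, so it only suits reasonably small files.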
I can't figure out what I'm doing wrong. I'm basically new to Scala/Spark. Could you help me work out how to extract just these lines into a separate RDD?
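One line-oriented alternative to a multi-line regex is to number the lines, find the index of the start line and of the end line, and keep everything in between. The sketch below does this on a plain Scala `Seq[String]` so it runs without Spark; the object `ExtractRange`, the method `extract`, and the rule "start line begins with the prefix, end line equals the sentence exactly" are assumptions chosen for illustration. It sticks to `zipWithIndex`, `filter`, `map` and `take`, which also exist on Spark's RDD.

```scala
// Hypothetical helper: extracts an inclusive range of lines between two markers.
object ExtractRange {
  def extract(lines: Seq[String], startPrefix: String, endLine: String): Seq[String] = {
    // 1. Attach a line number to every line (RDD.zipWithIndex does the same with Long indices).
    val indexed: Seq[(String, Int)] = lines.zipWithIndex

    // 2. Index of the first line that starts with the start marker, if any.
    val startIdx: Option[Int] =
      indexed.filter { case (l, _) => l.startsWith(startPrefix) }.map(_._2).take(1).headOption

    // 3. Index of the first line at or after the start that equals the end marker, if any.
    val endIdx: Option[Int] = startIdx.flatMap { s =>
      indexed.filter { case (l, i) => i >= s && l == endLine }.map(_._2).take(1).headOption
    }

    // 4. Keep only the lines whose index lies in [startIdx, endIdx]; empty if a marker is missing.
    (startIdx, endIdx) match {
      case (Some(s), Some(e)) => indexed.filter { case (_, i) => i >= s && i <= e }.map(_._1)
      case _                  => Seq.empty
    }
  }
}
```

With Spark, the same four steps could start from `sc.textFile("filname.txt").zipWithIndex()`; steps 2 and 3 then run as small driver-side actions, and step 4 leaves an `RDD[String]` holding only the wanted lines. This Spark variant is an untested sketch; note also that when several input files are read together the indices run across all of them.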