Question

I have a file containing the following data:

Samples of data observed from students of bio tech class

abc xyz are the representatives

description line1

descriptive line 2

..

descriptive line3

they have 79 students in the class

Class student list :

1 abc 23 m

2 def 22 m

3 xys 23 m

I want to extract the lines starting from "abc" (line 2) up to "they have 79 students in the class" into another RDD.

The output should look like this:

abc xyz are the representatives

description line1

descriptive line 2

..

descriptive line3

they have 79 students in the class

This is the code I have been trying to use:

val st_pattern1= """(abc).*?(?=\bthey have 79 students in the class\b)""".r
val test = sc.textFile("filname.txt") 
.map(lines => st_pattern1.findFirstIn(lines)) 
.foreach(println)

It prints the following:

none

none

none

..

..

none

I can't figure out what I'm doing wrong. I'm basically new to Scala / Spark. Could you help me work out how to extract just these lines into an RDD?
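
A note on the likely cause, with a sketch of one possible fix. sc.textFile produces one element per line, so the regex is applied to each line separately. No single line contains both "abc" and the end marker, which would explain why every match comes back empty. The sketch below shows the selection logic on a plain Seq[String] so it can be run without a cluster. The names ExtractBlock, startPattern, endPattern and extractBlock are hypothetical, and anchoring both markers at the start of a line is an assumption based on the sample data.

```scala
import scala.util.matching.Regex

object ExtractBlock {
  // Markers taken from the question; the ^ anchors are an assumption
  // so that "1 abc 23 m" is not treated as a start line.
  val startPattern: Regex = """^abc\b""".r
  val endPattern: Regex = """^they have 79 students in the class\b""".r

  /** Returns the lines from the first start marker through the first
    * end marker at or after it (both inclusive), or an empty Seq if
    * either marker is missing.
    */
  def extractBlock(lines: Seq[String]): Seq[String] = {
    val start = lines.indexWhere(l => startPattern.findFirstIn(l).isDefined)
    if (start < 0) Seq.empty
    else {
      // Search for the end marker only from the start line onward.
      val end = lines.indexWhere(l => endPattern.findFirstIn(l).isDefined, start)
      if (end < 0) Seq.empty else lines.slice(start, end + 1)
    }
  }
}
```

On an RDD the same idea could probably be expressed by calling zipWithIndex() on the result of sc.textFile, finding the index of the first start line and the first end line with a filter, and then keeping only the elements whose index falls in that range. That part is untested here. For a small file, reading it as one string with sc.wholeTextFiles and adding the (?s) flag to the original regex, so that . also matches newlines, may be simpler.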

Extract lines from an RDD based on a regular expression

0 answers: