I have a file, UDP_file.txt, containing:
2014-03-02 07:59:37;source-address=123.235.78.125 source-port=1780
2014-03-02 07:59:37;source-address=123.235.132.181 source-port=56399
2014-03-02 07:59:37;source-address=123.234.141.253 source-port=49170
2014-03-02 07:59:37;source-address=123.234.104.225 source-port=39123
2014-03-02 07:59:37;source-address=123.234.104.225 fake-port=0000
What I have so far:
val file_in = sc.textFile("UDP_file.txt")
val FullName = """(^.{19}).+source-address=([^"]+) source-port=([^"]+)""".r
When I test the pattern on a single line, it works:
scala> val FullName(ip,sa,sp) = "2014-03-02 07:59:37;source-address=10.114.104.225 source-port=39123"
ip: String = 2014-03-02 07:59:37
sa: String = 10.114.104.225
sp: String = 39123
or
scala> "2014-03-02 07:59:37;source-address=10.115.78.125 source-port=1780" match { case FullName(ip,sa,sp) => (ip,sa,sp) }
(2014-03-02 07:59:37,10.115.78.125,1780)
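The question's pattern only covers source-port lines, so the last sample line (fake-port=0000) does not match it. A quick check in plain Scala, without Spark, shows the two ways that failure can surface; the names fake, attempt and groups are just for illustration:

```scala
import scala.util.Try
import scala.util.matching.Regex

// The pattern from the question
val FullName: Regex = """(^.{19}).+source-address=([^"]+) source-port=([^"]+)""".r

// Last sample line from the file: it has fake-port instead of source-port
val fake = "2014-03-02 07:59:37;source-address=123.234.104.225 fake-port=0000"

// Destructuring with a pattern val throws scala.MatchError on a non-matching
// line; Try turns that exception into a Failure value
val attempt = Try {
  val FullName(ts, addr, port) = fake
  (ts, addr, port)
}

// The extractor behind the pattern simply returns None instead of throwing
val groups: Option[List[String]] = FullName.unapplySeq(fake)

println(attempt.isFailure) // true
println(groups)            // None
```

Note that a Scala Regex used as an extractor must match the whole line, not just a substring of it.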
But I don't know how to apply it to every line of the loaded file.
file_in.AndWhatNow?
Can you help? Any suggestions would be greatly appreciated.
Pawel
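Answer 0 (score: 4)
sc.textFile already gives you an RDD with one element per line, so you can map over the lines and destructure each one with the pattern: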
val FullName = """(.+);source-address=(.+) (?:fake|source)-port=(.+)""".r
val names = file_in map { line =>
  // throws scala.MatchError on a line the pattern does not match
  val FullName(ip, sa, sp) = line
  (ip, sa, sp)
}
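Spark's RDD.map, like map on a Scala collection, applies a function to every element, so the same lambda can be tried out on a local List before running it on file_in. A minimal sketch, assuming a hypothetical sampleLines list stands in for the file's contents:

```scala
// Answer 0's pattern: (?:fake|source) is a non-capturing group, so only
// three groups (timestamp, address, port) are extracted
val FullName = """(.+);source-address=(.+) (?:fake|source)-port=(.+)""".r

// Hypothetical stand-in for the contents of UDP_file.txt
val sampleLines = List(
  "2014-03-02 07:59:37;source-address=123.235.78.125 source-port=1780",
  "2014-03-02 07:59:37;source-address=123.234.104.225 fake-port=0000"
)

// Same lambda as in the answer; on Spark this would be file_in map { ... }
val names = sampleLines map { line =>
  val FullName(ip, sa, sp) = line // throws scala.MatchError on a non-matching line
  (ip, sa, sp)
}

names foreach println
```

Because the port type is not captured, source-port and fake-port lines end up indistinguishable in the result.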
Update
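To split the results by port type, capture the port type in its own group. Spark RDDs have no partition method, so filter the parsed RDD twice instead:
val FullName = """(.+);source-address=(.+) (fake|source)-port=(.+)""".r
val parsed = file_in map { line =>
  val FullName(ip, sa, pt, sp) = line
  (ip, sa, pt, sp)
}
parsed.cache() // parsed is traversed twice below
val goodOnes = parsed filter { _._3 == "source" }
val fakes = parsed filter { _._3 != "source" }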
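On a local Scala collection (for example after calling collect() on a result small enough for the driver), partition does exist and splits the data in one pass. A sketch on a hypothetical sampleLines list taken from the question's file:

```scala
// Answer 0's updated pattern: the port type is now captured as group 3
val FullName = """(.+);source-address=(.+) (fake|source)-port=(.+)""".r

// Hypothetical stand-in for the file contents
val sampleLines = List(
  "2014-03-02 07:59:37;source-address=123.235.78.125 source-port=1780",
  "2014-03-02 07:59:37;source-address=123.235.132.181 source-port=56399",
  "2014-03-02 07:59:37;source-address=123.234.104.225 fake-port=0000"
)

val parsed = sampleLines map { line =>
  val FullName(ip, sa, pt, sp) = line
  (ip, sa, pt, sp)
}

// partition returns (elements satisfying the predicate, all the others)
val (goodOnes, fakes) = parsed partition { _._3 == "source" }
```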
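Answer 1 (score: 0)
With the previous solution, a line that does not match the pattern throws a scala.MatchError.
If we want to return different values for lines that match different patterns, and something else for lines that match none of them, use this code: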
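// Second_FullName is only a placeholder for any other pattern you need;
// this definition (two groups, for the fake-port lines) is hypothetical
val Second_FullName = """(.+);source-address=(.+) fake-port=.+""".r
val names = file_in map { line =>
  line match {
    case FullName(ip, sa, sp)    => (ip, sa, sp)
    case Second_FullName(ip, sa) => (ip, sa)
    case _                       => Nil // line matches neither pattern
  }
}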
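Mixing tuples and Nil gives the result a loose common element type. An alternative that keeps one element type is to return an Option from the match and flatMap over it, which drops non-matching lines entirely. A local sketch; parse and sampleLines are hypothetical names:

```scala
// Pattern from the question (source-port lines only)
val FullName = """(^.{19}).+source-address=([^"]+) source-port=([^"]+)""".r

// Some(tuple) for a matching line, None for anything else
def parse(line: String): Option[(String, String, String)] = line match {
  case FullName(ts, addr, port) => Some((ts, addr, port))
  case _                        => None
}

// Hypothetical stand-in for the file contents
val sampleLines = List(
  "2014-03-02 07:59:37;source-address=123.234.104.225 source-port=39123",
  "2014-03-02 07:59:37;source-address=123.234.104.225 fake-port=0000"
)

// flatMap unwraps the Options, so the non-matching line simply disappears
val names = sampleLines flatMap parse
```

On an RDD, file_in.flatMap(line => parse(line).toList) should do the same; the explicit toList avoids relying on the implicit Option-to-Iterable conversion when Spark infers the element type.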