How to set the number of columns of a dataset in Spark

Time: 2017-12-28 20:17:15

Tags: java apache-spark dataset

I have a file like this:

test057 - 192.168.1.12 - 00:11:22:33:44:57 - 2ZZ66-1 node 6 -  - test052 - 192.168.1.16 - 00:11:22:33:44:61 - 2ZZ66-1 Node2 -
test058 - 192.168.1.13 - 00:11:22:33:44:58 - 2ZZ66-1 node 5 -  - test053 - 192.168.1.17 - 00:11:22:33:44:62 - 2ZZ66-1 Node1 -
test_a001 - 192.168.100.10 - 1234.5678.0123 - AZZDEF -  -  -  -  -  -
test_b001 - 192.168.100.11 - 4321.1234.1234 - GHIJKL -  -  -  -  -  -

How can I split it into 4 columns?

| name      | ip             | mac               | tag            |
|-----------|----------------|-------------------|----------------|
| test057   | 192.168.1.12   | 00:11:22:33:44:57 | 2ZZ66-1 node 6 |
| test052   | 192.168.1.16   | 00:11:22:33:44:61 | 2ZZ66-1 Node2  |
| test058   | 192.168.1.13   | 00:11:22:33:44:58 | 2ZZ66-1 node 5 |
| test053   | 192.168.1.17   | 00:11:22:33:44:62 | 2ZZ66-1 Node1  |
| test_a001 | 192.168.100.10 | 1234.5678.0123    | AZZDEF         |
| test_b001 | 192.168.100.11 | 4321.1234.1234    | GHIJKL         |

1 Answer:

Answer 0: (score: 0)

You can load the file, split each line on " - ", and map the fields to a case class.


When printed, the resulting output is shown below.

val ipFile = sc.textFile("file:////in_f/test/inpf.txt")
// Split on " - " (dash with surrounding spaces) so values that
// contain a bare dash, such as "2ZZ66-1 node 6", stay intact
val ipSplit = ipFile.map(_.split(" - ").map(_.trim))
case class IP(name: String, ip: String, mac: String, tag: String)
import spark.implicits._  // needed for toDF() and the $"..." column syntax
val ipDF = ipSplit.map(x => IP(x(0), x(1), x(2), x(3))).toDF()
ipDF.select($"name", $"ip", $"mac", $"tag").take(4).foreach(println)
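Note that the snippet above only keeps the first four fields of each line, while the first two input lines actually carry two records each (test052 and test053 sit on the same lines as test057 and test058). One way to recover every record, sketched here in plain Scala with a hypothetical `parseLine` helper, is to drop the empty padding fields and group the remainder four at a time; in Spark you would then apply it with `ipFile.flatMap(parseLine)` before calling `toDF()`:

```scala
case class IP(name: String, ip: String, mac: String, tag: String)

// Hypothetical helper: drop the trailing "-", split on " - ",
// discard the empty padding fields, and emit one IP record
// per group of four remaining fields.
def parseLine(line: String): Seq[IP] =
  line.stripSuffix("-")
    .split(" - ")
    .map(_.trim)
    .filter(_.nonEmpty)
    .grouped(4)
    .collect { case Array(name, ip, mac, tag) => IP(name, ip, mac, tag) }
    .toSeq
```

Applied to the first sample line this yields two records (test057 and test052); applied to the test_a001 line it yields one.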

[screenshot of the printed output]