Using Scala: I have an emp table like the following
id, name, dept, address
1, a, 10, hyd
2, b, 10, blr
3, a, 5, chn
4, d, 2, hyd
5, a, 3, blr
6, b, 2, hyd
Code:
val inputFile = sc.textFile("hdfs:/user/edu/emp.txt")
val inputRdd = inputFile.map(iLine => (iLine.split(",")(0),
                                       iLine.split(",")(1),
                                       iLine.split(",")(3)))
// selecting only a few columns; now I want to pull the complete data for employees whose address is hyd
Question: I don't want to print all employee details; I only want to print the details of the employees from hyd.
Answer 0 (score: 0)
I think the following solution will help with your problem.
val fileName = "/path/stact_test.txt"
val strRdd = sc.textFile(fileName).map { line =>
  val data = line.split(",")
  (data(0), data(1), data(3))
}.filter(rec => rec._3.toLowerCase.trim.equals("hyd"))
After splitting the data, the third element of each tuple in the RDD is used to filter on the location.
Output:
(1, a, hyd)
(4, d, hyd)
(6, b, hyd)
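
Since the question asks for the complete employee details, the same approach can keep all four columns and still filter on the address. This is only a sketch, assuming the same comma-separated layout and the sc/file path from the question:

// Sketch: keep every column and filter on the address field (index 3).
// Assumes the same file layout and SparkContext (sc) as above.
val fullRdd = sc.textFile("hdfs:/user/edu/emp.txt")
  .map(_.split(","))
  .filter(cols => cols(3).toLowerCase.trim == "hyd")
  .map(cols => (cols(0), cols(1), cols(2), cols(3)))

fullRdd.collect().foreach(println)
// Expected with the sample data: (1, a, 10, hyd), (4, d, 2, hyd), (6, b, 2, hyd)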
Answer 1 (score: -1)
You can try using a DataFrame:
import org.apache.spark.sql.functions.{split, trim}
import spark.implicits._   // for the $ column syntax outside spark-shell

val viewsDF = spark.read.text("hdfs:/user/edu/emp.txt")

val splitedViewsDF = viewsDF.withColumn("id", split($"value", ",").getItem(0))
  .withColumn("name", split($"value", ",").getItem(1))
  .withColumn("address", split($"value", ",").getItem(3))
  .drop($"value")
  .filter(trim($"address") === "hyd")   // === builds a Column comparison; trim removes the space after each comma
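
Alternatively, the split/withColumn steps can be avoided by reading the file as CSV directly. This is a sketch, assuming Spark 2.x, the same HDFS path as above, and that the file has no header row:

import org.apache.spark.sql.functions.trim
import spark.implicits._

// Sketch: read the comma-separated file as CSV and name the columns explicitly.
// All columns are read as strings here; add an explicit schema if typed columns are needed.
val empDF = spark.read
  .csv("hdfs:/user/edu/emp.txt")
  .toDF("id", "name", "dept", "address")

// trim() handles the space that follows each comma in the sample data.
val hydDF = empDF.filter(trim($"address") === "hyd")
hydDF.show()

Reading with named columns also keeps the dept field, which the withColumn version above never extracts, so the result contains the complete employee record.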