Join two DataFrames without a common field in Spark-Scala

Asked: 2017-06-09 12:57:45

Tags: scala apache-spark join

I have two dataframes in Spark Scala, but one of them consists of a single column. I have to join them, but they have no column in common. The number of rows is the same.

val userFriends = userJson.select($"friends", $"user_id")
val x = userFriends.select("friends")   // select the column as a DataFrame before .rdd
        .rdd
        .map(row => row.getList(0).toArray.map(_.toString))
val y = x.map(z => z.count(z => true)).toDF("friendCount")

I need to join userFriends with y.

1 Answer:

Answer 0 (score: 1):

Joining them without a common field is impossible, unless you can rely on the ordering, in which case you can generate a row number (with a window function) on both dataframes and join on the row number.
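A sketch of that row-number approach, assuming Spark 2.x with `spark-sql` on the classpath (the variable names reuse those from the question; ordering by a constant literal is only safe if you can truly rely on the dataframes' row order):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, row_number}

// row_number requires a window with an ordering; ordering by a constant
// preserves the current order but gives no real ordering guarantee.
val w = Window.orderBy(lit(1))

val left  = userFriends.withColumn("rn", row_number().over(w))
val right = y.withColumn("rn", row_number().over(w))

// Join on the synthetic row number, then drop it.
val joined = left.join(right, "rn").drop("rn")
```

Note that Spark will warn that a window with no partitioning moves all data to a single partition, which is acceptable only for small dataframes.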

But in your case this does not seem necessary: just keep the user_id column in your dataframe, and something like this should work:
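For the simpler route the answer suggests — keeping user_id alongside the count instead of joining — one way is to compute the count with the built-in `size` function, which avoids the RDD round-trip entirely. A sketch, assuming `friends` is an array-typed column as in the question:

```scala
import org.apache.spark.sql.functions.size

// Derive friendCount in place, so no join is needed.
// Resulting columns: friends, user_id, friendCount
val result = userFriends.withColumn("friendCount", size($"friends"))
```

This keeps the user_id/friendCount pairing correct by construction, rather than relying on row order.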