Join two DataFrames without a common field in Spark-Scala

Asked: 2017-06-09 12:57:45

Tags: scala apache-spark join

I have two dataframes in Spark Scala, but one of them consists of a single column. I have to join them, but they have no column in common. The number of rows is the same.

val userFriends = userJson.select($"friends", $"user_id")
val x = userFriends.select("friends")   // select the column as a DataFrame before .rdd
        .rdd
        .map(row => row.getList(0).toArray.map(_.toString))
val y = x.map(z => z.count(z => true)).toDF("friendCount")

I need to join userFriends with y.

1 Answer:

Answer 0 (score: 1):

Joining them without a common field is impossible, unless you can rely on the ordering, in which case you can generate a row number (with a window function) on both dataframes and join on the row number.
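A sketch of that row-number approach, assuming Spark 2.x with `spark-sql` on the classpath (the variable names reuse those from the question; ordering by a constant literal is only safe if you can truly rely on the dataframes' row order):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, row_number}

// row_number requires a window with an ordering; ordering by a constant
// preserves the current order but gives no real ordering guarantee.
val w = Window.orderBy(lit(1))

val left  = userFriends.withColumn("rn", row_number().over(w))
val right = y.withColumn("rn", row_number().over(w))

// Join on the synthetic row number, then drop it.
val joined = left.join(right, "rn").drop("rn")
```

Note that Spark will warn that a window with no partitioning moves all data to a single partition, which is acceptable only for small dataframes.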

But in your case this does not seem necessary: just keep the user_id column in your dataframe, and something like this should work:
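For the simpler route the answer suggests — keeping user_id alongside the count instead of joining — one way is to compute the count with the built-in `size` function, which avoids the RDD round-trip entirely. A sketch, assuming `friends` is an array-typed column as in the question:

```scala
import org.apache.spark.sql.functions.size

// Derive friendCount in place, so no join is needed.
// Resulting columns: friends, user_id, friendCount
val result = userFriends.withColumn("friendCount", size($"friends"))
```

This keeps the user_id/friendCount pairing correct by construction, rather than relying on row order.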