I have two DataFrames in Spark (Scala), but one of them consists of a single column. I need to join them, but they have no column in common. Both have the same number of rows.
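import spark.implicits._ // needed for $"col" and .toDF

val userFriends = userJson.select($"friends", $"user_id")
val x = userFriends.select("friends") // userFriends("friends") is a Column, which has no .rdd
  .rdd
  .map(row => row.getList[String](0).toArray.map(_.toString))
val y = x.map(friends => friends.length).toDF("friendCount") // number of friends per row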
I need to join userFriends with y.
Answer 0 (score: 1)
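It is impossible to join them without a common field, unless you can rely on row order. In that case you can add a row number (using a window function) to both DataFrames and join on that row number.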
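A minimal sketch of the row-number approach, assuming row i of one DataFrame really belongs with row i of the other (Spark does not guarantee row order in general, so this only holds if nothing has reshuffled the data). The sample DataFrames users and counts, the helper withRowNum, and the temporary column names are hypothetical. It is written as a spark-shell / Scala script with top-level statements.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, monotonically_increasing_id, row_number}

val spark = SparkSession.builder().master("local[*]").appName("row-number-join").getOrCreate()
import spark.implicits._

// Hypothetical inputs: same number of rows, no column in common,
// and row i of `users` is assumed to belong with row i of `counts`.
val users = Seq(("u1", Seq("a", "b")), ("u2", Seq("c")), ("u3", Seq.empty[String]))
  .toDF("user_id", "friends")
val counts = Seq(2, 1, 0).toDF("friendCount")

// Hypothetical helper: adds a consecutive, 1-based row number that follows the
// DataFrame's current order. monotonically_increasing_id() is increasing but not
// consecutive, so it is only used as the sort key for row_number().
// A window without partitionBy moves all rows into one partition, which is fine
// for a sketch but slow on large data.
def withRowNum(df: DataFrame): DataFrame =
  df.withColumn("_mono_id", monotonically_increasing_id())
    .withColumn("row_num", row_number().over(Window.orderBy(col("_mono_id"))))
    .drop("_mono_id")

// Join on the synthetic row number, then discard it.
val joined = withRowNum(users)
  .join(withRowNum(counts), Seq("row_num"))
  .drop("row_num")

joined.show()
```

Because this depends on both sides keeping their order, prefer a real key whenever one is available.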
In your case, however, this seems unnecessary: just keep the user_id column in your DataFrame. Something like this should work:
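A minimal sketch of keeping user_id next to the count so that no join is needed, assuming friends is an array-of-strings column as the question's getList(0) suggests. It swaps the question's RDD round trip for Spark's built-in size function, and also shows an RDD variant closer to the question's code. The sample data standing in for userJson is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.size

val spark = SparkSession.builder().master("local[*]").appName("friend-count").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for userJson.select($"friends", $"user_id"):
// friends is an array<string> column.
val userFriends = Seq(
  (Seq("a", "b"), "u1"),
  (Seq("c"), "u2"),
  (Seq.empty[String], "u3")
).toDF("friends", "user_id")

// Computing the count in the same DataFrame keeps user_id on every row,
// so there is nothing left to join.
val withCount = userFriends.withColumn("friendCount", size($"friends"))

// Same result via the RDD API: carry user_id along with the count.
val viaRdd = userFriends.rdd
  .map(r => (r.getAs[String]("user_id"), r.getAs[Seq[String]]("friends").size))
  .toDF("user_id", "friendCount")

withCount.show()
viaRdd.show()
```

For a null array, size returns -1 or null depending on the Spark version and configuration (spark.sql.legacy.sizeOfNull, ANSI mode), so handle missing friends lists explicitly if they can occur.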