PySpark map on a non-empty RDD returns []

Time: 2018-08-14 21:56:08

Tags: apache-spark pyspark

I run into an intermittent problem with PySpark. When I collect() a particular RDD, it returns all of its values, but when I try to map over it, it returns []. The strangest part is that roughly one time in five, the same RDD in the same session, with no changes at all, returns its values correctly. That sounds impossible, and maybe it is...

The key point:

>>> pairs.collect()
[('b', ('d', 3)), ('c', ('d', 2)), ('g', ('d', -2)), ('b', ('z', 1)), 
('a', ('b', 4)), ('a', ('c', 3)), ('b', ('c', 2)), ('b', ('f', -1)), 
('a', ('g', -2)), ('c', ('z', -4))]
>>> pairs.map(lambda x: x).collect()
[]
>>> pairs.flatMap(lambda x: x).collect()
[]

The problem seems to be on the fourth line of the while loop, `new_uvs = pairs.flatMap(lambda x: (x[0], x[1][0])).collect()`, as this line returns nothing. Here is my full code:

sc = spark.sparkContext

l = [("d", ("e", 1)), ("b", ("d", 3)), ("c", ("d", 2)), ("g", ("d", -2)),
     ("b", ("z", 1)), ("a", ("b", 4)), ("a", ("c", 3)), ("b", ("c", 2)),
     ("b", ("f", -1)), ("a", ("g", -2)), ("a", ("f", 3)), ("c", ("z", -4)),
     ("x", ("y", 0))]

network = sc.parallelize(l)  

#source and destination nodes
src = sc.broadcast('a')
dest = sc.broadcast('e') 

#collect all u's and v's in the node path thus far into rdd
pairs = network.filter(lambda x: x[0]==dest.value or x[1][0]==dest.value)
#store pairs in a list so you can add new pairs
path_pool_static = pairs.collect()
path_pool = sc.broadcast(path_pool_static)
#collect all u's and v's from those pairs and broadcast them
uvs = sc.broadcast(pairs.flatMap(lambda x: (x[0], x[1][0])).collect())
network = network.filter(lambda x: x not in path_pool.value) 

while network.filter(lambda x: x[0]==src.value or x[1][0]==src.value).collect()!=[]:
    pairs = network.filter(lambda x: x[0] in uvs.value or x[1][0] in uvs.value)
    #initialize uvs_static as the uvs list, add the new pairs' u's and
    #v's to the list, and broadcast it
    uvs_static = uvs.value
    new_uvs = pairs.flatMap(lambda x: (x[0], x[1][0])).collect()
    uvs_static.extend(new_uvs)
    uvs = sc.broadcast(uvs_static)
    #update path_static with new pairs and broadcast it
    path_pool_static.extend(pairs.collect())
    path_pool = sc.broadcast(path_pool_static)
    #remove pairs already in path
    network = network.filter(lambda x: x not in path_pool.value)
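One observation worth noting (a guess at the cause, not a confirmed diagnosis): RDD transformations are lazy, so `pairs` is not computed until an action such as collect() runs, and at that point the predicate sees the *current* contents of any mutable state it references. The loop above mutates `uvs_static` and `path_pool_static`, the very lists backing the broadcasts, between defining an RDD and collecting it, which can produce exactly this kind of intermittent empty result. A plain-Python sketch of the hazard, using a generator expression to stand in for a lazy filter:

```python
path_pool = [("b", ("d", 3))]
network = [("b", ("d", 3)), ("a", ("b", 4))]

# Like an RDD filter, a generator expression is lazy: the predicate runs
# only when the result is consumed, against path_pool as it is THEN.
filtered = (x for x in network if x not in path_pool)

# Mutating path_pool before "collecting" changes the outcome
path_pool.append(("a", ("b", 4)))

result = list(filtered)  # everything was filtered out at evaluation time
```

If this is what is happening, materializing each RDD (e.g. with cache() followed by an action) before mutating the lists, or building fresh lists instead of extending the broadcast-backed ones, would be the thing to try first.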

0 Answers:

No answers