I have a list like this:
["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"]
After applying a filter, I want this:
["Dhoni WC", "Sachin Batsman", "Dravid Batsman", "Kumble Bowler", "Srinath Bowler"]
This is what I tried:
m = sc.parallelize(["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"])
n = m.map(lambda k: k.split(' '))
o = n.map(lambda s: s[0])
o.collect()
['Dhoni','Sachin','Dravid','Kumble','Srinath']
q = n.map(lambda s: s[2])
q.collect()
['WC','Batsman','Batsman','Bowler','Bowler']
Answer 0: (score: 1)
Provided that all list items have the same format, one way to do this is with map.
rdd = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
rdd.map(lambda x:(x.split(' ')[0]+' '+x.split(' ')[2])).collect()
Output:
['Dhoni WC', 'Sachin Batsman', 'Dravid Batsman', 'Kumble Bowler', 'Srinath Bowler']
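As a possible variation, the per-record logic can be pulled into a small named function that splits each string only once. The sketch below is plain Python so it runs without a Spark context; the helper name `name_and_role` and the variable `players` are hypothetical, and it assumes every record has the four whitespace-separated fields shown in the question.

```python
def name_and_role(record):
    """Return 'name role' from a 'name age role number' record (assumed format)."""
    fields = record.split()  # split once, on any run of whitespace
    return fields[0] + " " + fields[2]


players = [
    "Dhoni 35 WC 785623",
    "Sachin 40 Batsman 4500",
    "Dravid 45 Batsman 50000",
    "Kumble 41 Bowler 456431",
    "Srinath 41 Bowler 65465",
]

# Plain-Python equivalent of mapping the helper over the data.
result = [name_and_role(p) for p in players]
print(result)

# With Spark, the same helper could be passed to map instead of a lambda:
# sc.parallelize(players).map(name_and_role).collect()
```

Calling `split()` with no argument also tolerates leading or repeated spaces, which `split(' ')` does not.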