Question

I have a dataset of spam messages with the following data type:

pyspark.rdd.PipelinedRDD

When I run spams.take(3), I get:

[["Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's"], ['WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only.'], ['Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030']]

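As you can see, each element of the list is wrapped in its own pair of brackets. How do I get rid of those brackets? I have tried several ways to flatten it, but nothing seems to work.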

Answer 1

You can use the RDD's flatMap method. It lets you produce multiple rows from a single row.

spams.flatMap(lambda x:x).take(3)
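To see what flatMap(lambda x: x) does without a running Spark cluster, here is a plain-Python analogue. The helper flat_map and the shortened sample rows are hypothetical, not part of PySpark: each row is passed to the function, and every element of the iterable it returns becomes its own output row.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def flat_map(f: Callable[[T], Iterable[U]], rows: Iterable[T]) -> List[U]:
    """Plain-Python model of RDD.flatMap: f maps one row to zero or more rows."""
    return [out for row in rows for out in f(row)]


# Shortened stand-ins for the first rows of `spams` (hypothetical sample data).
rows = [["Free entry in 2 a wkly comp"], ["WINNER!!"], ["Had your mobile 11 months"]]

# The identity function returns each inner list, whose elements become rows.
flattened = flat_map(lambda x: x, rows)
print(flattened)  # ['Free entry in 2 a wkly comp', 'WINNER!!', 'Had your mobile 11 months']
```

Because the function may return any number of elements, an empty inner list simply produces no output rows.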

Answer 2

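Since it is not clear from your question whether you want to remove the brackets after or before collecting the data into a list, and other users have already covered the "after" case, I will answer for while the data is still an RDD. It's quite simple: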

spams = spams.map(lambda x: x[0])
print(spams.take(3))

This removes the inner "brackets".
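map(lambda x: x[0]) and flatMap(lambda x: x) give the same result only when every row holds exactly one string. The plain-Python sketch below models both (the helper names and sample rows are hypothetical) to show where they differ: x[0] silently drops extra elements and raises IndexError on an empty row, while flattening keeps everything. Spark transformations are lazy, so such an error would surface only when an action such as take or collect evaluates the offending row.

```python
# Hypothetical rows: one normal, one with two strings, one empty.
rows = [["a"], ["b", "c"], []]


def take_first(rows):
    """Model of rdd.map(lambda x: x[0]): exactly one output per input row."""
    return [x[0] for x in rows]


def flatten(rows):
    """Model of rdd.flatMap(lambda x: x): zero or more outputs per input row."""
    return [s for x in rows for s in x]


print(flatten(rows))         # ['a', 'b', 'c']
print(take_first(rows[:2]))  # ['a', 'b']  ('c' is silently dropped)

try:
    take_first(rows)
except IndexError as exc:
    print("map(lambda x: x[0]) fails on an empty row:", exc)
```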

Answer 3

These lines of code should help.

>>> msg = [["Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's"],
...        ['WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only.'],
...        ['Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030']]
>>> msg = [x[0] for x in msg]
>>> msg
["Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's", 'WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only.', 'Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030']
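If some inner lists could hold more than one message, x[0] would keep only the first. A sketch that flattens a collected list one level regardless of inner length uses itertools.chain.from_iterable from the standard library; the shortened sample strings are hypothetical stand-ins for the output of spams.take(3).

```python
from itertools import chain

# Shortened stand-in for the list returned by spams.take(3) (hypothetical sample).
msg = [["Free entry"], ["WINNER!!"], ["Had your mobile", "Call FREE"]]

# chain.from_iterable walks each inner list in turn and yields its elements,
# so every string is kept, not just the first of each inner list.
flat = list(chain.from_iterable(msg))
print(flat)  # ['Free entry', 'WINNER!!', 'Had your mobile', 'Call FREE']
```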

Answer 4

Try a for loop, where data is the list returned by spams.take(3).

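# data = spams.take(3), a list of one-element lists
mylist = []
for entry in data:
    print(entry)
    # copy every string out of the inner list into the flat result
    for e in entry:
        mylist.append(e)
print(mylist)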

How to flatten an RDD in Python?
