What should the argument to sqlContext.createDataFrame() be?

Time: 2016-10-27 04:24:52

Tags: python pyspark spark-dataframe

This code creates a DataFrame from the given lists:

sample_one = [(0, 'mouse'), (1, 'black')]
sample_two = [(0, 'cat'), (1, 'tabby'), (2, 'mouse')]
sample_three = [(0, 'bear'), (1, 'black'), (2, 'salmon')]
sample_data_df = sqlContext.createDataFrame([(sample_one,), (sample_two,), (sample_three,)], ['features'])

In the createDataFrame() call, why is there an extra comma after sample_one, as in (sample_one,)?

1 answer:

Answer 0 (score: 1)

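The trailing comma is Python's syntax for a one-element tuple. Without it, the parentheses only group the expression and (sample_one) is still the list itself. Each tuple in the outer list is one row of the DataFrame, and its single element is the value of the single column 'features'. You can check the difference yourself: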

>>> sample_one = [(0, 'mouse'), (1, 'black')]
>>> type((sample_one))
<type 'list'>
>>> type((sample_one,))
<type 'tuple'>
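
To tie this back to the createDataFrame() call, here is a minimal sketch in plain Python (no Spark needed). The names grouped, wrapped, rows, columns and row_widths are made up for illustration and are not part of any API. It only shows that each one-element tuple is one row whose width matches the one-column schema.

```python
sample_one = [(0, 'mouse'), (1, 'black')]
sample_two = [(0, 'cat'), (1, 'tabby'), (2, 'mouse')]

# Parentheses alone only group an expression; the result is still the list.
grouped = (sample_one)
# A trailing comma makes a one-element tuple that wraps the list.
wrapped = (sample_one,)

print(type(grouped).__name__)  # list
print(type(wrapped).__name__)  # tuple
print(len(wrapped))            # 1

# Each tuple in the outer list stands for one row; its single element is the
# value for the single column named in the schema list.
rows = [(sample_one,), (sample_two,)]   # hypothetical name, same shape as in the question
columns = ['features']
row_widths = [len(row) for row in rows]
print(row_widths == [len(columns)] * len(rows))  # True
```

So the list passed as the first argument describes rows with exactly one field each, and that field holds the whole list of (index, word) pairs.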