Question

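Hi, I have a list of tuples containing strings and numpy float64 values. I want to convert it to a Spark DataFrame, but I get an error. The list and the error are shown below.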

Here is my code:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([StructField("key", StringType(), True), StructField("value", DoubleType(), True)])

coef_df = spark.createDataFrame(coef_list, schema)

Answer 1

As @user6910411 pointed out, Spark SQL does not support NumPy types (yet).

Here is a slightly simpler solution (it also incorporates the comments):

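import numpy as np

# Sample rows holding NumPy scalars (np.unicode no longer exists in current NumPy; np.str_ is the equivalent)
data = [
    (np.str_('100912strategy_id'), np.float64(-2.1412)),
    (np.str_('10exchange_ud'), np.float64(-1.2412))]

# sc is the SparkContext, e.g. the one provided by the pyspark shell.
# Convert each NumPy scalar to a built-in Python type before building the DataFrame.
df = (sc.parallelize(data)
    .map(lambda x: (str(x[0]), float(x[1])))
    .toDF(["key","value"]))
df.show()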

+-----------------+-------+
|              key|  value|
+-----------------+-------+
|100912strategy_id|-2.1412|
|    10exchange_ud|-1.2412|
+-----------------+-------+
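
The same conversion can also be done on the driver, before the data ever reaches Spark, which should let you keep the explicit schema from the question. This is only a rough sketch: `to_native` and `to_native_rows` are hypothetical helper names I made up, and it assumes every NumPy value in the list is a scalar (NumPy scalars expose `.item()`, which returns the matching built-in Python object).

```python
def to_native(value):
    """Return a built-in Python object for NumPy scalars; pass anything else through."""
    # NumPy scalars (np.float64, np.int64, np.str_, ...) have an .item() method
    # that returns the equivalent Python float, int or str.
    # Assumption: values are scalars, not multi-element arrays.
    item = getattr(value, "item", None)
    return item() if callable(item) else value


def to_native_rows(rows):
    """Apply to_native to every field of every tuple in a list of rows."""
    return [tuple(to_native(field) for field in row) for row in rows]
```

With the names from the question, the call would then look like `spark.createDataFrame(to_native_rows(coef_list), schema)`. For large lists the `sc.parallelize(...).map(...)` version above distributes the conversion instead of doing it on the driver.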

PySpark: Unable to create a DataFrame from a list
