Question

I need to run a linear regression in PySpark, and I am trying to follow the steps from this link:

https://towardsdatascience.com/building-a-linear-regression-with-pyspark-and-mllib-d065c3ba246a

In my case, I imported the data into Databricks with the following code:

## Data import

# File location and type
file_location = "/FileStore/tables/Spark.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "True"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

Then I need to create a vector assembler that holds the values of the explanatory variables, leaving out the columns target, ID and _c0:

ignore = ['ID', 'target', '_c0']
from pyspark.ml.feature import VectorAssembler
assembler = VectorAssembler(
    inputCols=[x for x in df.columns if x not in ignore],
    outputCol='features')

My problem is that when I run this command:

new_df = assembler.transform(df)

I get this error:

IllegalArgumentException: u'Data type StringType of column is not supported

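I am a beginner with Spark and have searched a lot, but I really cannot understand this problem. Normally, VectorAssembler's transform just adds a new single vector column, built from the selected variables, to the initial DataFrame, doesn't it? Please help!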


0 answers: