Question

I'm currently working on Data Science Experience (DSX) and would like to import a CSV file as a SparkSession DataFrame. I can import the DataFrame successfully; however, all of the columns are converted to string type. How do you make this DSX feature recognize the types present in the CSV file?

Answer 1

Currently, the generated code that actually creates the pyspark.sql.DataFrame looks like this:

df_data_1 = spark.read\
  .format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
  .option('header', 'true')\
  .load('swift://container_name.' + name + '/test.csv')
df_data_1.take(5)

You have to add the following option, before the .load(...) call, for the schema to be inferred:

  .option('inferSchema', 'true')\
