Question

I have an input DataFrame (ip_df) whose data looks like this:

id            col_value
1               10
2               11
3               12

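Both id and col_value have the string data type.

I need to get another DataFrame (output_df) in which id is still a string and col_value is decimal(15,4). There is no data transformation involved, only a data type conversion. Can I do this with PySpark? Any help would be appreciated.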

Answer 1

Try using the cast method on the column.

Answer 2

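Try the following statement. Note that it casts to float; for the decimal(15,4) type asked about in the question, pass 'decimal(15,4)' to cast instead.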

output_df = ip_df.withColumn("col_value",ip_df["col_value"].cast('float'))
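The difference between a float and a fixed-scale decimal can be illustrated without Spark. The plain-Python sketch below uses a made-up sample value ("10.1", not from the question's data) and the standard decimal module; Python's float is 64-bit while Spark's float is 32-bit, so treat this only as an analogy for why a decimal type may be preferable when exact values matter.

```python
from decimal import Decimal

raw = "10.1"  # hypothetical sample value, not taken from the question

# A float stores the nearest binary fraction, so its exact value is not 10.1.
as_float = float(raw)
exact_float = Decimal(as_float)  # exact decimal expansion of the stored float

# A decimal with scale 4 keeps exactly four digits after the decimal point.
as_decimal = Decimal(raw).quantize(Decimal("0.0001"))

print(exact_float)
print(as_decimal)  # 10.1000
```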

Answer 3

You can change the types of multiple columns.

Using withColumn() -

from pyspark.sql.types import DecimalType, StringType

# DecimalType() alone defaults to decimal(10,0); pass precision and scale explicitly.
output_df = ip_df \
  .withColumn("col_value", ip_df["col_value"].cast(DecimalType(15, 4))) \
  .withColumn("id", ip_df["id"].cast(StringType()))

Using select()

from pyspark.sql.types import DecimalType, StringType

# Cast each column and keep its original name via alias().
output_df = ip_df.select(
  (ip_df.id.cast(StringType())).alias('id'),
  (ip_df.col_value.cast(DecimalType(15, 4))).alias('col_value')
)

Using spark.sql()

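# Register the DataFrame as a temporary view so it can be queried with SQL.
ip_df.createOrReplaceTempView("ip_df_view")

# CAST with explicit precision and scale; aliases keep the original column names.
output_df = spark.sql('''
SELECT
    CAST(id AS STRING) AS id,
    CAST(col_value AS DECIMAL(15,4)) AS col_value
FROM ip_df_view
''')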

Changing the data type of a column in a PySpark DataFrame
