Question

I have a text file (ar.txt) containing a single string of comma-separated key:value pairs; its contents are shown below.

I have read it into an RDD and am trying to convert it to MapType(StringType(), StringType()). My attempt fails with a NullType error.

ar.txt has 'K1:v1,K2:v2, K3:v3'

How can I convert this into a MapType() column?

Answer 1

I was able to solve it as follows.

Read the file into an RDD and split each line into pairs:

[Shown in separate steps, though they can be combined]

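## File input format: 'k1:v1,k2:v2,k3:v3'
from pyspark.sql.functions import col, create_map

rdd1 = sc.textFile(file_path)
# Drop non-ASCII characters, then split each line into "key:value" tokens
rdd2 = rdd1.map(lambda x: x.encode("ascii", "ignore").decode("ascii").split(","))
# Split the first three tokens into [key, value] lists
rdd3 = rdd2.map(lambda x: (x[0].split(":"), x[1].split(":"), x[2].split(":")))
df = rdd3.toDF()
# Build the map column from the three [key, value] pairs and keep the result
df = df.withColumn("map_column", create_map(col('_1')[0], col('_1')[1], col('_2')[0], col('_2')[1], col('_3')[0], col('_3')[1]))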

If there is a better alternative, or a way to make this work dynamically for any number of pairs, please suggest it.
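
One possible way to handle any number of pairs is to parse each line into a Python dict and pass an explicit schema, so Spark does not have to infer the column type. This is only a sketch: `parse_pairs` and `load_map_df` are hypothetical names, and it assumes a `SparkSession` called `spark` is available and that keys and values never contain the separators themselves.

```python
def parse_pairs(line, pair_sep=",", kv_sep=":"):
    """Parse a 'k1:v1,k2:v2' style line into a dict of strings."""
    result = {}
    # Strip surrounding whitespace and the single quotes shown in the sample
    for token in line.strip().strip("'").split(pair_sep):
        if kv_sep not in token:
            continue  # skip empty or malformed tokens
        key, value = token.split(kv_sep, 1)
        result[key.strip()] = value.strip()
    return result


def load_map_df(spark, file_path):
    """Hypothetical helper: one row per input line, with a single map column."""
    # Imported here so that parse_pairs can be used and tested without Spark
    from pyspark.sql.types import MapType, StringType, StructField, StructType

    # An explicit schema avoids relying on type inference for the map column
    schema = StructType(
        [StructField("map_column", MapType(StringType(), StringType()))]
    )
    rows = spark.sparkContext.textFile(file_path).map(lambda l: (parse_pairs(l),))
    return spark.createDataFrame(rows, schema)
```

Calling `load_map_df(spark, file_path)` should then give a DataFrame whose `map_column` has type `map<string,string>`. If the parsing can stay inside Spark SQL, the built-in `str_to_map` function (for example `expr("str_to_map(value, ',', ':')")` on a DataFrame read with `spark.read.text`) may be a simpler option, although it does not trim the stray space in the sample input.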
