How to perform functions on a newly added column of a Spark DataFrame using PySpark

Date: 2020-09-23 17:49:46

Tags: apache-spark pyspark apache-spark-sql

I am trying to create a new column in PySpark using a literal, but when I try to perform some functions on that column it shows an error like this:

AttributeError: 'NoneType' object has no attribute 'show'

Can someone help me fix this problem?

2 Answers:

Answer 0 (score: 1):

Your show() call triggers an action and returns None, not a DataFrame. Assign the DataFrame first, then call show() on it separately:

from pyspark.sql.functions import col

# Build the derived column (expression taken as written from the question)
autodata1 = autodata.withColumn('pricePerMPG', col('PRICE') / (col('MPG-CITY') + col('MPG-HWY') / 2))
autodata1.show(truncate=False)

# agg returns a one-row DataFrame; collect()[0] extracts that Row
# (renamed from `max` to avoid shadowing the Python built-in)
max_row = autodata1.agg({"pricePerMPG": "max"}).collect()[0]
print(max_row)

Answer 1 (score: 1):

autodata1=autodata.withColumn('pricePerMPG',(col('PRICE')/(col('MPG-CITY')+col('MPG-HWY')/2))).show(truncate=False)

Here autodata is a DataFrame, but because you append show() at the end, the whole expression evaluates to None (show() prints the DataFrame and has no return value), which is why autodata1 is not a DataFrame.
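The same pitfall can be reproduced without Spark: assigning the result of any Python call that returns None behaves identically. A minimal sketch using the built-in list.sort(), which, like DataFrame.show(), acts for its side effect and returns None:

```python
data = [3, 1, 2]

# sort() mutates the list in place and returns None, just as show() prints and returns None
result = data.sort()
print(result)  # None
print(data)    # [1, 2, 3]

# Calling a method on `result` would raise:
# AttributeError: 'NoneType' object has no attribute ...
```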