Question

I am new to PySpark and working on my first Spark project, and I am facing two issues.

a) I cannot reference a column using

df["col1"].show()

***TypeError: 'Column' object is not callable***

b) I cannot replace values in my Spark DataFrame with an aggregated value such as the mean.

Any help is much appreciated!

Update

I tried the following code snippet, but it returns a different error.

Code:
from pyspark import SparkConf, SparkContext
from pyspark.sql.functions import *
from pyspark.sql import Row, HiveContext, SQLContext, Column
from pyspark.sql.types import *

df = hive_context.table("db_new.temp_table")
df.select("col1").fillna(df.select("col1").mean())

***AttributeError: 'DataFrame' object has no attribute 'mean'***

Answer 1

This should work:

df[["col1"]].show()
