Question

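I'm following the Introduction to Spark course on edX. However, there are a few things I can't understand; the following is from a lab assignment. FYI, I'm not looking for the solution.

I can't understand why I'm getting the error

TypeError: 'Column' object is not callable

Here is the code: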

from pyspark.sql.functions import regexp_replace, trim, col, lower
def removePunctuation(column):
    """

    Args:
        column (Column): A Column containing a sentence.

    """

    # This following is giving error. I believe I am calling all the rows from the dataframe 'column' where the attribute is named as 'sentence'
    result = column.select('sentence') 

    return result

sentenceDF = sqlContext.createDataFrame([('Hi, you!',),
                                         (' No under_score!',),
                                         (' *      Remove punctuation then spaces  * ',)], ['sentence'])
sentenceDF.show(truncate=False)
(sentenceDF
 .select(removePunctuation(col('sentence')))
 .show(truncate=False))

Can you shed some light on this? TIA.

Answer 1

The column argument is not a DataFrame object, so it has no select method. You need to use other functions to solve this.

Hint: look at the import statement.
