Unresolved reference when trying to import col from pyspark.sql.functions in Python 3.5

Date: 2017-07-28 07:53:36

Tags: python apache-spark pyspark pyspark-sql spark-structured-streaming

Following the post "Spark structured streaming with python", I want to import 'col' in Python 3.5:

from pyspark.sql.functions import col

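But I get an error saying that the reference to col is unresolved. I have already installed the pyspark library, so I'm wondering whether 'col' has been removed from it. How can I import 'col', then?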

4 answers:

Answer 0 (score: 4)

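It turns out this was a problem with IntelliJ IDEA. Even though the IDE shows an unresolved reference, my program still runs from the command line without any problems.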
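One way to confirm that the warning comes only from the IDE's static analysis is to ask the interpreter directly whether the module exposes the name at runtime. The helper below is a minimal sketch; `exposes_at_runtime` is a hypothetical name, not part of PySpark or the standard library.

```python
import importlib


def exposes_at_runtime(module_name, attr):
    """Return True if `module_name` can be imported and has attribute `attr`.

    This inspects the live module object, so names injected dynamically
    (for example via globals()) are found even when an IDE flags them.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is not installed (ModuleNotFoundError is a subclass of ImportError).
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # With PySpark installed, this should print True even if the IDE
    # underlines `col` as an unresolved reference.
    print(exposes_at_runtime("pyspark.sql.functions", "col"))
```

If this prints True in the same interpreter the IDE is configured to use, the import itself is fine and only the editor's inspection is wrong.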

Answer 1 (score: 4)

Try installing "pyspark-stubs". I had the same problem in PyCharm, and installing it solved the problem.

Answer 2 (score: 2)

Functions like col are not explicitly defined in the Python code; they are generated dynamically when the module is imported.

For the same reason, static analysis tools such as pylint will also report them as errors.
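The gap between what a static analyzer sees and what exists at runtime can be reproduced without Spark. The sketch below uses a made-up module source (`SOURCE` is hypothetical) that defines one function with `def` and injects another through `globals()`, the same mechanism the functions.py excerpt uses. It then compares the names found by a simple AST scan with the names present after the code actually runs. Real linters such as pylint do more inference than this scan, so treat it as an illustration of the principle only.

```python
import ast

# Hypothetical module source: `upper` is defined statically, while `col`
# is created at import time by writing into globals().
SOURCE = '''
def upper(s):
    return s.upper()

def _make(name):
    def _(x):
        return "{}({})".format(name, x)
    _.__name__ = name
    return _

for _name in ["col"]:
    globals()[_name] = _make(_name)
'''


def statically_defined_names(source):
    """Names bound by top-level def/class statements, as a naive linter sees them."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.ClassDef))
    }


def runtime_names(source):
    """Names that actually exist after the module code has been executed."""
    namespace = {}
    exec(source, namespace)  # inside the exec'd code, globals() is `namespace`
    return {name for name in namespace if not name.startswith("__")}


if __name__ == "__main__":
    print("col" in statically_defined_names(SOURCE))  # False: invisible to the AST scan
    print("col" in runtime_names(SOURCE))             # True: created via globals()
```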

So the simplest way to use them is:

from pyspark.sql import functions as F

F.col("colname")

The relevant code in python/pyspark/sql/functions.py:
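from pyspark import SparkContext, since
from pyspark.sql.column import Column

# Simplified here; the real _lit_doc is defined elsewhere in functions.py.
_lit_doc = "Creates a :class:`Column` of literal value."

# Map of function name -> docstring; each entry becomes a module-level function below.
_functions = {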
    'lit': _lit_doc,
    'col': 'Returns a :class:`Column` based on the given column name.',
    'column': 'Returns a :class:`Column` based on the given column name.',
    'asc': 'Returns a sort expression based on the ascending order of the given column name.',
    'desc': 'Returns a sort expression based on the descending order of the given column name.',

    'upper': 'Converts a string expression to upper case.',
    'lower': 'Converts a string expression to lower case.',
    'sqrt': 'Computes the square root of the specified float value.',
    'abs': 'Computes the absolute value.',

    'max': 'Aggregate function: returns the maximum value of the expression in a group.',
    'min': 'Aggregate function: returns the minimum value of the expression in a group.',
    'count': 'Aggregate function: returns the number of items in a group.',
    'sum': 'Aggregate function: returns the sum of all values in the expression.',
    'avg': 'Aggregate function: returns the average of the values in a group.',
    'mean': 'Aggregate function: returns the average of the values in a group.',
    'sumDistinct': 'Aggregate function: returns the sum of distinct values in the expression.',
}

def _create_function(name, doc=""):
    """ Create a function for aggregator by name"""
    def _(col):
        sc = SparkContext._active_spark_context
        jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _

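# Generate each function at import time and inject it into the module namespace.
# Because names like `col` only appear as dict keys, static analyzers cannot see them.
for _name, _doc in _functions.items():
    globals()[_name] = since(1.3)(_create_function(_name, _doc))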
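To see what a loop like this produces without a running JVM, here is a self-contained imitation of the same factory pattern. `FakeJVMFunctions` and `FakeColumn` are hypothetical stand-ins for `sc._jvm.functions` and `Column`; only the shape of `_create_function` and the `globals()` loop mirrors the excerpt, and the `since` decorator is omitted.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column: just records an expression string."""

    def __init__(self, expr):
        self.expr = expr


class FakeJVMFunctions:
    """Hypothetical stand-in for sc._jvm.functions: builds expression strings."""

    def __getattr__(self, name):
        # Any attribute lookup returns a callable, loosely like a Py4J proxy would.
        return lambda arg: "{}({})".format(name, arg)


_jvm_functions = FakeJVMFunctions()

_functions = {
    "col": "Returns a Column based on the given column name.",
    "upper": "Converts a string expression to upper case.",
}


def _create_function(name, doc=""):
    """Create a function that forwards to the (fake) JVM function `name`."""
    def _(col):
        # Unwrap FakeColumn arguments; pass plain values (e.g. a column name) through.
        arg = col.expr if isinstance(col, FakeColumn) else col
        return FakeColumn(getattr(_jvm_functions, name)(arg))
    _.__name__ = name
    _.__doc__ = doc
    return _


# Inject each generated function into this module's namespace at import time.
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)
```

After this runs, `upper(col("name")).expr` evaluates to `"upper(col(name))"`, yet an IDE reading the source finds no `def col` or `def upper` anywhere, which is exactly why it reports an unresolved reference.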

Answer 3 (score: -1)

It seems to be a problem with the PyCharm editor. I was also able to run my program using trim() from the Python console.