PySpark 2.2: 'DataFrame' object has no attribute 'map'; how to work around the backward incompatibility

Asked: 2017-12-08 09:44:13

Tags: apache-spark pyspark spark-dataframe

The following code worked fine on Spark 1.6:

ddl = sqlContext.sql("""show create table {mytable}""".format(mytable="mytest.my_dummytable"))
map(''.join, ddl
    .map(lambda my_row: [str(data).replace("`", "'") for data in my_row])
    .collect())

However, after moving to Spark 2.2, I get the following exception:

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)
<ipython-> in <module>()
      1 ddl = sqlContext.sql("""show create table {mytable}""".format(mytable="mytest.my_dummytable"))
----> 2 map(''.join, ddl.map(lambda my_row: [str(data).replace("`", "'") for data in my_row]).collect())

spark2/python/pyspark/sql/dataframe.py in __getattr__(self, name)
            if name not in self.columns:
                raise AttributeError(
->                  "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
            jc = self._jdf.apply(name)
            return Column(jc)

AttributeError: 'DataFrame' object has no attribute 'map'

1 Answer:

Answer 0 (score: 1)

You have to go through .rdd first. Spark 2.0 removed the alias that made df.map() behave like df.rdd.map(). See this.
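
A minimal sketch of the corrected call, assuming the sqlContext and the mytest.my_dummytable table from the question are available:

# Spark 2.x: DataFrame no longer exposes .map(), so drop down to the underlying RDD first.
ddl = sqlContext.sql("""show create table {mytable}""".format(mytable="mytest.my_dummytable"))

statements = map(''.join,
                 ddl.rdd
                    .map(lambda my_row: [str(data).replace("`", "'") for data in my_row])
                    .collect())

# Note: under Python 3, map() returns a lazy iterator; wrap it in list() to materialize the result.

Inserting .rdd does explicitly what the old Spark 1.x alias did implicitly, so the rest of the pipeline is unchanged.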