Question

Hello, I have the following code that tries to create a DataFrame from a Seq[T]:

case class CaseConvert[T: TypeTag](a: T)

def createDf[T: TypeTag](data: Seq[T]): DataFrame = {
   spark.createDataFrame(data.map(CaseConvert[T]))
}

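When I call the createDf method above with a type such as Seq[java.sql.Timestamp], it fails with the following error:

UnsupportedOperationException: Schema for type TypeTag[java.sql.Timestamp] is not supported

I think I have to create an encoder for the CaseConvert class, but I don't know how to do that with Scala's complex generics. Please guide me; I am new to Spark and Scala.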

Answer 1

Adding it like this:


should be enough. Implicit resolution of the Encoder will do the rest.
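As a hedged sketch of one way to get createDf working (not necessarily the exact change this answer had in mind): a likely cause of the error is that the `[T: TypeTag]` context bound on the case class adds an implicit `TypeTag[T]` constructor parameter, which Spark's reflection then tries to treat as a column. Assuming that is the cause, dropping the bound from the case class and keeping it on the method lets Spark derive the schema for `CaseConvert[T]`. The object name `CreateDfSketch`, the local `SparkSession` settings, and the sample timestamps below are assumptions made for illustration only.

```scala
import java.sql.Timestamp
import scala.reflect.runtime.universe.TypeTag
import org.apache.spark.sql.{DataFrame, SparkSession}

// No context bound on the case class: `[T: TypeTag]` here would add an
// implicit TypeTag[T] constructor parameter (assumed cause of the error).
// Defined at the top level so a TypeTag for CaseConvert[T] can be materialized.
case class CaseConvert[T](a: T)

object CreateDfSketch {
  // Hypothetical local session, for illustration only.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("create-df-sketch")
    .getOrCreate()

  // The TypeTag[T] in scope lets the compiler build TypeTag[CaseConvert[T]],
  // which createDataFrame needs in order to derive the schema.
  def createDf[T: TypeTag](data: Seq[T]): DataFrame =
    spark.createDataFrame(data.map(x => CaseConvert(x)))

  def main(args: Array[String]): Unit = {
    val df = createDf(Seq(new Timestamp(0L), new Timestamp(1000L)))
    df.printSchema() // the single column is named after the field `a`
    spark.stop()
  }
}
```

The bound stays on `createDf` because the type information for `T` has to be available at the call site; only the case class itself needs to be free of the extra implicit parameter.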
