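Hello, I have the following code that tries to create a DataFrame from a Seq[T]:
import org.apache.spark.sql.DataFrame
import scala.reflect.runtime.universe.TypeTag
case class CaseConvert[T: TypeTag](a: T)
def createDf[T: TypeTag](data: Seq[T]): DataFrame = {
  spark.createDataFrame(data.map(CaseConvert[T]))
}
When I call the createDf method above with a concrete type, say Seq[java.sql.Timestamp], it fails with the following error:
UnsupportedOperationException: Schema for type TypeTag[java.sql.Timestamp] is not supported
I think I have to create an encoder for the CaseConvert class, but I don't know how to do that with Scala generics. Please guide me; I am new to Spark and Scala.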
Answer 0 (score: 0)
Adding it like this:
should be enough. Implicit resolution of Encoder will do the rest.
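To illustrate what relying on implicit Encoder resolution might look like, here is a minimal sketch, not necessarily the code the answer originally showed. It assumes a local SparkSession; the names CreateDfSketch, Wrapped, createDfWithEncoder and createDfWithWrapper are made up for this example. The first variant asks for an Encoder[T] through a context bound and lets spark.implicits._ supply it at the call site. The second keeps a wrapper case class like the question's but drops its TypeTag context bound. A likely cause of the error, at least in the Spark 2.x versions this appears to concern, is that the context bound adds an implicit TypeTag constructor parameter that Spark's reflection then tries to treat as a field.

```scala
import java.sql.Timestamp

import scala.reflect.runtime.universe.TypeTag

import org.apache.spark.sql.{DataFrame, Encoder, SparkSession}

// Hypothetical wrapper, like the question's CaseConvert but WITHOUT a
// context bound, so its constructor has only the single field `a`.
case class Wrapped[T](a: T)

object CreateDfSketch {
  // Variant 1: require an Encoder[T]; the caller's implicit scope provides it.
  def createDfWithEncoder[T: Encoder](spark: SparkSession, data: Seq[T]): DataFrame =
    spark.createDataset(data).toDF()

  // Variant 2: keep the wrapper approach; the TypeTag stays on the method only.
  def createDfWithWrapper[T: TypeTag](spark: SparkSession, data: Seq[T]): DataFrame =
    spark.createDataFrame(data.map(Wrapped[T](_)))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("createDf-sketch")
      .getOrCreate()
    // Brings Encoder instances (including one for java.sql.Timestamp) into scope.
    import spark.implicits._

    val data = Seq(Timestamp.valueOf("2020-01-01 00:00:00"))
    createDfWithEncoder(spark, data).printSchema()
    createDfWithWrapper(spark, data).printSchema()

    spark.stop()
  }
}
```

With the Encoder variant no wrapper class is needed at all; the wrapper variant is only worth keeping if the column should carry the case class field name.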