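I have two DataFrames. After joining them, printing the schema of the joined frame shows only the plain column names, not names nested under each frame's alias.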
The two DataFrames are created as follows:
df1 = sc.parallelize([
    ("1984-01-01", 1, 638.55),
    ("1984-01-02", 2, 638.55)
]).toDF(["date1", "hour", "value1"])
df2 = sc.parallelize([
    ("1984-01-01", 1, 638.55),
    ("1984-02-01", 2, 638.55)
]).toDF(["date2", "hour", "value2"])
# df1
# +----------+----+------+
# |     date1|hour|value1|
# +----------+----+------+
# |1984-01-01|   1|638.55|
# |1984-01-02|   2|638.55|
# +----------+----+------+
# df2
# +----------+----+------+
# |     date2|hour|value2|
# +----------+----+------+
# |1984-01-01|   1|638.55|
# |1984-02-01|   2|638.55|
# +----------+----+------+
After joining the frames, the schema has only the column names, not their respective aliases:
joined_frame = df1.alias('df1').join(df2.alias('df2'), ['hour'])
joined_frame.printSchema()
root
|-- hour: long (nullable = true)
|-- date1: string (nullable = true)
|-- value1: double (nullable = true)
|-- date2: string (nullable = true)
|-- value2: double (nullable = true)
Instead, I would like it to print like this:
root
|-- df1
|    |-- date1: string (nullable = true)
|    |-- hour: long (nullable = true)
|    |-- value1: double (nullable = true)
|-- df2
|    |-- date2: string (nullable = true)
|    |-- hour: long (nullable = true)
|    |-- value2: double (nullable = true)
Also, when I print the column names, I get only the sub-column names:
joined_frame.columns
['hour', 'date1', 'value1', 'date2', 'value2']
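When I try to access certain columns, I get the following error:
org.apache.spark.sql.AnalysisException: cannot resolve '`hour1`' given input columns: [df1.date1, df1.value1, df2.date2, df2.value2, hour]
Basically, how can I get the columns of "joined_frame" together with their aliases?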
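One possible approach, offered as a sketch rather than a definitive answer: join on an explicit condition instead of ['hour'] so that both hour columns survive, then wrap each side's columns in a struct named after its alias. The helper names qualified_columns and join_nested are hypothetical and introduced only for illustration. The sketch assumes the frames are aliased 'df1' and 'df2' as in the question and that PySpark is installed when join_nested is called.

```python
def qualified_columns(alias, columns):
    """Build 'alias.column' strings for every column of one aliased frame."""
    return [f"{alias}.{name}" for name in columns]


def join_nested(df1, df2, key="hour"):
    """Join df1 and df2 on `key`, keeping each side as a struct column.

    Hypothetical helper: the aliases 'df1' and 'df2' mirror the question.
    """
    # Imported lazily so the pure-Python helper above works without PySpark.
    from pyspark.sql import functions as F

    left, right = df1.alias("df1"), df2.alias("df2")
    # An explicit join condition keeps both `hour` columns; joining on
    # ['hour'] would merge them into a single unqualified column.
    joined = left.join(right, F.col(f"df1.{key}") == F.col(f"df2.{key}"))
    # Group each side's columns into one struct named after its alias.
    return joined.select(
        F.struct(*qualified_columns("df1", df1.columns)).alias("df1"),
        F.struct(*qualified_columns("df2", df2.columns)).alias("df2"),
    )
```

With the frames from the question, join_nested(df1, df2).printSchema() should then show date1, hour and value1 nested under df1, and date2, hour and value2 nested under df2. Individual fields would be selected with dotted names such as 'df1.date1'.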