Question

In Spark, operations like this come up all the time:

 hiveContext.sql("select * from demoTable").show()

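When I searched the official Spark API documentation for the show() method and then changed the search keyword to "Dataset", I found that the methods used on a DataFrame actually belong to Dataset. How does this happen? Are there any implications?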

Answer 1

According to the documentation:

A Dataset is a distributed collection of data.

and

A DataFrame is a Dataset organized into named columns.

So, technically, a Dataset of Row objects (Dataset[Row] in Scala, Dataset<Row> in Java) is equivalent to a DataFrame.

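And one last quote:

In the Scala API, DataFrame is simply a type alias of Dataset[Row]. In the Java API, by contrast, users need to use Dataset<Row> to represent a DataFrame.

In short, the concrete type is Dataset<Row>, which is why show() is documented under Dataset.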
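
The alias can be checked directly in Scala. The following is a minimal sketch, assuming Spark 2.x or later with a local SparkSession; the object name and the small in-memory table standing in for demoTable are hypothetical and not from the question.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object DataFrameIsDataset {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the question uses a hiveContext instead.
    val spark = SparkSession.builder()
      .appName("DataFrameIsDataset")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for demoTable.
    val df: DataFrame = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // Compiles without a cast because DataFrame is declared as
    // `type DataFrame = Dataset[Row]` in the org.apache.spark.sql package object.
    val ds: Dataset[Row] = df

    // show() is defined on Dataset, so it is available through both references.
    df.show()
    ds.show()

    spark.stop()
  }
}
```

Because the assignment to Dataset[Row] needs no conversion, every method listed under Dataset in the API documentation is also a method of any DataFrame.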
