How do I get a value from a DataFrame with the correct DataType?

Time: 2017-05-23 14:59:33

Tags: scala apache-spark

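When I try to get some value from a DataFrame, like:

df.select("date").head().get(0) // type: Any

the result type is Any, which is not what I expected. Since a DataFrame carries the schema of its data, it should know the DataType of each column, so when I get a value with get(0) it should return the value with the correct type. But it does not.

Instead, I need to specify which DataType I want by calling getDate(0), which seems odd and inconvenient, and it drives me crazy.

I already specified the schema with the right DataType for each column when I created the DataFrame, so I do not want to use a different getXXX() for each column.

Is there a convenient way to get the values with their own correct types? That is, how can I get values with the correct DataType specified in the schema? Thanks!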

2 Answers:

Answer 0: (score: 2)

You can call the generic getAs method, as in getAs[Int](columnIndex) or getAs[String](columnIndex), or use type-specific methods such as getInt(columnIndex).

Link to the Scaladoc for org.apache.spark.sql.Row
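To make the accessors in this answer concrete, here is a minimal sketch that reads one row with get, getInt, getDate and getAs. The object name TypedRowAccess, the helper typedValues, the sample data and the column names are all made up for illustration; it assumes Spark SQL is on the classpath and runs a local session.

```scala
import java.sql.Date
import org.apache.spark.sql.{Row, SparkSession}

object TypedRowAccess {
  // Reads three columns by position with typed accessors.
  // The column layout (Int, String, Date) is an assumption of this sketch.
  def typedValues(row: Row): (Int, String, Date) =
    (row.getInt(0), row.getAs[String](1), row.getDate(2))

  def main(args: Array[String]): Unit = {
    // Local session, for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-row-access")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with an Int, a String and a Date column
    val df = Seq((1, "a", Date.valueOf("2017-05-23"))).toDF("id", "name", "date")
    val row: Row = df.head()

    val untyped: Any = row.get(2)              // static type is Any
    val byName: Date = row.getAs[Date]("date") // generic accessor by field name

    println(typedValues(row))
    println(s"$untyped $byName")
    spark.stop()
  }
}
```

Note that the type passed to getAs is not checked against the schema at compile time, so a mismatch only shows up as a ClassCastException when the code runs.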

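Answer 1: (score: 1)

Scala is a statically typed language, so the get method defined on Row can only have a single return type, and that type is Any. It cannot return an Int for one call and a String for another.

You should call getInt, getDate and the other get methods provided for each type, or use the getAs method, which takes the type as a parameter (e.g. row.getAs[Int](0)).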
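The static-typing argument can be shown without Spark. The sketch below uses a toy ToyRow class, which is a hypothetical stand-in and not Spark's implementation: because the values are stored together in one collection, get has to be declared as returning Any, and a typed result needs either a cast supplied by the caller or a pattern match.

```scala
// Toy stand-in for a row of mixed-type values; NOT Spark's Row.
final class ToyRow(values: Vector[Any]) {
  // One method has one declared return type, so it must be Any.
  def get(i: Int): Any = values(i)

  // The caller names the type; the cast is unchecked at compile time.
  def getAs[T](i: Int): T = values(i).asInstanceOf[T]

  // Type-specific convenience accessor built on getAs.
  def getInt(i: Int): Int = getAs[Int](i)
}

object ToyRowDemo {
  // Recovers a typed view from an Any value by pattern matching.
  def describe(value: Any): String = value match {
    case i: Int    => s"Int($i)"
    case s: String => s"String($s)"
    case other     => s"Other($other)"
  }

  def main(args: Array[String]): Unit = {
    val row = new ToyRow(Vector(1, "a"))
    println(describe(row.get(0)))
    println(row.getAs[String](1))
  }
}
```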

As mentioned in the comments, other options are:

  • Use a Dataset instead of a DataFrame.
  • Use Spark SQL
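For the Dataset option, one possible shape is sketched below. The case class Event, the object DatasetAccess and the sample data are hypothetical; the sketch assumes Spark SQL is on the classpath and that the caller supplies a SparkSession. Converting the DataFrame with as[...] gives the compiler the field types, so no getXXX call is needed afterwards.

```scala
import java.sql.Date
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type matching the sample columns below
final case class Event(id: Int, date: Date)

object DatasetAccess {
  // Converts the whole DataFrame to Dataset[Event] and reads a typed field.
  def firstDate(spark: SparkSession): Date = {
    import spark.implicits._
    val df = Seq((1, Date.valueOf("2017-05-23"))).toDF("id", "date")
    val events: Dataset[Event] = df.as[Event]
    events.head().date // static type is java.sql.Date
  }

  // Converts only the selected column to a Dataset[Date].
  def firstDateColumnOnly(spark: SparkSession): Date = {
    import spark.implicits._
    val df = Seq((1, Date.valueOf("2017-05-23"))).toDF("id", "date")
    df.select("date").as[Date].head()
  }
}
```

The field types still have to be written down once, in the case class or in the as[...] call, but after that the values come back with those types.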