I have a sample DataFrame:
df_that_I_have
+---------+---------+-------+
| country | members | some |
+---------+---------+-------+
| India | 50 | 1 |
+---------+---------+-------+
| Japan | 20 | 3 |
+---------+---------+-------+
| India | 20 | 1 |
+---------+---------+-------+
| Japan | 10 | 3 |
+---------+---------+-------+
I want a DataFrame that looks like this:
df_that_I_want
+---------+---------+-------+
| country | members | some |
+---------+---------+-------+
| India | 70 | 10 | // 5 * Sum of "some" for India, i.e. (1 + 1)
+---------+---------+-------+
| Japan | 30 | 30 | // 5 * Sum of "some" for Japan, i.e. (3 + 3)
+---------+---------+-------+
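In the second DataFrame, members is the per-country sum of members, and some is the per-country sum of some multiplied by 5.
This is what I tried in order to achieve that: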
val df_that_I_want = df_that_I_have
.select(df_that_I_have("country"),
df_that_I_have.groupBy("country").sum("members"),
5 * df_that_I_have.groupBy("country").sum("some")) //Problem here
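But the compiler does not allow this, apparently because I cannot multiply 5 by a column.
How can I multiply an Integer value by the sum of some for each country?
Answer 0 (score: 3)
You can try the lit function.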
scala> val df_that_I_have = Seq(("India",50,1),("India",20,1),("Japan",20,3),("Japan",10,3)).toDF("Country","Members","Some")
df_that_I_have: org.apache.spark.sql.DataFrame = [Country: string, Members: int, Some: int]
scala> val df1 = df_that_I_have.groupBy("country").agg(sum("members"), sum("some") * lit(5))
df1: org.apache.spark.sql.DataFrame = [country: string, sum(members): bigint, ((sum(some),mode=Complete,isDistinct=false) * 5): bigint]
scala> val df_that_I_want= df1.select($"Country",$"sum(Members)".alias("Members"), $"((sum(Some),mode=Complete,isDistinct=false) * 5)".alias("Some"))
df_that_I_want: org.apache.spark.sql.DataFrame = [Country: string, Members: bigint, Some: bigint]
scala> df_that_I_want.show
+-------+-------+----+
|Country|Members|Some|
+-------+-------+----+
| India| 70| 10|
| Japan| 30| 30|
+-------+-------+----+
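The second select in the transcript above looks the aggregate up by its auto-generated column name, and that name can differ between Spark versions. As a minimal sketch of an alternative, the aliases can be attached inside agg itself. This assumes Spark 2.x or later (SparkSession) and a local master; the names AggWithAlias and sumAndScale are hypothetical and used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, sum}

// Hypothetical helper object, not part of the original answer.
object AggWithAlias {
  // Sums members per country and multiplies the per-country sum of some by 5.
  // Aliasing inside agg avoids referring to auto-generated column names later.
  def sumAndScale(df: DataFrame): DataFrame =
    df.groupBy("country")
      .agg(
        sum("members").alias("members"),
        (sum("some") * lit(5)).alias("some")
      )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-with-alias")
      .getOrCreate()
    import spark.implicits._

    val df_that_I_have =
      Seq(("India", 50, 1), ("India", 20, 1), ("Japan", 20, 3), ("Japan", 10, 3))
        .toDF("country", "members", "some")

    sumAndScale(df_that_I_have).show()
    spark.stop()
  }
}
```

In spark-shell, only the body of sumAndScale is needed, applied directly to df_that_I_have.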
Answer 1 (score: 1)
Please try this:
df_that_I_have.groupBy("country").agg(sum("members"), sum("some") * lit(5))
Answer 2 (score: 0)
df_that_I_have.groupBy("country").agg(sum("members"), sum("some") * lit(5))
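The lit function creates a column holding a literal value, here 5.
When you cannot multiply by 5 directly, it creates a column containing 5, and the sum is multiplied by that column.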
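To make the operand-order point concrete: 5 * column fails to compile because Int has no * method that accepts a Column, whereas Column's own * method accepts a plain value. The sketch below computes the scaled sum both ways. It assumes Spark 2.x or later and a local session; the names LiteralOperandOrder, scaledSums, column_times_int and lit_times_column are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, sum}

// Hypothetical helper object, for illustration only.
object LiteralOperandOrder {
  // Computes 5 * sum(some) per country in two equivalent ways.
  def scaledSums(df: DataFrame): DataFrame =
    df.groupBy("country").agg(
      // Column on the left: Column.* accepts a plain Int.
      (sum("some") * 5).alias("column_times_int"),
      // Literal on the left: lit(5) is a Column, so * is called on a Column.
      (lit(5) * sum("some")).alias("lit_times_column")
    )

  // Does not compile, because Int has no * overload taking a Column:
  // val broken = 5 * sum("some")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("literal-operand-order")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("India", 50, 1), ("India", 20, 1), ("Japan", 20, 3), ("Japan", 10, 3))
      .toDF("country", "members", "some")

    // Both result columns should hold the same value for each country.
    scaledSums(df).show()
    spark.stop()
  }
}
```

So in the question's code, swapping the operands or wrapping the 5 in lit would each get past the type error; the aggregation still has to go through groupBy and agg rather than select.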