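DataFrame first function: ignoreNulls does not work

Time: 2017-10-28 08:23:32

Tags: scala apache-spark apache-spark-sql

Reading the Spark documentation for the first function, it says that with ignoreNulls set, it returns the first non-null value.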

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.{Window, WindowSpec}

object tmp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").getOrCreate()
    import spark.implicits._

    val input = Seq(
      (1234,  1, None),
      (1234,  2, Some(1)),

      (5678,  1, Some(11)),
      (5678,  2, Some(22))
    ).toDF("service_id", "counter", "value")

    lazy val window: WindowSpec = Window.partitionBy("service_id").orderBy("counter")
    val firsts = input.withColumn("first_value", first("value", ignoreNulls = true).over(window))
    firsts.orderBy("service_id", "counter").show()
  }
}

This returns the following output. I want first_value in the first row to be 1 instead of null. What am I missing here?

+----------+-------+-----+-----------+
|service_id|counter|value|first_value|
+----------+-------+-----+-----------+
|      1234|      1| null|       null|
|      1234|      2|    1|          1|
|      5678|      1|   11|         11|
|      5678|      2|   22|         11|
+----------+-------+-----+-----------+

1 answer:

Answer 0 (score: 1)

You have to define the rangeBetween option on the window to make it work:

lazy val window: WindowSpec = Window.partitionBy("service_id").orderBy("counter").rangeBetween(Long.MinValue, Long.MaxValue)

This is because, if you do not define a range on the window, an incremental frame is used: it runs from the start of the partition up to the current row. For the first row the frame is 1 row, for the second row it is 2 rows, and so on, all within the partition. In the first row of service_id 1234 the frame therefore contains only the null value, so there is no non-null value for first to return.
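To make the difference between the two frames concrete, here is a minimal plain-Scala sketch that models them on a single partition. It is not Spark code and not how Spark implements window functions; the names FrameSketch, firstNonNullRunning and firstNonNullWholePartition are hypothetical, and it assumes the rows are already sorted by the ordering column with no ties in that column.

```scala
object FrameSketch {
  // The values of one partition, already sorted by the ordering column.
  type Partition = Vector[Option[Int]]

  // Models the incremental frame: for row i, only rows 0..i are visible,
  // so the result is the first non-null value seen so far (if any).
  def firstNonNullRunning(rows: Partition): Vector[Option[Int]] =
    rows.indices.toVector.map(i => rows.take(i + 1).flatten.headOption)

  // Models a frame spanning the whole partition: every row sees all rows,
  // so every row gets the partition's first non-null value.
  def firstNonNullWholePartition(rows: Partition): Vector[Option[Int]] = {
    val first = rows.flatten.headOption
    rows.map(_ => first)
  }

  def main(args: Array[String]): Unit = {
    // The "value" column for service_id 1234 from the question.
    val service1234: Partition = Vector(None, Some(1))
    println(firstNonNullRunning(service1234))        // Vector(None, Some(1))
    println(firstNonNullWholePartition(service1234)) // Vector(Some(1), Some(1))
  }
}
```

The first result matches the output in the question (null, then 1), and the second matches what the asker wanted. Newer Spark releases also offer Window.unboundedPreceding and Window.unboundedFollowing as named bounds, which may read more clearly than Long.MinValue and Long.MaxValue.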

I hope the answer is helpful.