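I want to filter some records out of a DataFrame using the filter method. I have an array of address structs that I am comparing against a column value, and I want to remove an element from the address array based on that comparison. I am using the following code:
entityJoinB_df.filter(col("addressstructm.streetName").cast(StringType) =!= (col("streetName")))
But it does not work. What could be the problem? Can anyone help?
Sample input schema: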
root
|-- apartmentnumber: string (nullable = true)
|-- streetName: string (nullable = true)
|-- streetName2: string (nullable = true)
|-- fullName: string (nullable = false)
|-- address: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: string (nullable = true)
| | |-- streetName: string (nullable = true)
| | |-- streetName2: string (nullable = true)
| | |-- buildingName: string (nullable = true)
| | |-- type: string (nullable = true)
| | |-- city: string (nullable = true)
|-- isActive: boolean (nullable = false)
Sample output:
[
{
"apartmentnumber": 122,
"streetName": "ABC ABC",
"streetName2": "CBD",
"fullName": "MR. X",
"address": [{
"streetName": "ABC ABC",
"streetName2": "CBD",
"buildingName": "ONE",
"city":"NY"
},
{
"streetName": "XYZ ABC",
"streetName2": "XCB",
"buildingName": "ONE",
"city":"NY"
}]
}
]
Thanks, Upen
Answer 0: (score: 0)
I think you can solve your problem by changing the filter expression to:
import org.apache.spark.sql.functions._
entityJoinB_df.withColumn("address",
expr("filter(addressstructm.address, x-> ( x.streetName != streetName AND x.streetName != 'Secondary' ) )"))
This assumes addressstructm is the alias of your DataFrame.
Below is a sample with a structure similar to yours:
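import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StructParsin {
  def main(args: Array[String]): Unit = {
    // Local session for the demo (the original used a helper, Constant.getSparkSess)
    val spark = SparkSession.builder().master("local[*]").appName("StructParsin").getOrCreate()
    import spark.implicits._
    val df = List(
      Apartment(Array(Element("ABC ABC", "123"), Element("XYZ ABC", "123")), "ABC ABC"),
      Apartment(Array(Element("DEF", "123"), Element("DEF1", "123")), "XYZ")
    ).toDF
    df.printSchema()
    // Keep only the array elements whose streetName differs from the row's streetName
    df.withColumn("newAddress",
      expr("filter(address, x -> ( x.streetName != streetName AND x.streetName != 'Secondary' ))"))
      .show()
  }
}
// The name of the second field (id) is assumed; the calls above pass two arguments
case class Element(streetName: String, id: String)
case class Apartment(address: Array[Element], streetName: String)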
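The same idea can also be written with the Column API instead of a SQL string. This is only a minimal sketch, assuming Spark 3.0 or later, where `org.apache.spark.sql.functions.filter` accepts a Scala lambda. The names `Addr`, `Home`, `ColumnApiFilter` and `dropMatching` are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, filter}

// Hypothetical case classes mirroring the sample structure above
case class Addr(streetName: String)
case class Home(address: Seq[Addr], streetName: String)

object ColumnApiFilter {
  // Keeps only the address elements whose streetName differs from
  // the row-level streetName column (requires Spark 3.0+)
  def dropMatching(df: DataFrame): DataFrame =
    df.withColumn(
      "address",
      filter(col("address"), x => x.getField("streetName") =!= col("streetName"))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ColumnApiFilter")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      Home(Seq(Addr("ABC ABC"), Addr("XYZ ABC")), "ABC ABC")
    ).toDF()

    dropMatching(df).show(false)
    spark.stop()
  }
}
```

Note that DataFrame.filter removes whole rows, while the array-level filter function removes elements inside the array column, which is what the question asks for. Elements for which the comparison evaluates to null (for example a null streetName) are dropped as well, so add an explicit null check if those should be kept.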
Answer 1: (score: 0)
Try the code below.
scala>
entityJoinB_df
.withColumn("address",
array_except($"address",
array($"address"(array_position($"address.streetName",$"streetName")-1))
)
)
.show(false)
+-------------------------+---------------+--------+----------+-----------+
|address |apartmentnumber|fullName|streetName|streetName2|
+-------------------------+---------------+--------+----------+-----------+
|[[ONE, NY, XYZ ABC, XCB]]|122 |MR. X |ABC ABC |CBD |
+-------------------------+---------------+--------+----------+-----------+