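I want to filter a DataFrame on two specific columns. I need to check that the selected data contains the "range_id" values from the DataFrame range_id_Test as well as the "family_id" values from the DataFrame familyid_Test.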
val range_id_Test = newArticlesGold.select("range_id").except(article_ranges.select("id").distinct())
val familyid_Test = newArticlesGold.select("family_id").except(article_family.select("id").distinct())
val addedData = newArticlesGold.filter($"range_id" === range_id_Test("range_id") || $"family_id" === familyid_Test("family_id"))
Here is a sample of the data:
Range_Test
+--------+
|range_id|
+--------+
|      -1|
+--------+
Family_test
+---------+
|family_id|
+---------+
|       -1|
+---------+
and newArticlesGold:
+-----------+-------------+--------------------------------------------------+--------+---+------+------+--------+---+--------+---------+
|CODEARTICLE|STRUCTURE |DES |TYPEMARK|TYP|IMPLOC|MARQUE|GAMME |TAR|range_id|family_id|
+-----------+-------------+--------------------------------------------------+--------+---+------+------+--------+---+--------+---------+
|662180137 |1173201099902|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|2 |9 |Local | | | |1173 |1173201 |
|662180717 |1173201099902|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|2 |9 |Local | | | |1173 |1173201 |
|435160050 |1443609010306|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|7 |7 |Local | |60900010| |1443 |1443609 |
|435160060 |1443609010306|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|7 |7 |Local | |60900010| |1443 |1443609 |
|553260040 |1428659020203|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|7 |7 |Local | | | |-1 |-1 |
+-----------+-------------+--------------------------------------------------+--------+---+------+------+--------+---+--------+---------+
I want to get rid of the last row. Any help would be appreciated.
Answer 0 (score: 0)
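You are trying to do a kind of "nested loop" by referencing one DataFrame's column inside a filter on another, and Spark does not allow that. Instead, you need to use a join. Something like this: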
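import spark.implicits._
// range_id values that have no matching id in article_ranges
val range_id_Test = newArticlesGold
  .select('range_id as "r_id")
  .except(article_ranges.select("id").distinct())
// family_id values that have no matching id in article_family
val familyid_Test = newArticlesGold
  .select('family_id as "f_id")
  .except(article_family.select("id").distinct())
// Rows whose range_id and family_id are both unmatched
val excludedData = newArticlesGold
  .join(range_id_Test, 'range_id === 'r_id)
  .join(familyid_Test, 'family_id === 'f_id)
  .drop("r_id", "f_id")
// except has set semantics, so duplicate rows are collapsed as well
val result = newArticlesGold.except(excludedData)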
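As a possible alternative, the same kind of filtering can be sketched with semi joins directly against the lookup tables, without building the intermediate DataFrames. This is only a sketch under assumptions: the sample rows, the `keepKnownKeys` helper and the `articleRanges` / `articleFamily` names are made up for illustration. Note that, unlike the chained inner join, it drops a row as soon as either key is missing from its lookup table.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SemiJoinSketch {
  // Hypothetical helper: keeps only the articles whose range_id and
  // family_id both have a matching id in the respective lookup table.
  def keepKnownKeys(articles: DataFrame, ranges: DataFrame, families: DataFrame): DataFrame =
    articles
      .join(ranges, articles("range_id") === ranges("id"), "left_semi")
      .join(families, articles("family_id") === families("id"), "left_semi")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("semi-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up stand-ins for the tables in the question (fewer columns).
    val newArticlesGold = Seq(
      ("662180137", 1173, 1173201),
      ("435160050", 1443, 1443609),
      ("553260040", -1, -1)
    ).toDF("CODEARTICLE", "range_id", "family_id")
    val articleRanges = Seq(1173, 1443).toDF("id")
    val articleFamily = Seq(1173201, 1443609).toDF("id")

    // A left_semi join returns only the left-hand columns, so no drop is needed.
    keepKnownKeys(newArticlesGold, articleRanges, articleFamily).show(false)

    spark.stop()
  }
}
```

A semi join keeps the rows as they are in newArticlesGold, so duplicate articles are not collapsed the way they are with except.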
Answer 1 (score: 0)
From what I understood of your question, an inner join of range_id_Test and familyid_Test with newArticlesGold should give you the output you want, as below (as you can see, I have changed the column names of the range_id_Test and familyid_Test tables).
import org.apache.spark.sql.functions.col
import spark.implicits._
val range_id_Test = newArticlesGold.select($"range_id".as("range_id_1")).except(article_ranges.select("id").distinct())
val familyid_Test = newArticlesGold.select($"family_id".as("family_id_1")).except(article_family.select("id").distinct())
// Note: the =!= joins behave as a filter only while each test DataFrame holds exactly one row
val addedData = newArticlesGold.join(range_id_Test, $"range_id" =!= range_id_Test("range_id_1"))
  .join(familyid_Test, $"family_id" =!= familyid_Test("family_id_1"))
  .select(newArticlesGold.columns.map(col): _*)
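If the sets of unmatched ids are small, another option is to collect them to the driver and filter with isin, which does not depend on how many rows each test DataFrame holds. The following is a rough sketch, not a definitive solution: it assumes the ids are integers, and the `dropUnmatched` helper, the sample rows and the `articleRanges` / `articleFamily` names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object IsinFilterSketch {
  // Hypothetical helper: drops every row whose range_id or family_id
  // appears in the corresponding list of unmatched ids.
  def dropUnmatched(articles: DataFrame, badRanges: Seq[Int], badFamilies: Seq[Int]): DataFrame =
    articles.filter(
      !col("range_id").isin(badRanges: _*) && !col("family_id").isin(badFamilies: _*)
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isin-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up stand-ins for the tables in the question (fewer columns).
    val newArticlesGold = Seq(
      ("662180137", 1173, 1173201),
      ("435160050", 1443, 1443609),
      ("553260040", -1, -1)
    ).toDF("CODEARTICLE", "range_id", "family_id")
    val articleRanges = Seq(1173, 1443).toDF("id")
    val articleFamily = Seq(1173201, 1443609).toDF("id")

    // Ids present in newArticlesGold but missing from the lookup tables,
    // collected to the driver (only sensible when these sets are small).
    val badRanges = newArticlesGold.select("range_id")
      .except(articleRanges.select("id")).as[Int].collect().toSeq
    val badFamilies = newArticlesGold.select("family_id")
      .except(articleFamily.select("id")).as[Int].collect().toSeq

    dropUnmatched(newArticlesGold, badRanges, badFamilies).show(false)

    spark.stop()
  }
}
```

Because this is a plain filter, each article row appears at most once in the result, however many unmatched ids there are.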
I hope this is what you are looking for.