Filter numeric data

Time: 2017-08-29 14:36:04

Tags: scala apache-spark

I have a DataFrame with a column CODEARTICLE. Here is the DataFrame:

+-----------+-------------+--------------------+--------+---+------+------+-----+---+
|CODEARTICLE|    STRUCTURE|                 DES|TYPEMARK|TYP|IMPLOC|MARQUE|GAMME|TAR|
+-----------+-------------+--------------------+--------+---+------+------+-----+---+
| GENCFFRIST|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  0| Local|      |     |   |
| GENCFFMARC|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  0| Local|      |     |   |
| GENCFFESCO|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  0| Local|      |     |   |
|  GENCFFTNA|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  0| Local|      |     |   |
| GENCFFEMBA|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  0| Local|      |     |   |
|  789600010|9999999999998|xxxxxxxxxxxxxxxxx...|       7|  1| Local|      |     |   |
|  799700040|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  1| Local|      |     |   |
|  799701000|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  1| Local|      |     |   |
|  899980490|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  9| Local|      |     |   |
|  429600010|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  1| Local|      |     |   |
|  559970040|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  0| Local|      |     |   |
|  679500010|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  1| Local|      |     |   |
|  679500040|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  1| Local|      |     |   |
|  679500060|9999999999998|xxxxxxxxxxxxxxxxx...|       0|  1| Local|      |     |   |
+-----------+-------------+--------------------+--------+---+------+------+-----+---+

I want to keep only the rows whose CODEARTICLE is numeric. This is what I tried:

  import org.apache.spark.sql.types.IntegerType
  import sparkSession.implicits._ // for the 'CODEARTICLE and $"test" column syntax

  // connect to the Oracle table TMP_STRUCTURE
  val articles_Gold = sparkSession.read.format("jdbc")
    .options(Map("url" -> "jdbc:oracle:thin:System/maher@//localhost:1521/XE",
      "dbtable" -> "IPTECH.TMP_ARTICLE"))
    .load()
    .select("CODEARTICLE", "STRUCTURE", "DES", "TYPEMARK", "TYP", "IMPLOC", "MARQUE", "GAMME", "TAR")

  val filteredData = articles_Gold.withColumn("test", 'CODEARTICLE.cast(IntegerType)).filter($"test" !== null)
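Why that last filter keeps no rows: Spark SQL compares with null using SQL three-valued logic, so test <> NULL evaluates to null rather than true for every row, and filter only keeps rows where the predicate is true. The sketch below demonstrates this on a few CODEARTICLE values copied from the table, assuming a local SparkSession (the object name NullComparisonDemo is made up for illustration). It uses =!=, the spelling Spark has recommended over the deprecated !== since 2.0, and it turns off spark.sql.ansi.enabled because with ANSI mode on (the default from Spark 4.0) casting a string such as GENCFFRIST to an integer raises an error instead of returning null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.IntegerType

object NullComparisonDemo {
  // Returns (rows kept by `test =!= null`, rows kept by `test.isNotNull`)
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-comparison")
      // keep the pre-Spark-4.0 behaviour: non-numeric strings cast to null instead of failing
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()
    import spark.implicits._
    try {
      // A few CODEARTICLE values copied from the table in the question
      val df = Seq("GENCFFRIST", "GENCFFMARC", "789600010", "799700040")
        .toDF("CODEARTICLE")
        .withColumn("test", col("CODEARTICLE").cast(IntegerType))

      // `test <> NULL` is NULL for every row, and filter drops rows whose predicate is NULL
      val viaComparison = df.filter(col("test") =!= lit(null)).count()
      // isNotNull is TRUE exactly for the codes that parsed as integers
      val viaIsNotNull = df.filter(col("test").isNotNull).count()
      (viaComparison, viaIsNotNull)
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = {
    val (viaComparison, viaIsNotNull) = run()
    println(s"=!= null keeps $viaComparison rows, isNotNull keeps $viaIsNotNull rows")
  }
}
```

The comparison keeps nothing, while isNotNull keeps the two numeric codes, which is the behaviour both answers rely on.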

Thank you very much.

2 answers:

Answer 0 (score: 0)

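Use na.drop on the helper column:

  // pass the column name as a Seq: na.drop("test") would be read as the "how" mode, not a column
  articles_Gold.withColumn("test", 'CODEARTICLE.cast(IntegerType)).na.drop(Seq("test"))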
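A self-contained version of this answer, as a sketch assuming a local SparkSession and a few sample codes from the question (NaDropDemo is a hypothetical name). Note the argument to na.drop: a bare string such as na.drop("test") selects the drop(how: String) overload, which only accepts "any" or "all" and rejects anything else, so the column name is passed as Seq("test"). The helper column is dropped again afterwards so the result keeps the original schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

object NaDropDemo {
  // Returns the CODEARTICLE values that survive na.drop on the helper column
  def run(): Seq[String] = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("na-drop")
      // non-numeric strings cast to null (the default behaviour before Spark 4.0)
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()
    import spark.implicits._
    try {
      // Sample codes copied from the table in the question
      val articles = Seq("GENCFFRIST", "GENCFFTNA", "789600010", "429600010").toDF("CODEARTICLE")

      val filtered = articles
        .withColumn("test", col("CODEARTICLE").cast(IntegerType)) // null when the code is not an integer
        .na.drop(Seq("test"))                                     // drop rows where "test" is null
        .drop("test")                                             // remove the helper column again

      filtered.as[String].collect().toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", "))
}
```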

Answer 1 (score: 0)

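You can call .isNotNull on the column inside the filter function. You don't even need to create another column for this logic; you can simply do the following:

  articles_Gold.filter('CODEARTICLE.cast(IntegerType).isNotNull)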
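A runnable sketch of this answer, assuming a local SparkSession and a few codes from the table (IsNotNullDemo and the 10-digit code 9999999999 are made up for illustration). It also shows an alternative that comes from neither answer: matching CODEARTICLE against the regular expression ^[0-9]+$ with rlike. The cast-based test drops all-digit codes that do not fit in an Int (values above 2147483647 cast to null), while the regex keeps any all-digit code; on the other hand, the regex rejects signed values such as -5 that the cast would accept.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

object IsNotNullDemo {
  // Returns (codes kept by the cast-based filter, codes kept by the regex-based filter)
  def run(): (Seq[String], Seq[String]) = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("is-not-null")
      // with ANSI mode on (the default from Spark 4.0), casting "GENCFFESCO" to int fails instead of returning null
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()
    import spark.implicits._
    try {
      // Sample codes from the question, plus a hypothetical 10-digit code that overflows Int
      val articles = Seq("GENCFFESCO", "GENCFFEMBA", "559970040", "679500060", "9999999999")
        .toDF("CODEARTICLE")

      // Answer 1: test the cast directly inside filter, no helper column needed
      val byCast = articles.filter(col("CODEARTICLE").cast(IntegerType).isNotNull)

      // Alternative (not from the answers): keep codes made only of digits
      val byRegex = articles.filter(col("CODEARTICLE").rlike("^[0-9]+$"))

      (byCast.as[String].collect().toSeq, byRegex.as[String].collect().toSeq)
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = {
    val (byCast, byRegex) = run()
    println(s"cast + isNotNull: ${byCast.mkString(", ")}")
    println(s"rlike digits:     ${byRegex.mkString(", ")}")
  }
}
```

Which one fits depends on the data: if every numeric CODEARTICLE is a 9-digit code as in the table, both give the same rows.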

I hope the answer is helpful.