My DataFrame looks like this:
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+
|DataPartition |TimeStamp |FFAction|!||IdentifierValue_effectiveFrom|IdentifierValue_effectiveTo|IdentifierValue_identifierEntityId|IdentifierValue_identifierEntityTypeId|IdentifierValue_identifierTypeId|
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+
|SelfSourcedPublic|2018-03-05T11:54:18+00:00|I|!| |1900-01-01T00:00:00+00:00 |9999-12-31T00:00:00+00:00 |4295903126 |404010 |320150 |
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+
I want to add an extra column, Partition, whose value depends on the column IdentifierValue_identifierEntityTypeId, using the following condition:
If IdentifierValue_identifierEntityTypeId = 1001371402 then Partition = Repno2FundamentalSeries, else if IdentifierValue_identifierEntityTypeId = 404010 then Partition = Repno2Organization.
This is what I tried:
val temp = temp1.withColumn("Partition", when($"IdentifierValue_identifierEntityTypeId" === "404010", 0).otherwise("Repno2FundamentalSeries"))
temp.show(false)
The Partition column is added to my output, but its value is 0:
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+---------+
|DataPartition |TimeStamp |FFAction|!||IdentifierValue_effectiveFrom|IdentifierValue_effectiveTo|IdentifierValue_identifierEntityId|IdentifierValue_identifierEntityTypeId|IdentifierValue_identifierTypeId|Partition|
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+---------+
|SelfSourcedPublic|2018-03-05T11:54:18+00:00|I|!| |1900-01-01T00:00:00+00:00 |9999-12-31T00:00:00+00:00 |4295903126 |404010 |320150 |0 |
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+---------+
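I am new to Scala, hence the basic question.
How do I write when and otherwise for multiple conditions on a column? The following does not work for me. It fails with this error:
Exception in thread "main" java.lang.IllegalArgumentException: otherwise() can only be applied once on a Column previously generated by when()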
val dataMain = dataMain1.withColumn(
"Partition",
when($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "Fundamental", "Instrument2Fundamental")
.otherwise(when($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Instrument2FundamentalSeries"))
.otherwise(when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "Fundamental", "Organization2Fundamental"))
.otherwise(when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Organization2FundamentalSeries"))
)
Answer 0 (score: 2)
Based on the condition you stated, you should change the when condition as shown below.
If IdentifierValue_identifierEntityTypeId = 1001371402 then Partition = Repno2FundamentalSeries, else if IdentifierValue_identifierEntityTypeId = 404010 then Partition = Repno2Organization
df1.withColumn("Partition",
  when($"IdentifierValue_identifierEntityTypeId" === "1001371402", "Repno2FundamentalSeries")
    .otherwise("Repno2Organization")
)
Output:
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+------------------+
|DataPartition |TimeStamp |FFAction|!||IdentifierValue_effectiveFrom|IdentifierValue_effectiveTo|IdentifierValue_identifierEntityId|IdentifierValue_identifierEntityTypeId|IdentifierValue_identifierTypeId|Partition |
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+------------------+
|SelfSourcedPublic|2018-03-05T11:54:18+00:00|I|!| |1900-01-01T00:00:00+00:00 |9999-12-31T00:00:00+00:00 |4295903126 |404010 |320150 |Repno2Organization|
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+------------------+
Edit
For nested when conditions, chain the when calls and apply otherwise only once, at the end of the chain.
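A minimal sketch of the chained form, assuming the two halves of each condition live in two different columns. In the question's code both comparisons read the same column, so no row could ever satisfy them. The column names sourceType and targetType, the sample rows, and the "Unknown" default are hypothetical and not taken from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object ChainedWhenSketch {
  // Chains when(...) calls on the same Column; otherwise(...) is called once, at the end.
  // sourceType and targetType are hypothetical column names used for illustration.
  def withPartition(df: DataFrame): DataFrame =
    df.withColumn(
      "Partition",
      when(col("sourceType") === "EDInstrument" && col("targetType") === "Fundamental", "Instrument2Fundamental")
        .when(col("sourceType") === "EDInstrument" && col("targetType") === "FundamentalSeries", "Instrument2FundamentalSeries")
        .when(col("sourceType") === "Organization" && col("targetType") === "Fundamental", "Organization2Fundamental")
        .when(col("sourceType") === "Organization" && col("targetType") === "FundamentalSeries", "Organization2FundamentalSeries")
        .otherwise("Unknown") // hypothetical default; without otherwise, unmatched rows get null
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ChainedWhenSketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows
    val dataMain1 = Seq(
      ("EDInstrument", "Fundamental"),
      ("Organization", "FundamentalSeries"),
      ("Quote", "Fundamental")
    ).toDF("sourceType", "targetType")

    withPartition(dataMain1).show(false)
    spark.stop()
  }
}
```

The conditions are evaluated in order and the first match wins, so more specific conditions should come first.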
Hope this helps.
Answer 1 (score: 0)
Another way to achieve this: instead of withColumn, you can use a SQL-style CASE WHEN expression.
This may be easier to write if you are familiar with SQL. For example:
val dataMain = dataMain1.selectExpr("*",
"""CASE WHEN RelationObjectId_relatedObjectType = 'EDInstrument'
THEN 'Instrument2Fundamental'
WHEN cond2
THEN value2
ELSE defaultValue end AS partition""")
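To make the placeholders concrete, here is a sketch of the same idea applied to the IdentifierValue_identifierEntityTypeId condition from the question. It assumes the column holds strings. The sample rows and the 'Unknown' default are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CaseWhenSketch {
  // Keeps every existing column ("*") and appends Partition, computed by a SQL CASE WHEN.
  def withPartition(df: DataFrame): DataFrame =
    df.selectExpr(
      "*",
      """CASE WHEN IdentifierValue_identifierEntityTypeId = '1001371402' THEN 'Repno2FundamentalSeries'
        |     WHEN IdentifierValue_identifierEntityTypeId = '404010' THEN 'Repno2Organization'
        |     ELSE 'Unknown' END AS Partition""".stripMargin // 'Unknown' is a hypothetical default
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("CaseWhenSketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows containing only the column the condition needs
    val temp1 = Seq("404010", "1001371402", "320150").toDF("IdentifierValue_identifierEntityTypeId")

    withPartition(temp1).show(false)
    spark.stop()
  }
}
```

If the ELSE branch is left out, rows that match no WHEN get null in the Partition column.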