I have a DataFrame with the following schema:
|-- nlucontexttrail: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- agentid: string (nullable = true)
| | |-- intent: struct (nullable = true)
| | | |-- confidence: double (nullable = true)
| | | |-- entities: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- id: string (nullable = true)
| | | | | |-- values: array (nullable = true)
| | | | | | |-- element: struct (containsNull = true)
| | | | | | | |-- literal: string (nullable = true)
| | | | | | | |-- value: string (nullable = true)
| | | |-- intentname: string (nullable = true)
| | | |-- name: string (nullable = true)
| | |-- intentcandidates: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- confidence: double (nullable = true)
| | | | |-- entities: array (nullable = true)
| | | | | |-- element: struct (containsNull = true)
| | | | | | |-- id: string (nullable = true)
| | | | | | |-- values: array (nullable = true)
| | | | | | | |-- element: struct (containsNull = true)
| | | | | | | | |-- literal: string (nullable = true)
| | | | | | | | |-- value: string (nullable = true)
| | | | |-- intentname: string (nullable = true)
| | | | |-- name: string (nullable = true)
| | |-- modelid: string (nullable = true)
| | |-- modelversion: long (nullable = true)
| | |-- nlusessionid: string (nullable = true)
| | |-- usednluengine: string (nullable = true)
| | |-- usednluengine: string (nullable = true)
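As you can see, the field "usednluengine" appears twice in the element struct. One copy holds None and the other holds the expected value, and I want to drop the copy that is None. The data is shared below; please take a look.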
[{"agentid":"dispatcher","intent":{"confidence":0.8822699,"entities":[{"id":"duration","values":[{"literal":"2 Sekunden","value":"PT2S"}]},{"id":"date","values":[{"literal":"eins","value":"T23:00:00Z"},{"literal":"eins","value":"T23:00:00Z"}]},{"id":"number","values":[{"literal":"eins","value":"1"},{"literal":"2","value":"2"},{"literal":"eins","value":"1"}]},{"id":"station","values":[{"literal":"eins","value":"eins"},{"literal":"eins","value":"eins"}]},{"id":"number_values","values":[{"literal":"eins","value":"1"},{"literal":"eins","value":"1"}]},{"id":"percentage_values","values":[{"literal":"höchsten","value":"100"}]}],"intentname":null,"name":"TV"},"intentcandidates":[{"confidence":0.8822699,"entities":[{"id":"duration","values":[{"literal":"2 Sekunden","value":"PT2S"}]},{"id":"date","values":[{"literal":"eins","value":"T23:00:00Z"},{"literal":"eins","value":"T23:00:00Z"}]},{"id":"number","values":[{"literal":"eins","value":"1"},{"literal":"2","value":"2"},{"literal":"eins","value":"1"}]},{"id":"station","values":[{"literal":"eins","value":"eins"},{"literal":"eins","value":"eins"}]},{"id":"number_values","values":[{"literal":"eins","value":"1"},{"literal":"eins","value":"1"}]},{"id":"percentage_values","values":[{"literal":"höchsten","value":"100"}]}],"intentname":null,"name":"TV"}],"modelid":"SVH_STAGING__DISPATCHER","modelversion":13,"nlusessionid":null,"usednluengine":"luis"},{"agentid":"dispatcher","intent":{"confidence":0.140685484,"entities":[{"id":"duration","values":[{"literal":"2 Sekunden","value":"PT2S"}]},{"id":"date","values":[{"literal":"eins","value":"T23:00:00Z"},{"literal":"eins","value":"T23:00:00Z"}]},{"id":"number","values":[{"literal":"eins","value":"1"},{"literal":"2","value":"2"},{"literal":"eins","value":"1"}]},{"id":"number_values","values":[{"literal":"eins","value":"1"},{"literal":"eins","value":"1"}]},{"id":"percentage_values","values":[{"literal":"höchsten","value":"100"}]}],"intentname":null,"name":"TV__SWITCH_CHANNEL"},"intentcandidates":[{"confidence":0.140685484,"entities":[{"id":"duration","values":[{"literal":"2 Sekunden","value":"PT2S"}]},{"id":"date","values":[{"literal":"eins","value":"T23:00:00Z"},{"literal":"eins","value":"T23:00:00Z"}]},{"id":"number","values":[{"literal":"eins","value":"1"},{"literal":"2","value":"2"},{"literal":"eins","value":"1"}]},{"id":"number_values","values":[{"literal":"eins","value":"1"},{"literal":"eins","value":"1"}]},{"id":"percentage_values","values":[{"literal":"höchsten","value":"100"}]}],"intentname":null,"name":"TV__SWITCH_CHANNEL"}],"modelid":"SVH_STAGING__TV","modelversion":13,"nlusessionid":null,"usednluengine":"luis"}]
You can paste the data into the following site to view it properly formatted: http://jsonviewer.stack.hu/
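One thing to note: the duplicate field whose value is None does not show up in the data, but it does appear in the output of df.printSchema(). I want to remove all such duplicate columns/nested fields (they are part of the inner struct) and keep the ones that have values. In other words, the data itself does not change; only the schema does.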
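One possible approach, sketched here under assumptions and not tested against the data above: Spark casts one struct to another by position, so the duplicate can first be given a unique name in the schema and then dropped by that name. The hypothetical helper `rename_duplicate_fields` below works on the JSON form of a schema (the dict returned by PySpark's `DataType.jsonValue()`) and renames the second and later occurrences of a field name by appending an arbitrary suffix (`_dup1`, `_dup2`, ...). The suffix choice is an assumption and could collide with a field that already has that name.

```python
def rename_duplicate_fields(dtype, suffix="_dup"):
    """Return a copy of a Spark-style JSON data type in which repeated
    struct field names are made unique by appending suffix + counter.

    Hypothetical helper. `dtype` is assumed to follow the format of
    PySpark's DataType.jsonValue(): primitive types are plain strings
    ("string", "long", ...), complex types are dicts whose "type" key is
    "struct", "array" or "map". The input is not modified.
    """
    if not isinstance(dtype, dict):
        return dtype  # primitive type such as "string" or "double"

    kind = dtype.get("type")
    if kind == "struct":
        seen = {}
        new_fields = []
        for field in dtype["fields"]:
            new_field = dict(field)  # shallow copy so the input is untouched
            name = field["name"]
            count = seen.get(name, 0)
            seen[name] = count + 1
            if count:
                # Second and later occurrences get a unique name.
                new_field["name"] = f"{name}{suffix}{count}"
            new_field["type"] = rename_duplicate_fields(field["type"], suffix)
            new_fields.append(new_field)
        return {**dtype, "fields": new_fields}
    if kind == "array":
        return {**dtype,
                "elementType": rename_duplicate_fields(dtype["elementType"], suffix)}
    if kind == "map":
        return {**dtype,
                "keyType": rename_duplicate_fields(dtype["keyType"], suffix),
                "valueType": rename_duplicate_fields(dtype["valueType"], suffix)}
    return dtype


# Hypothetical miniature of the nlucontexttrail column from the question.
schema = {
    "type": "struct",
    "fields": [{
        "name": "nlucontexttrail",
        "type": {
            "type": "array",
            "elementType": {
                "type": "struct",
                "fields": [
                    {"name": "agentid", "type": "string", "nullable": True, "metadata": {}},
                    {"name": "usednluengine", "type": "string", "nullable": True, "metadata": {}},
                    {"name": "usednluengine", "type": "string", "nullable": True, "metadata": {}},
                ],
            },
            "containsNull": True,
        },
        "nullable": True,
        "metadata": {},
    }],
}

fixed = rename_duplicate_fields(schema)
element = fixed["fields"][0]["type"]["elementType"]
print([f["name"] for f in element["fields"]])
# ['agentid', 'usednluengine', 'usednluengine_dup1']
```

In PySpark (3.1 or later, assumed), the renamed schema could then be applied roughly as `new_schema = StructType.fromJson(rename_duplicate_fields(df.schema.jsonValue()))` followed by `df.select([F.col(f.name).cast(f.dataType).alias(f.name) for f in new_schema.fields])`, and the renamed copy removed with `F.transform("nlucontexttrail", lambda x: x.dropFields("usednluengine_dup1"))`. This assumes the second occurrence is the one holding None; if it is the first, drop `usednluengine` instead, and the values will remain under `usednluengine_dup1`.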
I hope this explains my problem. If not, please leave a comment below so we can discuss it further.