This is the frequency_creative DataFrame:
[('tags', 'int'),
('user_id', 'bigint'),
('processdate', 'date'),
('brandsurvey_name', 'string'),
('survey_id', 'string'),
('questionid', 'string'),
('questiontext', 'string'),
('frequency', 'bigint'),
('creative_id', 'string')]
When I use fillna(0), some values in the user_id column get corrupted. Without fillna(0), everything works as expected.
frequency_creative.select("user_id").distinct().show()
+-------------------+
| user_id|
+-------------------+
|1665009053012894694|
| 840031193618494976|
+-------------------+
frequency_creative = frequency_creative.select("processdate","tags","survey_id","brandsurvey_name","user_id","questionid","questiontext","frequency","creative_id").fillna(0)
frequency_creative.select("user_id").distinct().show()
after select
+-------------------+
| user_id|
+-------------------+
|1665009053012894720|
| 840031193618494976|
+-------------------+
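The changed value is consistent with the bigint passing through a 64-bit float somewhere inside fillna(0). That is a hypothesis, not something the output above proves. A double holds only 53 bits of integer precision, so integers above 2**53 get rounded to the nearest representable value. A minimal plain-Python sketch (no Spark needed) that reproduces both numbers from the tables:

```python
# Hypothesis only: fillna(0) may route bigint columns through a double.
# This reproduces the effect with ordinary Python floats (IEEE 754 doubles).

original = 1665009053012894694      # user_id before fillna(0)
round_tripped = int(float(original))  # int -> double -> int

print(round_tripped)                # 1665009053012894720, as in the table above
print(original > 2**53)             # True: beyond exact double precision

# The second user_id happens to be exactly representable as a double,
# which would explain why it comes out unchanged.
unchanged = 840031193618494976
print(int(float(unchanged)) == unchanged)  # True
```

If this is what is happening, only ids that are not exactly representable as a double would be affected, which matches one of the two values changing.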
**********************************************
without fillna(0)
before select
+-------------------+
| user_id|
+-------------------+
|1665009053012894694|
| 840031193618494976|
+-------------------+
frequency_creative = frequency_creative.select("processdate","tags","survey_id","brandsurvey_name","user_id","questionid","questiontext","frequency","creative_id")
frequency_creative.select("user_id").distinct().show()
after select
+-------------------+
| user_id|
+-------------------+
|1665009053012894694|
| 840031193618494976|
+-------------------+
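One possible way to sidestep the problem, assuming user_id never needs a fill value, is to restrict fillna to the other numeric columns through its subset parameter. The sketch below derives that column list from the dtypes shown at the top. The names NUMERIC_TYPES, PROTECTED and fill_columns are made up for illustration, and the final Spark call is left as a comment because it needs a live DataFrame:

```python
# dtypes as listed at the top of the post (frequency_creative.dtypes)
dtypes = [('tags', 'int'),
          ('user_id', 'bigint'),
          ('processdate', 'date'),
          ('brandsurvey_name', 'string'),
          ('survey_id', 'string'),
          ('questionid', 'string'),
          ('questiontext', 'string'),
          ('frequency', 'bigint'),
          ('creative_id', 'string')]

# Illustrative helpers (not part of the original code)
NUMERIC_TYPES = {'tinyint', 'smallint', 'int', 'bigint', 'float', 'double'}
PROTECTED = {'user_id'}  # columns whose values must never be touched

# Numeric columns that may safely receive the fill value 0
fill_columns = [name for name, dtype in dtypes
                if dtype in NUMERIC_TYPES and name not in PROTECTED]
print(fill_columns)  # ['tags', 'frequency']

# With a live DataFrame, the fill would then be limited to those columns:
# frequency_creative = frequency_creative.fillna(0, subset=fill_columns)
```

Note that frequency is also a bigint, so if the float round trip is the cause, very large frequency values would face the same risk. For counts that stay far below 2**53 that should not matter.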