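I have a DataFrame (df) that I split into four new DataFrames (media, client, code_type, and date). media has one column of null values, while the other three are single-column DataFrames made up entirely of null values. After replacing the nulls in each DataFrame, I tried pd.concat to get back to a single df and got the results below.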
code_type
0 P
1 P
2 P
3 P
4 P
5 P
   code_name  media_type  acq.  revenue
0       RASH         NaN  50.0  34004.0
1        100         NaN  10.0   1035.0
2       NEWS         NaN  61.0   3475.0
3         DR         NaN  53.0   4307.0
4     SPORTS         NaN  45.0   6503.0
5      DOUBL         NaN  13.0   4205.0
client_id
0 2.0
1 2.0
2 2.0
3 2.0
4 2.0
5 2.0
date
0 2016-08-15
1 2016-08-15
2 2016-08-15
3 2016-08-15
4 2016-08-15
5 2016-08-15
I used pd.merge on media with another, separate df to replace the NaN values under media.media_type, which added a new media_type_y column:
   code_name  media_type_x  acq.  revenue  media_type_y
0       RASH           NaN   282  34004.0         Radio
1        100           NaN   119   1035.0           NaN
2       NEWS           NaN    81   3475.0      SiriusXM
3         DR           NaN    33   4307.0      SiriusXM
4     SPORTS           NaN    25   6503.0      SiriusXM
5      DOUBL           NaN    23   4205.0       Podcast
I then dropped media_type_x and renamed media_type_y to media_type:
final = m.loc[:, ['code_name', 'media_type_y', 'acquisition', 'revenue']]
final = final.rename(columns={'media_type_y': 'media_type'})
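The merge step described above can be reproduced with a small sketch. The lookup frame and its values are hypothetical stand-ins for the separate df, which is not shown, and drop plus rename is just one way to do the cleanup in place of the .loc selection.

```python
import numpy as np
import pandas as pd

# Stand-in for the `media` frame; media_type is entirely NaN.
media = pd.DataFrame({
    "code_name": ["RASH", "100", "NEWS"],
    "media_type": [np.nan, np.nan, np.nan],
    "revenue": [34004.0, 1035.0, 3475.0],
})

# Hypothetical lookup frame supplying the missing media types.
lookup = pd.DataFrame({
    "code_name": ["RASH", "NEWS"],
    "media_type": ["Radio", "SiriusXM"],
})

# Both frames have a media_type column, so a left merge on code_name
# suffixes them as media_type_x (from media) and media_type_y (from lookup).
m = media.merge(lookup, on="code_name", how="left")

# Drop the empty column and give the filled one its original name.
final = (
    m.drop(columns="media_type_x")
     .rename(columns={"media_type_y": "media_type"})
)
print(final)
```

Codes with no match in the lookup frame, such as 100 here, keep NaN in media_type after the left merge.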
That way, when I concatenate, I should have one complete df:
clean = pd.concat([media, client, code_type, date], axis=1)
     code     media  acq.  revenue  client  code_type  date
0    RASH     Radio  50.0  34004.0     NaN        NaN   NaT
1     100       NaN  10.0   1035.0     NaN        NaN   NaT
2    NEWS  SiriusXM  61.0   3475.0     NaN        NaN   NaT
3      DR  SiriusXM  53.0   4307.0     NaN        NaN   NaT
4  SPORTS  SiriusXM  45.0   6503.0     NaN        NaN   NaT
5   DOUBL   Podcast  13.0   4205.0     NaN        NaN   NaT
clean.client should be all 2, clean.code_type should be all P, and clean.date should be all 08/15/2016.
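The dfs themselves show the data; the information is only lost when I concatenate. I think it may have something to do with the indexes, but I'm not sure. It could also be related to the fact that I have a column containing both str and int values (see clean.code above), which might be why I get the runtime warning listed below.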
//anaconda/lib/python3.5/site-packages/pandas/indexes/api.py:71: RuntimeWarning: unorderable types: int() < str(), sort order is undefined for incomparable objects
  result = result.union(other)
Answer 0 (score: 0)
Starting from this:
   code_name  media_type  acq.  revenue
0       RASH       Radio  50.0  34004.0
1        100         NaN  10.0   1035.0
2       NEWS    SiriusXM  61.0   3475.0
3         DR    SiriusXM  53.0   4307.0
4     SPORTS    SiriusXM  45.0   6503.0
5      DOUBL     Podcast  13.0   4205.0
Try this:
df['client_id'] = 2
df['date'] = '08/15/2016'
df['code_type'] = 'P'
df
   code_name  media_type  acq.  revenue  client_id        date  code_type
0       RASH       Radio  50.0  34004.0          2  08/15/2016          P
1        100         NaN  10.0   1035.0          2  08/15/2016          P
2       NEWS    SiriusXM  61.0   3475.0          2  08/15/2016          P
3         DR    SiriusXM  53.0   4307.0          2  08/15/2016          P
4     SPORTS    SiriusXM  45.0   6503.0          2  08/15/2016          P
5      DOUBL     Podcast  13.0   4205.0          2  08/15/2016          P
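If the pieces need to stay as separate DataFrames, the NaN columns may also come from index alignment, as the question suspects: pd.concat with axis=1 matches rows by index label, not by position. The sketch below is a minimal illustration under that assumption. The frames and the index labels 10 to 12 are made up for the example and are not taken from the asker's data.

```python
import pandas as pd

# Stand-in frames. The labels 10, 11, 12 are an assumption standing in for
# whatever index the pieces kept from the original df.
media = pd.DataFrame({"code_name": ["RASH", "100", "NEWS"]})
client = pd.DataFrame({"client_id": [2.0, 2.0, 2.0]}, index=[10, 11, 12])

# axis=1 aligns rows by index label, so labels that do not match yield NaN.
misaligned = pd.concat([media, client], axis=1)
print(misaligned)

# Resetting every piece to a fresh 0..n-1 index pairs rows by position.
pieces = [media, client]
clean = pd.concat([p.reset_index(drop=True) for p in pieces], axis=1)
print(clean)
```

The RuntimeWarning in the question is raised while pandas builds the union of the row indexes. That would be consistent with the pieces carrying different indexes, though this is a guess without seeing the original df.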