Pandas concat of multiple DataFrames returns null values

Date: 2016-08-26 18:24:41

Tags: python pandas concatenation

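I have a DataFrame (df) that I split into four new DataFrames (media, client, code_type, date). media has one column of null values, while the other three are single-column DataFrames, each consisting entirely of null values. After replacing the nulls in each DataFrame, I try to pd.concat them back into a single df and get the result shown further below. The individual DataFrames look like this: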

 code_type
0   P
1   P
2   P
3   P
4   P
5   P

code_name   media_type  acq.    revenue
0   RASH    NaN         50.0     34004.0
1   100     NaN         10.0     1035.0
2   NEWS    NaN         61.0     3475.0
3   DR      NaN         53.0     4307.0
4   SPORTS  NaN         45.0     6503.0
5   DOUBL   NaN         13.0     4205.0

    client_id
0   2.0
1   2.0
2   2.0
3   2.0
4   2.0
5   2.0

    date
0   2016-08-15
1   2016-08-15
2   2016-08-15
3   2016-08-15
4   2016-08-15
5   2016-08-15

I pd.merge media with another, separate df to replace the NaN values under media.media_type, which adds a new media_type_y column:

code_name   media_type_x    acq.    revenue  media_type_y
0   RASH       NaN          282     34004.0  Radio
1   100        NaN          119     1035.0   NaN
2   NEWS       NaN           81     3475.0   SiriusXM
3   DR         NaN           33     4307.0   SiriusXM
4   SPORTS     NaN           25     6503.0   SiriusXM
5   DOUBL      NaN           23     4205.0   Podcast

Then I drop media_type_x and rename media_type_y to media_type:

final = m.loc[:, ['code_name', 'media_type_y', 'acquisition', 'revenue']]
final = final.rename(columns={'media_type_y': 'media_type'})
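The merge-and-rename step can be sketched a little more compactly. This is only an illustration: the `lookup` frame, its values, and the non-default index on `media` are assumptions, not data from the question. It drops the all-NaN column before merging so that no `_x`/`_y` suffixes appear, and it shows that a column-on-column merge returns a fresh 0..n-1 index, which matters later when concatenating along axis=1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's frames (values are assumptions).
media = pd.DataFrame(
    {
        "code_name": ["RASH", 100, "NEWS"],
        "media_type": [np.nan, np.nan, np.nan],
        "acquisition": [50.0, 10.0, 61.0],
        "revenue": [34004.0, 1035.0, 3475.0],
    },
    index=[10, 11, 12],  # non-default index, to show that merge discards it
)
lookup = pd.DataFrame(
    {"code_name": ["RASH", "NEWS"], "media_type": ["Radio", "SiriusXM"]}
)

# Drop the empty column first, so the merge adds a single media_type column.
final = media.drop(columns="media_type").merge(lookup, on="code_name", how="left")

# Restore the original column order.
final = final[["code_name", "media_type", "acquisition", "revenue"]]

print(final)
print(final.index)  # a new default integer index, not 10, 11, 12
```

If the other frames still carry the original index, their labels no longer match those of the merged frame.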

That way, when I concatenate, I should have a complete df:

clean = pd.concat([media, client, code_type, date], axis=1)  

    code    media       acq.    revenue   client code_type  date
0   RASH    Radio       50.0    34004.0     NaN     NaN     NaT
1   100     NaN         10.0    1035.0      NaN     NaN     NaT
2   NEWS    SiriusXM    61.0    3475.0      NaN     NaN     NaT
3   DR      SiriusXM    53.0    4307.0      NaN     NaN     NaT
4   SPORTS  SiriusXM    45.0    6503.0      NaN     NaN     NaT
5   DOUBL   Podcast     13.0    4205.0      NaN     NaN     NaT


clean.client should be all 2, clean.code_type should be all P, and clean.date should be all 08/15/2016.

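The DataFrames themselves show the data; the information is lost only when I concatenate. I think it may have something to do with the index, but I'm not sure. It may also be related to the fact that one column contains both str and int values (see clean.code above), which could be why I get the runtime warning listed below.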

  

//anaconda/lib/python3.5/site-packages/pandas/indexes/api.py:71: RuntimeWarning: unorderable types: int() < str(), sort order is undefined for incomparable objects
  result = result.union(other)
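The index suspicion can be checked with a small sketch. `pd.concat(..., axis=1)` aligns rows by index label, not by position, so frames whose labels differ produce NaN wherever a label is missing. The warning above is raised while pandas builds the union of the indexes, which hints that the index labels themselves differ between the frames. The index values below are assumptions, since the question does not show the real ones.

```python
import pandas as pd

# Hypothetical frames: same length, but different index labels.
media = pd.DataFrame({"code_name": ["RASH", 100, "NEWS"]})  # index 0, 1, 2
client = pd.DataFrame({"client_id": [2.0, 2.0, 2.0]}, index=[7, 8, 9])

# Rows are matched by label, so the result covers the union of both indexes
# and client_id is NaN for labels 0, 1, 2.
misaligned = pd.concat([media, client], axis=1)
print(misaligned)

# Resetting every index to 0..n-1 makes the frames line up by position.
aligned = pd.concat(
    [frame.reset_index(drop=True) for frame in (media, client)], axis=1
)
print(aligned)
```

Comparing `media.index` with `client.index`, `code_type.index`, and `date.index` before concatenating would confirm whether this is what happens here.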

1 Answer:

Answer 0 (score: 0)

Starting from this:

  code_name media_type  acq.  revenue
0      RASH      Radio  50.0  34004.0
1       100        NaN  10.0   1035.0
2      NEWS   SiriusXM  61.0   3475.0
3        DR   SiriusXM  53.0   4307.0
4    SPORTS   SiriusXM  45.0   6503.0
5     DOUBL    Podcast  13.0   4205.0

Try this:

df['client_id'] = 2
df['date']      = '08/15/2016'
df['code_type'] = 'P'
df

    code_name media_type  acq.  revenue  client_id        date code_type
0      RASH      Radio  50.0  34004.0          2  08/15/2016         P
1       100        NaN  10.0   1035.0          2  08/15/2016         P
2      NEWS   SiriusXM  61.0   3475.0          2  08/15/2016         P
3        DR   SiriusXM  53.0   4307.0          2  08/15/2016         P
4    SPORTS   SiriusXM  45.0   6503.0          2  08/15/2016         P
5     DOUBL    Podcast  13.0   4205.0          2  08/15/2016         P
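A variation on the same idea, as a sketch only: `DataFrame.assign` broadcasts each scalar to every row and returns a new frame instead of modifying `df` in place. Parsing the date with `pd.to_datetime` is an assumption based on the NaT values in the question, which suggest the date column is meant to be a datetime rather than a string. The small `df` below is a stand-in for the real data.

```python
import pandas as pd

# Stand-in for the frame shown above (only a few rows and columns).
df = pd.DataFrame(
    {
        "code_name": ["RASH", 100, "NEWS"],
        "revenue": [34004.0, 1035.0, 3475.0],
    }
)

# Each scalar is broadcast to all rows; df itself is left unchanged.
clean = df.assign(
    client_id=2,
    date=pd.to_datetime("08/15/2016", format="%m/%d/%Y"),
    code_type="P",
)
print(clean)
print(clean.dtypes)
```

Because the constant columns are created directly on the frame, no index alignment is involved and no values can be lost.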