Adding manipulated pandas DataFrames together

Time: 2018-11-30 12:37:54

Tags: python pandas dataframe add

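I am trying to add two DataFrames together in Python, after first setting each one's index to one of its existing columns.

Using the top-rated method from the following thread produces an error:

(see: Adding two pandas dataframes)

Here is a simple example of the problem: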

import pandas as pd
import numpy as np

a = np.array([['A',1.,2.,3.],['B',1.,2.,3.],['C',1.,2.,3.]])
a = pd.DataFrame(a)
a = a.set_index(0)

a 

     1    2    3
0               
A  1.0  2.0  3.0
B  1.0  2.0  3.0
C  1.0  2.0  3.0

b = np.array([['A',1.,2.,3.],['B',1.,2.,3.]])
b = pd.DataFrame(b)
b = b.set_index(0)

b

     1    2    3
0               
A  1.0  2.0  3.0
B  1.0  2.0  3.0

df_add = a.add(b,fill_value=1)

Error:

Traceback (most recent call last):

  File "<ipython-input-150-885d92411f6c>", line 1, in <module>
    df_add = a.add(b,fill_value=1)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py", line 1234, in f
    return self._combine_frame(other, na_op, fill_value, level)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 3490, in _combine_frame
    result = _arith_op(this.values, other.values)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 3459, in _arith_op
    return func(left, right)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py", line 1195, in na_op
    result[mask] = op(xrav, yrav)

TypeError: must be str, not int

Any help in preventing this problem would be greatly appreciated.

1 answer:

Answer 0 (score: 0)

The problem is in how the DataFrames are defined: a 2d NumPy array holds a single dtype, so all of the data is converted to strings:

a = np.array([['A',1.,2.,3.],['B',1.,2.,3.],['C',1.,2.,3.]])
print (a)
[['A' '1.0' '2.0' '3.0']
 ['B' '1.0' '2.0' '3.0']
 ['C' '1.0' '2.0' '3.0']]
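A quick way to confirm this is to inspect the dtypes directly. The sketch below reuses the question's array and only checks types; it assumes a reasonably recent NumPy and pandas, where the string columns come back as a non-numeric dtype (`object` in most versions).

```python
import numpy as np
import pandas as pd

# A NumPy array has one dtype, so mixing the label 'A' with floats
# upcasts every element to a fixed-width Unicode string.
a = np.array([['A', 1., 2., 3.], ['B', 1., 2., 3.], ['C', 1., 2., 3.]])
print(a.dtype, a.dtype.kind)  # kind 'U' indicates a Unicode string dtype

# The DataFrame built from it keeps strings in every column, so after
# set_index(0) the remaining "numeric" columns still hold strings like '1.0'.
df = pd.DataFrame(a).set_index(0)
print(df.dtypes)
print(repr(df.iloc[0, 0]))
```

Adding two such frames asks pandas to add strings to the integer `fill_value`, which is where the `TypeError` comes from.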

One solution is to remove the string values from the arrays and pass the index separately as a list:

a = np.array([[1.,2.,3.],[1.,2.,3.],[1.,2.,3.]])
a = pd.DataFrame(a, index=list('ABC'))

b = np.array([[1.,2.,3.],[1.,2.,3.]])
b = pd.DataFrame(b, index=list('AB'))

df_add = a.add(b,fill_value=1)
print (df_add)
     0    1    2
A  2.0  4.0  6.0
B  2.0  4.0  6.0
C  2.0  3.0  4.0
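Another option, not from the answer above and offered only as a sketch, is to skip the intermediate NumPy array and pass plain Python lists of rows to `pd.DataFrame`. pandas then infers a dtype per column, so the label column stays text while the value columns become `float64`. The names `rows_a` and `rows_b` are hypothetical.

```python
import pandas as pd

# Hypothetical row lists: plain Python lists, not a NumPy array,
# so pandas infers a separate dtype for each column.
rows_a = [['A', 1., 2., 3.], ['B', 1., 2., 3.], ['C', 1., 2., 3.]]
rows_b = [['A', 1., 2., 3.], ['B', 1., 2., 3.]]

# Column 0 (the labels) becomes the index; columns 1-3 stay float64.
a = pd.DataFrame(rows_a).set_index(0)
b = pd.DataFrame(rows_b).set_index(0)

df_add = a.add(b, fill_value=1)
print(df_add)
```

This should print the same table as the `astype(float)` variant, without a separate conversion step.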

Or set the index first and then convert the DataFrame to floats:

a = np.array([['A',1.,2.,3.],['B',1.,2.,3.],['C',1.,2.,3.]])
a = pd.DataFrame(a)
a = a.set_index(0).astype(float)

b = np.array([['A',1.,2.,3.],['B',1.,2.,3.]])
b = pd.DataFrame(b)
b = b.set_index(0).astype(float)

df_add = a.add(b,fill_value=1)
print (df_add)
     1    2    3
0               
A  2.0  4.0  6.0
B  2.0  4.0  6.0
C  2.0  3.0  4.0
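Row C in both outputs comes from `fill_value`: when a label exists in only one frame, the missing side is treated as `fill_value` before adding (pandas leaves the result as NaN only if both sides are missing). The minimal sketch below, using all-float frames as in the first solution, contrasts plain `+` with `add(..., fill_value=1)`.

```python
import pandas as pd

a = pd.DataFrame([[1., 2., 3.]] * 3, index=list('ABC'))
b = pd.DataFrame([[1., 2., 3.]] * 2, index=list('AB'))

# Plain addition aligns on the index; row C exists only in a, so it becomes NaN.
plain = a + b
print(plain)

# With fill_value=1 the missing b entries for row C are treated as 1,
# so row C becomes a.loc['C'] + 1.
filled = a.add(b, fill_value=1)
print(filled)
```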