我们可以基于文本匹配合并两个数据框吗?

时间:2020-01-28 03:30:02

标签: python python-3.x pandas

我有一个看起来像这样的DF:

    ID  Rating  Bin Price
0   864890  AA+ 4   97.14
1   691634  AA+ 4   14.21
2   792845  AA+ 3   101.25
3   506251  SP  3   100.31
4   689977  AA+ 3   97.37
    ... ... ... ...
249995  873393  AA+ 5   110.42
249996  495709  AA+ 7   105.47
249997  508123  AA+ 7   104.55
249998  650062  AA+ 8   105.37
249999  17658   AA+ 8   103.53

我还有另一个看起来像这样的DF。

    Rating  RatingScores
0   AAA 10
1   AA+ 9.5
2   AA  9
3   A+  8.5
4   A   8
..  ... ...
20  CC- 0
21  D   0
22  NA  0
23  NR  0
24  SP  0

我想知道是否有一种方法可以将第二个DF合并到第一个DF中。索引不匹配,但是“评级”字段在两个字段中相同。或者,是否有一种简单的方法可以将列添加到第一个DF,并为非数字标签(Rating)生成数字结果(RatingScores)?

到目前为止,我已经尝试过:

dataset['RatingScores'] = pd.merge(dataset, finalDF, on='Rating')

我现在得到这个。

dataset['RatingScores'] = pd.merge(dataset, finalDF, on='Rating')
Traceback (most recent call last):

  File "<ipython-input-327-0afd66ad6da1>", line 1, in <module>
    dataset['RatingScores'] = pd.merge(dataset, finalDF, on='Rating')

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3472, in __setitem__
    self._set_item(key, value)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3550, in _set_item
    NDFrame._set_item(self, key, value)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 3381, in _set_item
    self._data.set(key, value)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 1072, in set
    self.insert(len(self.items), item, value)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 1181, in insert
    block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 3267, in make_block
    return klass(values, ndim=ndim, placement=placement)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 2775, in __init__
    super().__init__(values, ndim=ndim, placement=placement)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 128, in __init__
    "{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))

ValueError: Wrong number of items passed 12, placement implies 1

所以...

(dataset.to_dict())

  997: 5,
  998: 4,
  999: 5,
  1000: 3,
  1001: 3,
  1002: 4,
  1003: 5,
  1004: 5,
  1005: 1,
  1006: 3,
  1007: 4,
  1008: 4,
  1009: 1,
  1010: 9,
  1011: 1,
  1012: 6,
  1013: 1,
  1014: 2,
  1015: 3,
  1016: 1,
  1017: 10,
  1018: 1,
  1019: 3,
  ...}}

还...

(finalDF.to_dict())

{'Rating': {0: 'AAA',
  1: 'AA+',
  2: 'AA',
  3: 'A+',
  4: 'A',
  5: 'A-',
  6: 'AA-',
  7: 'BBB',
  8: 'BB+',
  9: 'BB',
  10: 'B+',
  11: 'B',
  12: 'B-',
  13: 'BB-',
  14: 'CCC',
  15: 'CC+',
  16: 'CC',
  17: 'C+',
  18: 'C',
  19: 'C-',
  20: 'CC-',
  21: 'D',
  22: 'NA',
  23: 'NR',
  24: 'SP'},
 'RatingScores': {0: 10.0,
  1: 9.5,
  2: 9.0,
  3: 8.5,
  4: 8.0,
  5: 7.5,
  6: 7.0,
  7: 6.5,
  8: 6.0,
  9: 5.5,
  10: 5.0,
  11: 4.5,
  12: 4.0,
  13: 3.5,
  14: 3.0,
  15: 2.5,
  16: 2.0,
  17: 1.5,
  18: 1.0,
  19: 0.0,
  20: 0.0,
  21: 0.0,
  22: 0.0,
  23: 0.0,
  24: 0.0}}

2 个答案:

答案 0 :(得分:2)

您的代码已经正确,但是pd.merge()返回一个新加入的DataFrame,您将其放入了pd.Series,因此是ValueError

因此,您可以做的是将其放入新变量中,以容纳合并的DataFrame

merged_df = pd.merge(dataset, finalDF, on='Rating')

答案 1 :(得分:0)

我将研究隔离第二个数据框中所需的列,然后使用“ join”组合两个数据框。

dat1.join(dat2)