Pandas: merging rows with similar values (name spelling variations)

Time: 2020-08-24 04:09:02

Tags: python pandas dataframe rows

I have the following Python Pandas DataFrame:


   Name        Sales Qty
0 JOHN BARNES   10
1 John Barnes    5
2 John barnes    4
3 Peter K.       4
4 Peter K        6
5 Peter Krammer  5
6 Charles        3
7 CHARLES        2
8 Julie Moore    3
9 Julie moore    7
10

And many more rows, with the same kinds of name spelling variations.

I want to merge the rows with similar values so that I end up with the following DataFrame:

  Name           Sales Qty
0 John Barnes    19
1 Peter Krammer  15
2 Charles         5
3 Julie Moore    10

and many more

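How should I do this?

1 answer:

Answer 0: (score: 0)

The requirements are vague, as you can see in the comments, but I have produced the totals. I compute them by lowercasing the names and removing periods, then use str.title() to convert the names back to title case. Note that this only merges names that are identical after normalization, so "Peter K" and "Peter Krammer" remain separate rows.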

import pandas as pd
import io

data = '''
 Name Sales
0 "JOHN BARNES" 10
1 "John Barnes" 5
2 "John barnes" 4
3 "Peter K." 4
4 "Peter K" 6
5 "Peter Krammer" 5
6 "Charles"  3
7 "CHARLES"  2
8 "Julie Moore" 3
9 "Julie moore" 7
'''

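# Parse the whitespace-separated sample; quoted names keep their internal spaces.
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
# Build a normalized key: lowercase, with literal periods removed.
df['lower'] = df['Name'].str.lower()
df['lower'] = df['lower'].str.replace('.', '', regex=False)
# Sum sales per normalized name, then restore title case for display.
new = df.groupby('lower')['Sales'].sum().reset_index()
new['lower'] = new['lower'].str.title()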

new
           lower  Sales
0        Charles      5
1    John Barnes     19
2    Julie Moore     10
3        Peter K     10
4  Peter Krammer      5
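The result above still keeps "Peter K" and "Peter Krammer" apart, whereas the question wants them combined. One possible way to close that gap, offered only as a sketch, is to treat a normalized name as an abbreviation of the longest normalized name it is a prefix of. This prefix rule is an assumption, not something stated in the question, and the helper name `canonical` is hypothetical. It would also wrongly merge unrelated names such as "John" and "Johnson", so it needs checking against real data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["JOHN BARNES", "John Barnes", "John barnes", "Peter K.", "Peter K",
             "Peter Krammer", "Charles", "CHARLES", "Julie Moore", "Julie moore"],
    "Sales": [10, 5, 4, 4, 6, 5, 3, 2, 3, 7],
})

# Normalize: lowercase, remove literal periods, trim surrounding whitespace.
df["key"] = df["Name"].str.lower().str.replace(".", "", regex=False).str.strip()


def canonical(key, all_keys):
    """Hypothetical helper: return the longest key that `key` is a prefix of.

    Every key is a prefix of itself, so the candidate list is never empty.
    """
    candidates = [k for k in all_keys if k.startswith(key)]
    return max(candidates, key=len)


# Map each distinct key to its canonical (longest matching) form.
unique_keys = df["key"].unique().tolist()
mapping = {k: canonical(k, unique_keys) for k in unique_keys}
df["key"] = df["key"].map(mapping)

# Sum per canonical name, keeping first-appearance order, then title-case.
merged = (
    df.groupby("key", sort=False)["Sales"].sum().reset_index()
      .rename(columns={"key": "Name"})
)
merged["Name"] = merged["Name"].str.title()
print(merged)
```

On the sample data this gives four rows: John Barnes 19, Peter Krammer 15, Charles 5 and Julie Moore 10. For messier variations such as typos or swapped words, a similarity measure like the standard library's difflib may be a better fit than a prefix rule, at the cost of having to tune a threshold.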