I have the following Python Pandas DataFrame:
            Name  Sales Qty
0    JOHN BARNES         10
1    John Barnes          5
2    John barnes          4
3       Peter K.          4
4        Peter K          6
5  Peter Krammer          5
6        Charles          3
7        CHARLES          2
8    Julie Moore          3
9    Julie moore          7
... and many more rows, with the same kind of name spelling variations.
I want to merge the rows whose names are similar, so that I end up with the following DataFrame:
            Name  Sales Qty
0    John Barnes         19
1  Peter Krammer         15
2        Charles          5
3    Julie Moore         10
... and many more
How should I go about this?
Answer 0 (score: 0)
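The requirements are vague, as you can see from the comments, but here is one way to get the totals. I lowercase the names and remove the periods, sum the sales per normalized name, and then use str.title() to convert the names back to title case. Note that this only merges case and punctuation variants: an abbreviation such as "Peter K" is still kept separate from "Peter Krammer".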
import pandas as pd
import io
data = '''
Name Sales
0 "JOHN BARNES" 10
1 "John Barnes" 5
2 "John barnes" 4
3 "Peter K." 4
4 "Peter K" 6
5 "Peter Krammer" 5
6 "Charles" 3
7 "CHARLES" 2
8 "Julie Moore" 3
9 "Julie moore" 7
'''
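# Whitespace-separated input; the raw string avoids an invalid "\s" escape warning
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
# Normalize: lowercase, then drop literal periods (regex=False so '.' is not a wildcard)
df['lower'] = df['Name'].str.lower()
df['lower'] = df['lower'].str.replace('.', '', regex=False)
# Sum sales per normalized name, then restore title case for display
new = df.groupby('lower')['Sales'].sum().reset_index()
new['lower'] = new['lower'].str.title()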
new
lower Sales
0 Charles 5
1 John Barnes 19
2 Julie Moore 10
3 Peter K 10
4 Peter Krammer 5
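
The result above still lists "Peter K" and "Peter Krammer" separately, whereas the question asks for a single "Peter Krammer" row with 15. One possible way to close that gap, offered only as a rough sketch, is to treat a normalized name as an abbreviation when it is a prefix of exactly one longer name in the data. The helpers `normalize` and `build_canonical_map` are hypothetical names introduced here, not part of pandas, and the prefix rule is an assumption that may merge unrelated people in real data (for example "John" and "John Barnes"), so it should be checked against the actual names.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase, drop periods, and collapse runs of whitespace."""
    return " ".join(name.lower().replace(".", "").split())


def build_canonical_map(names):
    """Map each normalized name to the single longer name it prefixes.

    Assumption: an abbreviation is a prefix of exactly one fuller name.
    Names with no match, or with an ambiguous match, map to themselves.
    """
    keys = sorted({normalize(n) for n in names})
    mapping = {}
    for key in keys:
        # Longer names that start with this one are merge candidates.
        longer = [k for k in keys if k != key and k.startswith(key)]
        # Keep only candidates that are not prefixes of another candidate.
        maximal = [
            k for k in longer
            if not any(o != k and o.startswith(k) for o in longer)
        ]
        mapping[key] = maximal[0] if len(maximal) == 1 else key
    return mapping


# Sample rows copied from the question.
rows = [
    ("JOHN BARNES", 10), ("John Barnes", 5), ("John barnes", 4),
    ("Peter K.", 4), ("Peter K", 6), ("Peter Krammer", 5),
    ("Charles", 3), ("CHARLES", 2),
    ("Julie Moore", 3), ("Julie moore", 7),
]

canon = build_canonical_map(name for name, _ in rows)

# Sum sales under the canonical, title-cased name.
totals = defaultdict(int)
for name, sales in rows:
    totals[canon[normalize(name)].title()] += sales

print(dict(totals))
```

With the DataFrame from the answer, the same mapping could be built with `canon = build_canonical_map(df['Name'])` and applied as `df.groupby(df['Name'].map(lambda n: canon[normalize(n)].title()))['Sales'].sum()`. For spelling variations that are not simple prefixes, a fuzzy string matcher would be needed instead, and the matches should be reviewed by hand.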