For the following input data, I want to split the column office_number on commas so that each value gets its own row:
import pandas as pd

df = pd.DataFrame({'id':['1010084420','1010084420','1010084420','1010084421','1010084421','1010084421','1010084425'],
                   'building_name': ['A', 'A', 'A', 'East Tower', 'East Tower', 'West Tower', 'T1'],
                   'floor': ['1', '1', '2', '10', '10', '11','11'],
                   'office_number':['101-105', '106', '201-203, 205, 208', '1001-1005', '1006, 1008, 1010', '1101-1103', '1101-1105'],
                   'company_name': ['Ariel Resources Ltd.', 'A.O. Tatneft', '', 'Agrium Inc.', 'Creo Products Inc.', 'Cott Corp.', 'Creo Products Inc.']})
This is the solution I found and used as a reference:
res = (df.set_index(['id', 'building_name', 'floor', 'company_name'])
.stack()
.str.split(',', expand=True)
.stack()
.unstack(-2)
.reset_index(-1, drop=True)
.reset_index())
result = res[['id', 'building_name', 'floor', 'office_number', 'company_name']]
print(result)
Output:
            id  building_name  floor  office_number          company_name
0   1010084420              A      1            106          A.O. Tatneft
1   1010084420              A      1        101-105  Ariel Resources Ltd.
2   1010084420              A      2        201-203
3   1010084420              A      2            205
4   1010084420              A      2            208
5   1010084421     East Tower     10      1001-1005           Agrium Inc.
6   1010084421     East Tower     10           1006    Creo Products Inc.
7   1010084421     East Tower     10           1008    Creo Products Inc.
8   1010084421     East Tower     10           1010    Creo Products Inc.
9   1010084421     West Tower     11      1101-1103            Cott Corp.
10  1010084425             T1     11      1101-1105    Creo Products Inc.
If you have other solutions, please share them. Thanks.
Answer 0 (score: 2)
Another solution is to extract the column with DataFrame.pop, turn it into a Series with split and stack, and then attach it back to the original with DataFrame.join:
s = (df.pop('office_number')
.str.split(',', expand=True)
.stack()
.reset_index(1, drop=True)
.rename('office_number'))
res = df.join(s).reset_index(drop=True)
result = res[['id', 'building_name', 'floor', 'office_number', 'company_name']]
print(result)
            id  building_name  floor  office_number          company_name
0   1010084420              A      1        101-105  Ariel Resources Ltd.
1   1010084420              A      1            106          A.O. Tatneft
2   1010084420              A      2        201-203
3   1010084420              A      2            205
4   1010084420              A      2            208
5   1010084421     East Tower     10      1001-1005           Agrium Inc.
6   1010084421     East Tower     10           1006    Creo Products Inc.
7   1010084421     East Tower     10           1008    Creo Products Inc.
8   1010084421     East Tower     10           1010    Creo Products Inc.
9   1010084421     West Tower     11      1101-1103            Cott Corp.
10  1010084425             T1     11      1101-1105    Creo Products Inc.
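
As a possible alternative, not part of the answer above: on pandas 0.25 or later, DataFrame.explode can do the same job in fewer steps. The sketch below is only an illustration on a shortened copy of the question's data. It also strips the pieces, on the assumption that the space left after each comma (as in '201-203, 205, 208') is not wanted in the result.

```python
import pandas as pd

# Shortened copy of the question's data, kept small for illustration.
df = pd.DataFrame({
    'id': ['1010084420', '1010084420', '1010084421'],
    'building_name': ['A', 'A', 'East Tower'],
    'floor': ['1', '2', '10'],
    'office_number': ['106', '201-203, 205, 208', '1006, 1008, 1010'],
    'company_name': ['A.O. Tatneft', '', 'Creo Products Inc.'],
})

result = (
    # Replace each string with a list of its comma-separated pieces.
    df.assign(office_number=df['office_number'].str.split(','))
      # One row per list element; the other columns are repeated.
      .explode('office_number')
      # explode repeats the original index labels, so renumber the rows.
      .reset_index(drop=True)
)

# Assumption: surrounding whitespace around each piece should be removed.
result['office_number'] = result['office_number'].str.strip()

print(result)
```

Because assign overwrites office_number in place, the column order stays the same as in the input, so no reordering step is needed at the end.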