I have a pandas DataFrame with address-range columns and string columns, similar to this:
              STREET  LOWADD  HIGHADD  POSTAL  SECTOR
0        ABBERLY CIR    1900     2000   23112      A6
1  ABBEY VILLAGE CIR     500      600   23114      B6
I need to expand it into one row for each number between LOWADD and HIGHADD, carrying the STREET, POSTAL and SECTOR values forward onto every generated row:
New_Street        POSTAL  SECTOR
1901 ABBERLY CIR  23112   A6
1902 ABBERLY CIR  23112   A6
1903 ABBERLY CIR  23112   A6
1904 ABBERLY CIR  23112   A6
1905 ABBERLY CIR  23112   A6
What is the best way to do this with pandas?
Answer 0 (score: 2)
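The idea is to subtract the two columns with Series.sub to get the number of times each row must be repeated, repeat the rows with Index.repeat and DataFrame.loc, and finally add a counter Series from GroupBy.cumcount to the STREET column: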
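# Make sure the index is unique so each source row forms its own group
df = df.reset_index(drop=True)
# Number of addresses to generate per row (HIGHADD - LOWADD)
diff = df['HIGHADD'].sub(df['LOWADD'])
# Repeat each row that many times
df = df.loc[df.index.repeat(diff)]
# Counter 1..n within each original row, offset by LOWADD, as a string
s = df.groupby(level=0).cumcount().add(1).add(df['LOWADD']).astype(str)
df['STREET'] = s + ' ' + df['STREET']
df = df.drop(['LOWADD','HIGHADD'], axis=1).reset_index(drop=True)
print(df)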
                    STREET  POSTAL  SECTOR
0         1901 ABBERLY CIR   23112      A6
1         1902 ABBERLY CIR   23112      A6
2         1903 ABBERLY CIR   23112      A6
3         1904 ABBERLY CIR   23112      A6
4         1905 ABBERLY CIR   23112      A6
..                     ...     ...     ...
195  596 ABBEY VILLAGE CIR   23114      B6
196  597 ABBEY VILLAGE CIR   23114      B6
197  598 ABBEY VILLAGE CIR   23114      B6
198  599 ABBEY VILLAGE CIR   23114      B6
199  600 ABBEY VILLAGE CIR   23114      B6

[200 rows x 3 columns]
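
For comparison, here is a different way to get the same kind of result, sketched with DataFrame.explode (available from pandas 0.25). This is not the method used in the answer above, only an alternative under a few assumptions: the sample frame is rebuilt by hand from the question, the helper column NUMBER and the variable names numbers and expanded are made up for illustration, and the range runs from LOWADD + 1 to HIGHADD inclusive so that it matches the expected output in the question (which starts at 1901, not 1900).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "STREET": ["ABBERLY CIR", "ABBEY VILLAGE CIR"],
    "LOWADD": [1900, 500],
    "HIGHADD": [2000, 600],
    "POSTAL": [23112, 23114],
    "SECTOR": ["A6", "B6"],
})

# One list of house numbers per source row: LOWADD + 1 .. HIGHADD inclusive.
# NUMBER is a hypothetical helper column, not part of the original data.
numbers = pd.Series(
    [list(range(lo + 1, hi + 1)) for lo, hi in zip(df["LOWADD"], df["HIGHADD"])],
    index=df.index,
)

expanded = (
    df.assign(NUMBER=numbers)
      .explode("NUMBER")            # one output row per house number
      .reset_index(drop=True)
)

# Prefix the street name with the house number and keep only the wanted columns.
expanded["STREET"] = expanded["NUMBER"].astype(str) + " " + expanded["STREET"]
expanded = expanded[["STREET", "POSTAL", "SECTOR"]]

print(expanded)
```

If LOWADD itself should be included, changing range(lo + 1, hi + 1) to range(lo, hi + 1) is enough. The list-per-row version is arguably easier to read, while the repeat/cumcount version above avoids building Python lists and is likely to scale better on large frames.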