Extension of the following question: Split (explode) pandas dataframe string entry to separate rows
df:
STATE CITY ALT_NAMES
0 S1 C1 A1@A2
1 S2 C2 A3@A4@A5
How can I get the following result?
out_df:
STATE CITY CITY_VAR
0 S1 C1 A1
1 S1 C1 A2
2 S2 C2 A3
3 S2 C2 A4
4 S2 C2 A5
Sample data:
STATE  CITY            ALT_NAMES
FL     FT. MYERS       FORT MYERS@FT MYERS
FL     NORTH FT MYERS  N.FT.MYERS@N. FORT MYERS@NORTH FORT MYERS
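To make the answers below easy to try, here is one way to build both inputs. The column boundaries of the sample data are an assumption, inferred from the whitespace-separated listing and from the CITY values in the outputs further down; the variable name `sample` is hypothetical.

```python
import pandas as pd

# Toy input from the question
df = pd.DataFrame({'STATE': ['S1', 'S2'],
                   'CITY': ['C1', 'C2'],
                   'ALT_NAMES': ['A1@A2', 'A3@A4@A5']})

# Sample data (hypothetical name `sample`); column boundaries are inferred
# from the plain-text listing, so treat them as an assumption
sample = pd.DataFrame({
    'STATE': ['FL', 'FL'],
    'CITY': ['FT. MYERS', 'NORTH FT MYERS'],
    'ALT_NAMES': ['FORT MYERS@FT MYERS',
                  'N.FT.MYERS@N. FORT MYERS@NORTH FORT MYERS'],
})
print(df)
print(sample)
```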
Answer 0 (score: 2)
This works for me:
# explode() is not a pandas built-in here; it is the helper function from the linked question
df = explode(df.assign(ALT_NAMES=df.ALT_NAMES.str.split('@')), 'ALT_NAMES')
print(df)
STATE CITY ALT_NAMES
0 S1 C1 A1
1 S1 C1 A2
2 S2 C2 A3
3 S2 C2 A4
4 S2 C2 A5
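On pandas 0.25 or later, the built-in `DataFrame.explode` does the same job without the helper from the linked question. A minimal sketch on the toy `df`; the rename to `CITY_VAR` is an added step so the column name matches `out_df` from the question, and `reset_index(drop=True)` restores the 0..n-1 index because `explode` repeats the original row labels.

```python
import pandas as pd

df = pd.DataFrame({'STATE': ['S1', 'S2'],
                   'CITY': ['C1', 'C2'],
                   'ALT_NAMES': ['A1@A2', 'A3@A4@A5']})

out_df = (df.assign(ALT_NAMES=df['ALT_NAMES'].str.split('@'))  # strings -> lists
            .explode('ALT_NAMES')                              # one row per list element
            .rename(columns={'ALT_NAMES': 'CITY_VAR'})         # column name used in out_df
            .reset_index(drop=True))                           # 0..n-1 index as in out_df
print(out_df)
```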
Another pure pandas solution:
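df = (df.join(df.pop('ALT_NAMES')                 # remove the column, keep it as a Series
                .str.split('@', expand=True)      # one column per name
                .stack()                          # reshape to one row per name
                .dropna()                         # drop padding; newer pandas keeps it in stack()
                .reset_index(level=1, drop=True)  # keep only the original row labels
                .rename('ALT_NAMES'))             # named Series becomes a column in join()
        .reset_index(drop=True))
print(df)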
STATE CITY ALT_NAMES
0 FL FT. MYERS FORT MYERS
1 FL FT. MYERS FT MYERS
2 FL NORTH FT MYERS N.FT.MYERS
3 FL NORTH FT MYERS N. FORT MYERS
4 FL NORTH FT MYERS NORTH FORT MYERS
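The intermediate steps of the stack-based solution are easier to follow on their own. This sketch applies them to the sample ALT_NAMES column only (the names `s`, `wide`, `long`, `names` are hypothetical). `dropna()` is explicit because `stack()` has not handled missing values the same way across pandas versions.

```python
import pandas as pd

s = pd.Series(['FORT MYERS@FT MYERS',
               'N.FT.MYERS@N. FORT MYERS@NORTH FORT MYERS'], name='ALT_NAMES')

# expand=True gives one column per name; shorter rows are padded with missing values
wide = s.str.split('@', expand=True)
print(wide)

# stack() produces a Series indexed by (row label, position in the list);
# dropna() removes the padding regardless of pandas version
long = wide.stack().dropna()
print(long)

# dropping the position level leaves the original row labels, which join() aligns on
names = long.reset_index(level=1, drop=True).rename('ALT_NAMES')
print(names)
```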
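Answer 1 (score: 1)
Here is an optimized version for your data: instead of reshaping with stack(), it repeats the rows with NumPy and flattens the split lists.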
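import pandas as pd
from itertools import chain

v = df.pop('ALT_NAMES').str.split('@')          # lists of alternative names
# repeat every remaining row once per name in its list
df = pd.DataFrame(
    df.values.repeat(v.str.len(), axis=0), columns=df.columns)
# flatten the lists in the same order as the repeated rows
df['ALT_NAMES'] = list(chain.from_iterable(v))
print(df)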
STATE CITY ALT_NAMES
0 S1 C1 A1
1 S1 C1 A2
2 S2 C2 A3
3 S2 C2 A4
4 S2 C2 A5
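One trade-off of going through `df.values` is that mixed-dtype frames become a single object array, so a numeric column would typically come back as object dtype. A variant of the same repeat-and-flatten idea repeats the index labels instead and selects with `.loc`, which keeps each column's dtype. This is a sketch, not the answer's method; the `POP` column is hypothetical, added only to show a numeric dtype surviving.

```python
import pandas as pd

df = pd.DataFrame({'STATE': ['S1', 'S2'],
                   'CITY': ['C1', 'C2'],
                   'POP': [100, 200],  # hypothetical numeric column
                   'ALT_NAMES': ['A1@A2', 'A3@A4@A5']})

v = df.pop('ALT_NAMES').str.split('@')
# repeat each row label once per name, then select with .loc so dtypes are kept
out = df.loc[df.index.repeat(v.str.len())].reset_index(drop=True)
# flatten the name lists in the same row order
out['ALT_NAMES'] = [name for names in v for name in names]
print(out)
print(out.dtypes)
```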