My data frame has a column whose cells each hold several comma-separated values.
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import pandas as pd

myst = """india | 905034 | 19:44 | cricket, hockey
USA | 905094 | 19:33 | swimming, running, tennis, football
Russia | 905154 | 21:56 | basketball
"""
u_cols = ['country', 'index', 'current_tm', 'sports']
myf = StringIO(myst)
# Note: sep='|' keeps the spaces that surround each field in the raw text.
df = pd.read_csv(myf, sep='|', names=u_cols)
Is it possible to split each cell into multiple rows, like this?
india cricket
india hockey
USA swimming
USA running
USA tennis
USA football
Russia basketball
Answer 0 (score: 2)
You can use str.split and then apply(pd.Series).stack(). Here apply(pd.Series) spreads the split elements across separate columns, and stack turns those columns into rows:
In [34]: df = df.set_index('country')
In [36]: s = df['sports'].str.split(',').apply(pd.Series).stack()
In [37]: s
Out[37]:
country
india    0       cricket
         1        hockey
USA      0      swimming
         1       running
         2        tennis
         3      football
Russia   0    basketball
dtype: object
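To make the two steps easier to follow, here is a minimal sketch that keeps the intermediate results in separate variables. The small frame is a hand-built stand-in for the question's data, and the names lists, wide and s are only illustrative. The explicit dropna() is an assumption about version differences: older pandas drops the NaN padding inside stack() by itself, while newer versions may keep it.

```python
import pandas as pd

# Hand-built stand-in for the question's frame, already indexed by country.
df = pd.DataFrame(
    {"sports": ["cricket, hockey",
                "swimming, running, tennis, football",
                "basketball"]},
    index=pd.Index(["india", "USA", "Russia"], name="country"),
)

# Step 1: every cell becomes a Python list of strings.
lists = df["sports"].str.split(",")

# Step 2: apply(pd.Series) spreads each list over columns 0, 1, 2, ...
# Rows with fewer items are padded with NaN.
wide = lists.apply(pd.Series)

# Step 3: stack() moves the column labels into an inner index level,
# giving one row per (country, position). dropna() removes the padding
# in case the installed pandas version keeps it.
s = wide.stack().dropna()

print(wide)
print(s)
```

The wide frame has one row per country and as many columns as the longest list, which is why stack() is needed to get back to one value per row.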
Then clean it up a bit further:
In [38]: s.reset_index(level=0).reset_index(drop=True)
Out[38]:
  country           0
0   india     cricket
1   india      hockey
2     USA    swimming
3     USA     running
4     USA      tennis
5     USA    football
6  Russia  basketball
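Putting the pieces together, the following is a rough end-to-end sketch rather than the answer's exact code. It additionally strips the whitespace that sep='|' and the split on ',' leave behind, and it names the value column sport, which is a hypothetical name chosen here. The dropna() call is again a precaution for pandas versions whose stack() keeps NaN padding.

```python
from io import StringIO
import pandas as pd

myst = """india | 905034 | 19:44 | cricket, hockey
USA | 905094 | 19:33 | swimming, running, tennis, football
Russia | 905154 | 21:56 | basketball
"""
u_cols = ['country', 'index', 'current_tm', 'sports']
df = pd.read_csv(StringIO(myst), sep='|', names=u_cols)

# sep='|' keeps the blanks around each field, so clean the key column first.
df['country'] = df['country'].str.strip()

s = (
    df.set_index('country')['sports']
      .str.split(',')        # one list per cell
      .apply(pd.Series)      # one column per list element
      .stack()               # columns -> inner index level
      .dropna()              # remove NaN padding if it survived stack()
      .str.strip()           # remove blanks around each sport
)

# Drop the inner position level, name the values, and return to a flat frame.
# 'sport' is a hypothetical column name.
result = s.reset_index(level=1, drop=True).rename('sport').reset_index()
print(result)
```

Naming the Series before the final reset_index() avoids ending up with a value column labelled 0, as in the output above.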
Note that with recent versions of pandas you can replace .apply(pd.Series) with expand=True in str.split: df['sports'].str.split(',', expand=True).stack()
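As an illustration of the expand=True variant, here is a small sketch on a hand-built copy of the data. It also shows DataFrame.explode as a second option; that method is not part of the original answer and is assumed to be available (pandas 0.25 or later). The variable names stacked and exploded are only illustrative.

```python
import pandas as pd

# Hand-built stand-in for the question's frame.
df = pd.DataFrame({
    "country": ["india", "USA", "Russia"],
    "sports": ["cricket, hockey",
               "swimming, running, tennis, football",
               "basketball"],
})

# Variant 1: expand=True returns the wide frame directly,
# so apply(pd.Series) is no longer needed.
stacked = (
    df.set_index("country")["sports"]
      .str.split(",", expand=True)
      .stack()
      .dropna()      # precaution for versions where stack() keeps padding
      .str.strip()   # remove the blank that follows each comma
)

# Variant 2 (assumes pandas >= 0.25): turn each cell into a list,
# then let explode() emit one row per list element.
exploded = (
    df.assign(sports=df["sports"].str.split(","))
      .explode("sports")
      .reset_index(drop=True)
)
exploded["sports"] = exploded["sports"].str.strip()

print(stacked)
print(exploded)
```

Both variants should yield the same seven sports in the same order. The explode form keeps the other columns of the frame alongside each value without a set_index step.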