I have a pandas DataFrame, df, like this:
userid  accessto
432     pc,internet,wifi
233     pc
235     wifi,laptoop,mobile
236     wifi,laptoop,mobile,pc
I want to turn it into this:
userid device
432 pc
432 internet
432 wifi
233 pc
235 wifi
235 laptop
235 mobile
What I have done so far is this:
import pandas as pd
s = df.accessto
s = s.str.split(',', expand = True)
s
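To make the attempt above reproducible, here is a small sketch that rebuilds the sample data by hand (the DataFrame construction is an assumption, since the question only shows the printed frame). It illustrates why s cannot simply be assigned back as a column: split with expand=True returns a wide DataFrame with one column per position, padded with missing values for shorter rows. It does keep the row index of df, and that shared index is what allows it to be joined back later.

```python
import pandas as pd

# Sample data rebuilt from the question (construction assumed, values copied as shown)
df = pd.DataFrame({
    "userid": [432, 233, 235, 236],
    "accessto": ["pc,internet,wifi", "pc", "wifi,laptoop,mobile", "wifi,laptoop,mobile,pc"],
})

# One column per comma-separated position; rows with fewer items are padded with missing values
s = df["accessto"].str.split(",", expand=True)
print(s)

# The row labels of s are the same as those of df, so s can later be aligned with df by index
print(s.index.equals(df.index))
```

The wide shape (4 rows by 4 columns here) is the problem to solve: the target has one row per device, so s still has to be reshaped to long form before it is attached to userid.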
How can I join 's' back to the original df so that each device gets its own row with the matching userid?
Answer:
Use:
s = (df.pop('accessto').str.split(',', expand = True)
.stack()
.reset_index(level=1, drop=True)
.rename('device'))
print (s)
0 pc
0 internet
0 wifi
1 pc
2 wifi
2 laptoop
2 mobile
3 wifi
3 laptoop
3 mobile
3 pc
Name: device, dtype: object
df = df.join(s).reset_index(drop=True)
print (df)
userid device
0 432 pc
1 432 internet
2 432 wifi
3 233 pc
4 235 wifi
5 235 laptoop
6 235 mobile
7 236 wifi
8 236 laptoop
9 236 mobile
10 236 pc
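A self-contained sketch of the pop/split/stack/join approach above, with each step commented. The sample data is rebuilt by hand (an assumption), and the result is stored in a new variable, out, instead of overwriting df. One deviation from the answer is labeled in the code: an explicit dropna() after stack, added defensively because newer pandas versions have reworked stack and may keep the missing values that split uses for padding.

```python
import pandas as pd

# Sample data rebuilt from the question (construction assumed)
df = pd.DataFrame({
    "userid": [432, 233, 235, 236],
    "accessto": ["pc,internet,wifi", "pc", "wifi,laptoop,mobile", "wifi,laptoop,mobile,pc"],
})

s = (df.pop("accessto")                    # remove the column from df and keep it as a Series
       .str.split(",", expand=True)        # wide DataFrame, one column per item, padded for short rows
       .stack()                            # long Series indexed by (original row, position)
       .dropna()                           # defensive addition: discard padding if this pandas version keeps it
       .reset_index(level=1, drop=True)    # drop the inner "position" level, keep the original row label
       .rename("device"))                  # Series name becomes the new column name on join

# join aligns on the original row label, so userid is repeated once per device
out = df.join(s).reset_index(drop=True)
print(out)
```

Because join matches on the index, nothing has to be repeated by hand: every device keeps the row label of the user it came from, and the left join duplicates that user's row as many times as needed.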
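Explanation:
pop removes the accessto column from df and returns it as a Series
split with expand=True turns it into a DataFrame, one column per item
stack reshapes that DataFrame into a long Series with a MultiIndex of (row, position)
reset_index(level=1, drop=True) removes the inner (position) level of that MultiIndex, keeping the original row labels
rename gives the Series the new column name, device
join attaches it back to the original df by index
Or: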
from itertools import chain
s = df['accessto'].str.split(',')
df = pd.DataFrame({
'userid' : df['userid'].repeat(s.str.len()).values,
'accessto' : list(chain.from_iterable(s.values))
})
print (df)
userid accessto
0 432 pc
1 432 internet
2 432 wifi
3 233 pc
4 235 wifi
5 235 laptoop
6 235 mobile
7 236 wifi
8 236 laptoop
9 236 mobile
10 236 pc
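On pandas 0.25 or newer, a different technique than either answer above, DataFrame.explode, performs the same reshaping in one step. This is a sketch assuming that pandas version; the sample data is rebuilt by hand, and the column name device is taken from the question's desired output.

```python
import pandas as pd

# Sample data rebuilt from the question (construction assumed)
df = pd.DataFrame({
    "userid": [432, 233, 235, 236],
    "accessto": ["pc,internet,wifi", "pc", "wifi,laptoop,mobile", "wifi,laptoop,mobile,pc"],
})

out = (df.assign(device=df["accessto"].str.split(","))  # each cell becomes a Python list of devices
         .explode("device")                              # one row per list element, index repeated
         .drop(columns="accessto")                       # the original comma-separated column is no longer needed
         .reset_index(drop=True))                        # renumber rows 0..n-1
print(out)
```

Compared with the chain approach, explode keeps every other column of the row automatically, so it still works if df has more columns than userid.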