I have a DataFrame in which each row holds an id and a list of string tuples.
For example:
id words
223 [('flying bird','round place'),('blue sky','red rose')]
368 [('fairy tales','great day'),('show time','break free'),('noise free')]
I want:
id words
223 [('flying bird','round place')]
223 [('blue sky','red rose')]
368 [('fairy tales','great day')]
368 [('show time','break free')]
368 [('noise free')]
in a Python pandas DataFrame.
Answer 0 (score: 1)
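Another solution, using set_index and stack. The last step converts the words column to a list of tuples; if an item has only one element, a trailing comma has to be added so that it becomes a tuple: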
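import pandas as pd

# Move id to the index so it survives the reshaping
df.set_index('id', inplace=True)
# Expand each list into columns, then stack the columns back into rows
df = df.words.apply(pd.Series)
df = df.stack().reset_index(drop=True, level=1).reset_index(name='words')
# Wrap each item in a list; a bare string becomes a 1-tuple first
df['words'] = df.words.apply(lambda x: [(x,)] if isinstance(x, str) else [x])
print(df)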
id words
0 223 [(flying bird, round place)]
1 223 [(blue sky, red rose)]
2 368 [(fairy tales, great day)]
3 368 [(show time, break free)]
4 368 [(noise free,)]
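As a possible shorter route (this is not part of the answer above): on pandas 0.25 or later, DataFrame.explode does the one-row-per-item step directly. A minimal sketch, assuming the sample data from the question; the name out is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [223, 368],
    'words': [
        [('flying bird', 'round place'), ('blue sky', 'red rose')],
        # ('noise free') has no trailing comma, so it is a plain string
        [('fairy tales', 'great day'), ('show time', 'break free'), ('noise free')],
    ],
})

# explode() emits one row per list element and repeats the id for each
out = df.explode('words').reset_index(drop=True)

# Turn bare strings into 1-tuples, then wrap every item in a list
out['words'] = out['words'].apply(lambda x: [(x,)] if isinstance(x, str) else [x])
print(out)
```

explode only unpacks one level, so the tuples inside each list are left intact.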
Answer 1 (score: 0)
import pandas as pd

d = {'id': [233, 368],
     'words': [[('flying bird','round place'),('blue sky','red rose')],
               [('fairy tales','great day'),('show time','break free'),('noise free')]]}
df = pd.DataFrame(d)
dfidtemp = df['id']
# Expand each list into columns, one element per column
df = df['words'].apply(pd.Series)
# Restore the ids as the index, then stack the columns into rows
df.index = dfidtemp
rslt = df.stack()
Not sure whether this is what you want:
rslt
Out[123]:
id
233 0 (flying bird, round place)
1 (blue sky, red rose)
368 0 (fairy tales, great day)
1 (show time, break free)
2 noise free
dtype: object
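The stacked result is a Series with a two-level index rather than the two-column layout asked for in the question. One way to flatten it might look like the sketch below; the name flat is illustrative, and the dropna() call is a defensive assumption in case a pandas version keeps the NaN padding after stack():

```python
import pandas as pd

df = pd.DataFrame({
    'id': [233, 368],
    'words': [
        [('flying bird', 'round place'), ('blue sky', 'red rose')],
        [('fairy tales', 'great day'), ('show time', 'break free'), ('noise free')],
    ],
})

# Same reshaping as in the answer; dropna() is a no-op when stack() already dropped the padding
rslt = df.set_index('id')['words'].apply(pd.Series).stack().dropna()

# Drop the positional inner level, name the values, and move id back into a column
flat = rslt.droplevel(1).rename('words').reset_index()
print(flat)
```

Note that the last item stays a plain string here, because nothing in this answer converts it to a tuple.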
Answer 2 (score: 0)
words = []
ids = []
for i in df.index:
    words = words + df.words[i]
    ids = ids + [df.id[i]] * len(df.words[i])
df = pd.DataFrame({'words': words, 'ids': ids})
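The same idea can probably be written with list comprehensions, which avoids rebuilding both lists on every iteration. A rough sketch, assuming the sample data from the question; the names flat, row_id and items are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [223, 368],
    'words': [
        [('flying bird', 'round place'), ('blue sky', 'red rose')],
        [('fairy tales', 'great day'), ('show time', 'break free'), ('noise free')],
    ],
})

# Repeat each id once per item in its list
ids = [row_id for row_id, items in zip(df['id'], df['words']) for _ in items]
# Flatten the lists in the same row order so ids and words stay aligned
words = [item for items in df['words'] for item in items]

flat = pd.DataFrame({'id': ids, 'words': words})
print(flat)
```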
Answer 3 (score: 0)
You can also use literal_eval from ast to parse the tuples when they are stored as strings:
from ast import literal_eval as make_tuple
# Parse each id's string, then wrap every item in a list (bare strings become 1-tuples)
df = df.groupby('id')['words'].apply(lambda s: pd.Series(make_tuple(s.iloc[0])).apply(lambda x: [x] if isinstance(x, tuple) else [(x,)])).to_frame()
This gives:
words
id
223 0 [(flying bird, round place)]
1 [(blue sky, red rose)]
368 0 [(fairy tales, great day)]
1 [(show time, break free)]
2 [(noise free,)]
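This answer only applies when the words column holds text, for example after reading the data from a CSV file. A small sketch of just the parsing step, assuming string versions of the question's sample data; as_tuple is a hypothetical helper, not a pandas or ast function:

```python
from ast import literal_eval

import pandas as pd

# Assumption: each cell is the string form of a Python list
df = pd.DataFrame({
    'id': [223, 368],
    'words': [
        "[('flying bird','round place'),('blue sky','red rose')]",
        "[('fairy tales','great day'),('show time','break free'),('noise free')]",
    ],
})

# literal_eval safely evaluates Python literals, turning each string into a real list
df['words'] = df['words'].apply(literal_eval)

def as_tuple(item):
    """Hypothetical helper: leave tuples alone, wrap anything else in a 1-tuple."""
    return item if isinstance(item, tuple) else (item,)

# ('noise free') parses as a plain string, so normalise every item to a tuple
df['words'] = df['words'].apply(lambda items: [as_tuple(i) for i in items])
print(df)
```

Once the column holds real lists, any of the reshaping approaches from the other answers should work on it.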