My DataFrame looks like this:
df = pd.DataFrame({'User':['101','102','103','104'],
'Text':["""{"y":["8","8 plus"]""","""{"x":["7"]}""","""{"x":["7","7+","7++"]}""","""{"x":["7"]}"""]})
Desired output:
I have already tried an approach that extracts only the exact values:
df2 = df.set_index('User').Text.str.split(',', expand=True).stack().reset_index()
Answer 0 (score: 1)
Maybe something like this:
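import ast
import numpy as np
import pandas as pd
# split each string at the colon: key part -> Text1, list part -> Text2
df[['Text1', 'Text2']] = df.pop('Text').str.split(":", expand=True)
# strip any closing brace, then parse the remaining list literal
df.Text2 = df.Text2.replace("}", "", regex=True).apply(ast.literal_eval)
# remove every non-word character (braces, quotes) from the key
df.Text1 = df.Text1.replace(r"\W", '', regex=True)
# one row per list element, indexed by the row it came from
s = pd.DataFrame({'B': np.concatenate(df.Text2.values)}, index=df.index.repeat(df.Text2.str.len()))
print(df.join(s).drop('Text2', axis=1).rename(columns={'B': 'Text2'}))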
Output:
User Text1 Text2
0 101 y 8
0 101 y 8 plus
1 102 x 7
2 103 x 7
2 103 x 7+
2 103 x 7++
3 104 x 7
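The flattening step in the snippet above repeats each row's index label once per list element, which is why the index in the output contains duplicates. A minimal sketch of that step in isolation, using a small made-up Series named lists (not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one list of values per row
lists = pd.Series([['8', '8 plus'], ['7'], ['7', '7+', '7++']])

# Number of elements in each list: 2, 1, 3
lengths = lists.str.len()

# Concatenate all lists into one flat array and repeat each
# index label as many times as its list has elements
flat = pd.Series(np.concatenate(lists.values),
                 index=lists.index.repeat(lengths))
print(flat)
```

Joining flat back onto the original frame on the index then yields one row per list element, as in the answer.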
Answer 1 (score: 1)
Assuming the first dictionary is simply missing its closing brace (}), you can use ast.literal_eval:
import ast
import pandas as pd
df = pd.DataFrame({'User': ['101', '102', '103', '104'],
'Text': ["""{"y":["8","8 plus"]}""", """{"x":["7"]}""", """{"x":["7","7+","7++"]}""",
"""{"x":["7"]}"""]})
# convert to dictionary and drop the text column
dictionaries = df.assign(D=df.Text.apply(ast.literal_eval)).drop('Text', axis=1)
# convert each row to multiple ones (given by the values of each dictionary)
tuples = [(u, k, v) for u, r in dictionaries.values for k, vs in r.items() for v in vs]
result = pd.DataFrame(tuples, columns=['User', 'Text1', 'Text2'])
print(result)
Output:
User Text1 Text2
0 101 y 8
1 101 y 8 plus
2 102 x 7
3 103 x 7
4 103 x 7+
5 103 x 7++
6 104 x 7
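As a possible alternative, the same reshaping can be sketched with json.loads and DataFrame.explode. This is not from either answer. It assumes pandas 0.25 or later (where explode is available), that every string is valid JSON (so the first row's closing brace has been restored), and that each dictionary holds exactly one key, as in the sample data.

```python
import json
import pandas as pd

df = pd.DataFrame({
    'User': ['101', '102', '103', '104'],
    'Text': ['{"y":["8","8 plus"]}', '{"x":["7"]}',
             '{"x":["7","7+","7++"]}', '{"x":["7"]}'],
})

# Parse each JSON string into a dict (assumes valid JSON throughout)
parsed = df['Text'].apply(json.loads)

# Assumes exactly one key per dictionary: take that key and its list
result = df[['User']].assign(
    Text1=parsed.apply(lambda d: next(iter(d))),
    Text2=parsed.apply(lambda d: next(iter(d.values()))),
)

# explode gives each list element its own row; reset the index afterwards
result = result.explode('Text2').reset_index(drop=True)
print(result)
```

If a dictionary could hold several keys, the list comprehension from the answer above is the safer choice, since it iterates over every key and value pair.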