I have a DataFrame that looks like this:
print(df_master)
Authors | Codes | ID | Year
[{first_name: 'fn1', | 11111 | id0001 | 2019
last_name: 'ln1'},
{first_name: 'fn2',
last_name: 'ln2'}]
[{first_name: 'fn3', | 22222 | id0002 | 2019
last_name: 'ln3'}]
I want to create a new DataFrame from the Authors column, with one row per author, like this:
print(df_authors)
First Name | Last Name | Codes | ID | Year
'fn1' | 'ln1' | 11111 | id0001 | 2019
'fn2' | 'ln2' | 11111 | id0001 | 2019
'fn3' | 'ln3' | 22222 | id0002 | 2019
Right now I can't even access the dictionaries inside the cells. I tried:
df_master['Authors'].apply(pd.Series)
But I just get the same column back. *I think the problem is that the dictionaries are stored as strings.
Answer 0 (score: 0)
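Use ast.literal_eval in a list comprehension to convert each string repr into a list of dictionaries, tag every dictionary with its original row index, and then join the result back to the original DataFrame: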
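import ast
import pandas as pd

# Parse each Authors string and build one dict per author, keeping the row index as 'i'
df1 = (pd.DataFrame([{**y, **{'i': k}}
                     for k, v in df.pop('Authors').items()
                     for y in ast.literal_eval(v)]).set_index('i'))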
print (df1)
  first_name last_name
i
0        fn1       ln1
0        fn2       ln2
1        fn3       ln3
df = df.join(df1).reset_index(drop=True)
print (df)
   Codes      ID  Year first_name last_name
0  11111  id0001  2019        fn1       ln1
1  11111  id0001  2019        fn2       ln2
2  22222  id0002  2019        fn3       ln3
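
To see why the parsing step matters, here is a small standalone sketch of what ast.literal_eval does with a single cell. The cell strings below are made up for illustration and assume the stored text has quoted keys; if the keys are really unquoted, as they appear in the printed table in the question, literal_eval will reject them and the strings would need to be cleaned up first.

```python
import ast

# Hypothetical cell value: the list of author dicts stored as text, keys quoted
cell = "[{'first_name': 'fn1', 'last_name': 'ln1'}, {'first_name': 'fn2', 'last_name': 'ln2'}]"

# literal_eval only evaluates Python literals, so it is safer than eval
authors = ast.literal_eval(cell)
first_names = [a["first_name"] for a in authors]

# Unquoted keys are parsed as variable names, not literals, so this fails
bad_cell = "[{first_name: 'fn1', last_name: 'ln1'}]"
try:
    ast.literal_eval(bad_cell)
    parsed_bad = True
except ValueError:
    parsed_bad = False
```

If parsed_bad comes out False on your own data, the strings are not valid Python literals and the list comprehension above will raise the same error.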
Answer 1 (score: 0)
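import ast

# Return first_names and last_names, one list per row of the Authors column
def get_first_last_names(authors):
    first_names = []
    last_names = []
    for cell in authors:
        # Each cell is a list of author dicts, possibly stored as a string
        people = ast.literal_eval(cell) if isinstance(cell, str) else cell
        first_names.append([p['first_name'] for p in people])
        last_names.append([p['last_name'] for p in people])
    return first_names, last_names

first_names, last_names = get_first_last_names(df['Authors'].values)
# Add new columns and set values
df['first_name'] = first_names
df['last_name'] = last_names
# Drop Authors column
df.drop(columns=['Authors'], inplace=True)
# Expand to one row per author (multi-column explode needs pandas 1.3+)
df = df.explode(['first_name', 'last_name'], ignore_index=True)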
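
As a possible alternative to the explicit loop, the same reshaping can be sketched with DataFrame.explode and pd.json_normalize. This is only a sketch under assumptions: the helper name expand_authors and the sample frame are made up here, the Authors strings are assumed to use quoted keys, and every row is assumed to have at least one author.

```python
import ast
import pandas as pd

# Assumed sample data mirroring the question, with Authors stored as strings
sample = pd.DataFrame({
    "Authors": [
        "[{'first_name': 'fn1', 'last_name': 'ln1'}, {'first_name': 'fn2', 'last_name': 'ln2'}]",
        "[{'first_name': 'fn3', 'last_name': 'ln3'}]",
    ],
    "Codes": [11111, 22222],
    "ID": ["id0001", "id0002"],
    "Year": [2019, 2019],
})

def expand_authors(df, column="Authors"):
    """Hypothetical helper: return one row per author dict found in `column`."""
    out = df.copy()
    # Parse string cells into lists of dicts; leave real lists untouched
    out[column] = out[column].map(
        lambda c: ast.literal_eval(c) if isinstance(c, str) else c
    )
    # One row per author, with a fresh 0..n-1 index
    out = out.explode(column, ignore_index=True)
    # Turn each author dict into first_name / last_name columns
    names = pd.json_normalize(out[column].tolist())
    return pd.concat([names, out.drop(columns=[column])], axis=1)

result = expand_authors(sample)
print(result)
```

This puts the name columns first, which matches the column order asked for in the question. Rows with an empty author list would need separate handling, since explode turns them into NaN.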