I have a data frame in which one column holds a list of values, as shown below.
ID Source
1 [apple,mango]
2 [grapes]
I am trying to unpack the lists so that each value gets its own row in a new data frame, like this:
ID Source
1 apple
1 mango
2 grapes
I am trying to achieve this with the following code:
duplicates = pd.DataFrame()
for _, row in file_df.iterrows():  # file_df is the original dataframe with list of values
    leng = len(row.Source_sentences)
    for j in row.Source_sentences:
        itr = [row.ID, j]
        df2 = pd.DataFrame(row.ID, j, columns=["ID", "Source"])
        print(itr)
        duplicates.append(df2, ignore_index=True)
        idx = idx + 1
print(duplicates)
I get the following error:
TypeError: Index(...) must be called with a collection of some kind, 'Apple.' was passed
Can someone point out what is wrong with my code?
Answer 0 (score: 0)
You can try it this way:
Input:
import pandas as pd
data = {'ID': [1, 2], 'Source': [["apple", "mango"], ["grapes"]]}
df = pd.DataFrame(data=data)
ID Source
0 1 [apple, mango]
1 2 [grapes]
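df1 = (df['Source'].apply(lambda x: pd.Series(x))  # one column per list element
       .stack()                                    # move those columns into the row index
       .reset_index(level=1, drop=True)            # drop the element-position level
       .to_frame('Source')
       .join(df[['ID']], how='left')               # re-attach ID via the original row index
      )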
Output:
Source ID
0 apple 1
0 mango 1
1 grapes 2
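As for the error itself: the second positional argument of `pd.DataFrame` is `index`, so `pd.DataFrame(row.ID, j, columns=[...])` passes the string `j` as the index, and that is what raises the `TypeError`. Separately, `DataFrame.append` returns a new frame instead of modifying `duplicates` in place, and it was removed in pandas 2.0. Below is a minimal sketch of two alternatives. It assumes the list column is named `Source` as in the sample table (the question's code refers to `Source_sentences`), and the names `rows`, `flat_loop` and `flat_explode` are only illustrative.

```python
import pandas as pd

# Sample data mirroring the question; the column name "Source" is an assumption.
file_df = pd.DataFrame({"ID": [1, 2], "Source": [["apple", "mango"], ["grapes"]]})

# Option 1: keep the explicit loop, but collect plain records
# and build the DataFrame once at the end.
rows = [
    {"ID": row.ID, "Source": item}
    for row in file_df.itertuples(index=False)
    for item in row.Source
]
flat_loop = pd.DataFrame(rows, columns=["ID", "Source"])

# Option 2: DataFrame.explode (available since pandas 0.25)
# turns each list element into its own row.
flat_explode = file_df.explode("Source").reset_index(drop=True)

print(flat_loop)
print(flat_explode)
```

Both variants keep the `ID`, `Source` column order from the desired output. If it is available in your pandas version, `explode` is likely the simpler choice.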