Question

I am working on the "What's Cooking?" Kaggle challenge, and I have a data frame with three columns (note that the last one contains a list):

           cuisine     id                                        ingredients
    0        greek  10259  [romaine lettuce, black olives, grape tomatoes...
    1  southern_us  25693  [plain flour, ground pepper, salt, tomatoes, g...
    2     filipino  20130  [eggs, pepper, salt, mayonaise, cooking oil, g...

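I want to count how often each ingredient is used in each cuisine. To do that, I split the list in the last column so that each ingredient becomes its own row in a new dataset, and I repeat the cuisine on every row produced by the split. So, given a row like: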

cuisine_1    id   [ingredient_1, ingredient_2, ingredient_3]

the expected output is:

cuisine_1    id   ingredient_1
cuisine_1    id   ingredient_2
cuisine_1    id   ingredient_3

I wrote the following code, which works and converts the dataset into the expected format:

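import pandas as pd

# `train` is the data frame shown above.
ingredients = []
cuisines = []
ids = []
for _, row in train.iterrows():
    cuisine = row.cuisine
    identifier = row.id
    # Emit one (id, ingredient, cuisine) triple per ingredient in the list.
    for ingredient in row.ingredients:
        cuisines.append(cuisine)
        ingredients.append(ingredient)
        ids.append(identifier)

# Rebuild a long-format data frame from the three parallel lists.
ingredient_to_cuisine = pd.DataFrame({
    "id": ids,
    "ingredient": ingredients,
    "cuisine": cuisines
})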

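It does the job, but it is a lot of code. I also have to take the dataset apart and rebuild it. It does not look like idiomatic pandas code, and I feel like I am reinventing the wheel.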

My question is: is there a built-in pandas function I should be using to get this result in a more pandas-friendly way?
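For reference, one candidate that seems to fit is `DataFrame.explode`, which exists in pandas 0.25 and later. The sketch below is only an illustration under assumptions: the small `train` frame is a made-up stand-in for the real Kaggle data, and the `count` column name is chosen here for the example.

```python
import pandas as pd

# Hypothetical stand-in for the real `train` data frame (assumed shape only).
train = pd.DataFrame({
    "cuisine": ["greek", "southern_us", "greek"],
    "id": [10259, 25693, 20130],
    "ingredients": [
        ["romaine lettuce", "black olives"],
        ["plain flour", "salt"],
        ["salt", "black olives"],
    ],
})

# One row per (cuisine, id, ingredient); explode repeats the other columns.
ingredient_to_cuisine = (
    train.explode("ingredients")
         .rename(columns={"ingredients": "ingredient"})
         .reset_index(drop=True)
)

# Count how often each ingredient appears within each cuisine.
counts = (
    ingredient_to_cuisine
    .groupby(["cuisine", "ingredient"])
    .size()
    .reset_index(name="count")
)
```

If this behaves as expected, `ingredient_to_cuisine` has the same long format as the loop-based version, and `counts` holds the per-cuisine ingredient frequencies.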

Splitting a list into rows in pandas

0 answers: