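Python - How to iterate over a DataFrame and replace the value in one cell with the value from another cell in the same row

Time: 2018-03-07 13:33:11

Tags: python pandas

I'm trying to create a new column in a DataFrame of food ingredients, with a unique value for each row that is based on information from other cells in the same row.

The table basically looks like this: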

ingredient_name | ingredient_method | consolidated_name
Cheese          | [camembert, pkg]  | 
Cheese          | [cream, pastueri] |
Egg             | [raw, scrambled]  |

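I tried iterating over the rows and filling the consolidated_name column with a value from either ingredient_name or ingredient_method.
For example, if ingredient_name is 'Cheese', I want that row's consolidated name to be the first element of the list in ingredient_method.

This is the code I have so far: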

for i, row in df.iterrows():
    consolidated = df['ingredient_name']
    if (df['ingredient_name'] == 'Cheese').all():
        consolidated = df['ingredient_method'][0]
    df.set_value(i,'consolidated_name',consolidated)

The code runs without errors, but none of the values in the DataFrame change. Any ideas?

3 answers:

Answer 0 (score: 2)

You can use .loc combined with .str[0].

Given:

import pandas as pd

df = pd.DataFrame(dict(ingredient_name=['Cheese','Cheese','Egg'],
                  ingredient_method=[['camembert', 'pkg'],
                                     ['cream', 'pastueri'],
                                     ['raw', 'scrambled']]))

Do:

# Initialize consolidated_name, with None for instance
df['consolidated_name'] = [None]*len(df) # Not mandatory, will fill with NaN if not set

# Use .loc to select the rows you want and .str[0] to get the first element of each list
_filter = df.ingredient_name=='Cheese' # Boolean mask for the rows to update
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]

Result:

print(df)
   ingredient_method ingredient_name consolidated_name
0   [camembert, pkg]          Cheese         camembert
1  [cream, pastueri]          Cheese             cream
2   [raw, scrambled]             Egg              None

Notes

#1
If you want to consolidate every ingredient whose name is duplicated, you can build the filter like this:

_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)

The use of .loc is unchanged; see the following example:

df = pd.DataFrame(dict(ingredient_name=['Cheese','Cheese','Egg','Foo','Foo'],
                  ingredient_method=[['camembert', 'pkg'], 
                                     ['cream', 'pastueri'], 
                                     ['raw', 'scrambled'], 
                                     ['bar', 'taz'], 
                                     ['taz', 'bar']]))

_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]
print(df)

   ingredient_method ingredient_name consolidated_name
0   [camembert, pkg]          Cheese         camembert
1  [cream, pastueri]          Cheese             cream
2   [raw, scrambled]             Egg               NaN
3         [bar, taz]             Foo               bar
4         [taz, bar]             Foo               taz
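
To see why the two-step filter is needed, here is a small sketch of the intermediate masks. With its default keep='first', duplicated() flags only the second and later occurrences, and the isin() step then widens that to every row sharing a repeated name. As far as I know, duplicated(keep=False) yields the same mask in a single call; the variable names below are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg", "Foo", "Foo"],
})

# Default keep='first': only the repeats are flagged, so the first
# "Cheese" and the first "Foo" stay False.
repeats_only = df["ingredient_name"].duplicated()

# The isin() step from the answer: every row whose name appears among the repeats.
via_isin = df["ingredient_name"].isin(df["ingredient_name"][repeats_only])

# keep=False flags every member of a duplicated group directly.
all_members = df["ingredient_name"].duplicated(keep=False)

print(repeats_only.tolist())
print(via_isin.tolist())
print(all_members.tolist())
```

Either mask can be passed to .loc in the same way as _filter.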

#2
If you prefer, you can initialize the column with ingredient_name:

df['consolidated_name'] = df.ingredient_name

Then apply the same filter and assignment:

_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]
print(df)

   ingredient_method ingredient_name consolidated_name
0   [camembert, pkg]          Cheese         camembert
1  [cream, pastueri]          Cheese             cream
2   [raw, scrambled]             Egg               Egg #Here it has changed
3         [bar, taz]             Foo               bar
4         [taz, bar]             Foo               taz
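
As a possible alternative to initializing and then overwriting, the same "first method element if the condition holds, otherwise the name" rule can be sketched in one expression with Series.where. This is a different technique from the .loc assignment above, shown here with the 'Cheese' condition from the question; the names is_cheese and first_method are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"]],
})

# Boolean mask for the rows that should take their value from ingredient_method.
is_cheese = df["ingredient_name"] == "Cheese"

# First element of each list in ingredient_method.
first_method = df["ingredient_method"].str[0]

# Keep first_method where the mask is True, fall back to ingredient_name elsewhere.
df["consolidated_name"] = first_method.where(is_cheese, df["ingredient_name"])

print(df["consolidated_name"].tolist())
```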

Answer 1 (score: 0)

You can use DataFrame.apply for this. Just wrap your decision logic (currently inside the for loop) in a function.

def func(row):
    if row['ingredient_name'] == 'Cheese':
        return row['ingredient_method'][0]
    return None

df['consolidated_name'] = df.apply(func, axis=1)
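
The loop in the question starts from ingredient_name as the default, so a variant that falls back to the name instead of None may be closer to what was intended. The following is a self-contained sketch under that assumption; the function name consolidate is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"]],
})

def consolidate(row):
    # row is a single DataFrame row passed in as a Series (because axis=1).
    if row["ingredient_name"] == "Cheese":
        return row["ingredient_method"][0]
    # Assumed fallback: keep the original name for every other ingredient.
    return row["ingredient_name"]

df["consolidated_name"] = df.apply(consolidate, axis=1)

print(df["consolidated_name"].tolist())
```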

Answer 2 (score: 0)

If you want to do it with your original loop:

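consolidated_name = []
for i, row in df.iterrows():
    # Use column labels rather than positions so the result does not depend on column order
    if row['ingredient_name'] == 'Cheese':
        consolidated_name.append(row['ingredient_method'][0])
    else:
        consolidated_name.append(None)

df['consolidated_name'] = consolidated_name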

## out:
  ingredient_name  ingredient_method consolidated_name
0          Cheese   [camembert, pkg]         camembert
1          Cheese  [cream, pastueri]             cream
2             Egg   [raw, scrambled]              None
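
For a version that stays closer to the question's loop and writes into the DataFrame cell by cell, here is a minimal sketch using df.at, since set_value was deprecated in pandas 0.21 and removed in 1.0. Note that the question's loop reads from df rather than from row, so its condition tests the whole column, which is never all 'Cheese' in this data. The fallback to ingredient_name is an assumption taken from that loop.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"]],
})

# Start from the ingredient name, then overwrite only the rows that match.
df["consolidated_name"] = df["ingredient_name"]

for i, row in df.iterrows():
    # Read from `row` (the current row), not from `df` (the whole column).
    if row["ingredient_name"] == "Cheese":
        # df.at sets a single cell addressed by row label and column label.
        df.at[i, "consolidated_name"] = row["ingredient_method"][0]

print(df["consolidated_name"].tolist())
```

On large frames the vectorized .loc approach is likely to be faster than any row-by-row loop.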