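I am trying to create a new column in a DataFrame of food ingredients, where each row gets its own value based on information from other cells in the same row.
The table looks roughly like this: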
ingredient_name | ingredient_method | consolidated_name
Cheese | [camembert, pkg] |
Cheese | [cream, pastueri] |
Egg | [raw, scrambled] |
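I tried iterating over the rows and filling the consolidated_name column with the value of either ingredient_name or ingredient_method.
For example, if ingredient_name is 'Cheese', I want that row's consolidated_name to be the first element of the list in ingredient_method.
Here is my code so far: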
for i, row in df.iterrows():
    consolidated = df['ingredient_name']
    if (df['ingredient_name'] == 'Cheese').all():
        consolidated = df['ingredient_method'][0]
    df.set_value(i,'consolidated_name',consolidated)
The code runs without errors, but no values in the DataFrame change. Any ideas?
Answer 0 (score: 2)
You can use .loc combined with .str[0].
Given:
import pandas as pd

df = pd.DataFrame(dict(ingredient_name=['Cheese','Cheese','Egg'],
                       ingredient_method=[['camembert', 'pkg'],
                                          ['cream', 'pastueri'],
                                          ['raw', 'scrambled']]))
Do:
# Initialize consolidated_name, for instance with None
df['consolidated_name'] = [None]*len(df)  # Not mandatory; unset rows are filled with NaN otherwise
# Use .loc to select the rows you want and .str[0] to get the first element of each list
_filter = df.ingredient_name=='Cheese'  # Boolean filter for the rows to update
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]
Result:
print(df)
ingredient_method ingredient_name consolidated_name
0 [camembert, pkg] Cheese camembert
1 [cream, pastueri] Cheese cream
2 [raw, scrambled] Egg None
Notes
#1
If you want to consolidate every ingredient that appears more than once, you can build the filter like this:
_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)
The use of .loc is unchanged; see the next example:
df = pd.DataFrame(dict(ingredient_name=['Cheese','Cheese','Egg','Foo','Foo'],
ingredient_method=[['camembert', 'pkg'],
['cream', 'pastueri'],
['raw', 'scrambled'],
['bar', 'taz'],
['taz', 'bar']]))
_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]
print(df)
ingredient_method ingredient_name consolidated_name
0 [camembert, pkg] Cheese camembert
1 [cream, pastueri] Cheese cream
2 [raw, scrambled] Egg NaN
3 [bar, taz] Foo bar
4 [taz, bar] Foo taz
#2
If you prefer, you can initialize the column with ingredient_name:
df['consolidated_name'] = df.ingredient_name
Then apply the same steps:
_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]
print(df)
ingredient_method ingredient_name consolidated_name
0 [camembert, pkg] Cheese camembert
1 [cream, pastueri] Cheese cream
2 [raw, scrambled] Egg Egg #Here it has changed
3 [bar, taz] Foo bar
4 [taz, bar] Foo taz
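The initialize-then-overwrite pattern from note #2 can also be written as a single expression with `Series.mask`, which keeps the original value where the condition is False and takes the replacement where it is True. This is a minimal sketch on the same sample data, not a replacement for the .loc approach.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg", "Foo", "Foo"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"],
                          ["bar", "taz"],
                          ["taz", "bar"]],
})

# Same filter as in the answer: names that occur more than once
_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)

# Start from ingredient_name and swap in the first method element
# wherever the filter is True
df["consolidated_name"] = df["ingredient_name"].mask(
    _filter, df["ingredient_method"].str[0]
)
print(df)
```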
Answer 1 (score: 0)
You can use DataFrame.apply for this. Just wrap your decision logic (currently inside the for loop) in a function.
def func(row):
    if row['ingredient_name'] == 'Cheese':
        return row['ingredient_method'][0]
    return None

df['consolidated_name'] = df.apply(func, axis=1)
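For a self-contained run of this approach, here is a sketch on the three-row sample from the question. It assumes that non-Cheese rows should fall back to ingredient_name rather than None, which is one reading of the question; the function name `consolidate` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"]],
})

def consolidate(row):
    # Cheese rows take the first element of the method list
    if row["ingredient_name"] == "Cheese":
        return row["ingredient_method"][0]
    # Assumption: every other row keeps its ingredient name
    return row["ingredient_name"]

# axis=1 passes each row to the function as a Series
df["consolidated_name"] = df.apply(consolidate, axis=1)
print(df)
```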
Answer 2 (score: 0)
If you want to do it with your original loop:
consolidated_name = []
for i, row in df.iterrows():
    # Access cells by column label rather than by position
    if row['ingredient_name'] == 'Cheese':
        consolidated_name.append(row['ingredient_method'][0])
    else:
        consolidated_name.append(None)
df['consolidated_name'] = consolidated_name
## out:
ingredient_name ingredient_method consolidated_name
0 Cheese [camembert, pkg] camembert
1 Cheese [cream, pastueri] cream
2 Egg [raw, scrambled] None
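For comparison, the loop from the question can also be kept close to its original shape. That loop reads from df (whole columns) instead of row, and `(df['ingredient_name'] == 'Cheese').all()` is only True when every row is Cheese. The sketch below reads from `row` and writes with `DataFrame.at`, assuming a recent pandas in which `set_value` is no longer available, and assuming non-Cheese rows should keep ingredient_name.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"]],
})

# Create the target column up front so each cell can be assigned
df["consolidated_name"] = None

for i, row in df.iterrows():
    # Default to this row's own ingredient name (assumption)
    consolidated = row["ingredient_name"]
    # Compare the single cell of this row, not the whole column
    if row["ingredient_name"] == "Cheese":
        consolidated = row["ingredient_method"][0]
    # .at sets one cell by row label and column label
    df.at[i, "consolidated_name"] = consolidated

print(df)
```

Row-by-row loops are usually slower than the .loc or apply versions on large frames, so this form is mainly useful for understanding what went wrong.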