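I have two datasets:
df1 = pd.DataFrame({"skuid": ("A", "B", "C", "D"), "price": (0, 0, 0, 0)})
df2 = pd.DataFrame({"skuid": ("A", "B", "C", "D"), "salesprice": (10, 0, 0, 30), "regularprice": (9, 10, 0, 2)})
I want to fill in price in df1 from the sales price and regular price under these conditions: if the df1 skuid matches a df2 skuid and the df2 salesprice is non-zero, use salesprice as the price value. If the skuid matches and the df2 salesprice is zero, use regularprice. Otherwise, use zero as the price value.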
def pric(df1, df2):
    if (df1['skuid'] == df2['skuid'] and salesprice != 0):
        price = salesprice
    elif (df1['skuid'] == df2['skuid'] and regularprice != 0):
        price = regularprice
    else:
        price = 0
I wrote a function with conditions like these, but it does not work. The result in df1 should look like this:
skuid price
A 10
B 10
C 0
D 30
Thanks.
Answer 0 (score: 1)
There are a number of problems with the function above. Here are a few, in no particular order:
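One such problem can be reproduced directly. The sketch below assumes only that pandas is installed and reuses the two frames from the question; the variable names `message` and `mask` are introduced here for illustration. Comparing two columns with `==` produces a boolean Series, and Python's `if` and `and` need a single True/False, so pandas raises a ValueError. An element-wise mask built with `&` is what the condition was probably meant to express.

```python
import pandas as pd

df1 = pd.DataFrame({"skuid": ("A", "B", "C", "D"), "price": (0, 0, 0, 0)})
df2 = pd.DataFrame({"skuid": ("A", "B", "C", "D"),
                    "salesprice": (10, 0, 0, 30),
                    "regularprice": (9, 10, 0, 2)})

message = None
try:
    # The comparison yields a Series of four booleans, not one boolean,
    # so using it as an `if` condition raises ValueError.
    if df1["skuid"] == df2["skuid"]:
        pass
except ValueError as err:
    message = str(err)

print(message)

# Element-wise alternative: combine boolean Series with `&`,
# keeping each comparison in its own parentheses.
mask = (df1["skuid"] == df2["skuid"]) & (df2["salesprice"] != 0)
print(mask.tolist())
```

The original function also refers to `salesprice` and `regularprice` as bare names rather than columns of df2, and it never returns or assigns anything back to df1.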
Here is a version of your function, changed more or less only as needed to address the specific problems above:
import pandas as pd

df1 = pd.DataFrame({"skuid": ("A", "B", "C", "D"), "price": (0, 0, 0, 0)})
df2 = pd.DataFrame({"skuid": ("A", "B", "C", "D"), "salesprice": (10, 0, 0, 30), "regularprice": (9, 10, 0, 2)})

def pric(df1, df2, id_colname, df1_price_colname, df2_salesprice_colname, df2_regularprice_colname):
    for i in range(df1.shape[0]):
        for j in range(df2.shape[0]):
            if (df1.loc[df1.index[i], id_colname] == df2.loc[df2.index[j], id_colname]
                    and df2.loc[df2.index[j], df2_salesprice_colname] != 0):
                df1.loc[df1.index[i], df1_price_colname] = df2.loc[df2.index[j], df2_salesprice_colname]
                break
            elif (df1.loc[df1.index[i], id_colname] == df2.loc[df2.index[j], id_colname]
                    and df2.loc[df2.index[j], df2_regularprice_colname] != 0):
                df1.loc[df1.index[i], df1_price_colname] = df2.loc[df2.index[j], df2_regularprice_colname]
                break
    return df1
Calling it to fill in the prices:
df1_imputed=pric(df1,df2,'skuid','price','salesprice','regularprice')
print(df1_imputed['price'])
gives
0 10
1 10
2 0
3 30
Name: price, dtype: int64
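Note how the function iterates over the row indices first, and only then checks the equality condition on a specific element identified by a row index/column pair. There is no explicit else branch: a row with no usable match simply keeps the price it already had in df1, which is 0 here.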
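For larger frames the nested loop compares every row of df1 with every row of df2. A vectorized sketch of the same rule is shown below; it is an alternative, not the function above. The name `pric_vectorized` is hypothetical, and the sketch assumes that each skuid appears at most once in df2 and that prices are whole numbers.

```python
import pandas as pd

df1 = pd.DataFrame({"skuid": ("A", "B", "C", "D"), "price": (0, 0, 0, 0)})
df2 = pd.DataFrame({"skuid": ("A", "B", "C", "D"),
                    "salesprice": (10, 0, 0, 30),
                    "regularprice": (9, 10, 0, 2)})

def pric_vectorized(df1, df2):
    # Hypothetical helper. Left merge keeps every df1 row in its original
    # order; a skuid missing from df2 gets NaN in both price columns.
    merged = df1[["skuid"]].merge(df2, on="skuid", how="left")
    # Keep salesprice where it is non-zero, otherwise take regularprice.
    price = merged["salesprice"].where(merged["salesprice"] != 0,
                                       merged["regularprice"])
    out = df1.copy()
    # Unmatched skuids (NaN) become 0; assumes integer prices.
    out["price"] = price.fillna(0).astype(int).to_numpy()
    return out

result = pric_vectorized(df1, df2)
print(result)
```

Unlike the loop version, this returns a new frame and leaves df1 unchanged. If df2 could contain duplicate skuids, the merge would produce extra rows, so the duplicates would need to be resolved first.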
A few things to consider: