import pandas as pd

data1 = {'Policy Number': ['FSH1235456', 'FSH7643643', 'CHH123124', 'CHH123145252', 'CHH124124'], 'State': ['FL', 'TX', 'GA', 'TX', 'TX'], 'TERR': [1, 2, 3, 4, 5]}
data2 = {'TERR': [1, 2, 3, 4, 5], 'CHH': [0, .15, .65, .35, .20], 'FSH': [0, .15, .25, .35, .20]}
output = {'Policy Number': ['FSH1235456', 'FSH7643643', 'CHH123124', 'CHH123145252', 'CHH124124'], 'State': ['FL', 'TX', 'GA', 'TX', 'TX'], 'TERR': [1, 2, 3, 4, 5], 'Test': [0, .15, 0, 0, 0]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(output)
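That is the test data.
I am trying to create a new column in df1, df1['Test'], that holds the df2['FSH'] values based on the following conditions:
See df3 for the correct output.
This is what I tried: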
if df1.State.any()=="TX":
    if df1["Policy Number"].str.contains("FSH").any():
        for i in df["TERR"]:
            df1['% TERR']=df2.loc[[i],["FSH"]]
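However, my output is full of NaN, with only one correct value.
I checked that the correct i value is being passed into df2 with print(df2.loc[[i],["FSH"]]), and it prints correctly.
Any ideas?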
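A likely reason for the NaN values is index alignment: pandas matches an assigned Series or DataFrame to the target by index label, so a one-row selection fills one row and leaves the rest empty. The sketch below is only an illustration of that behavior, using a one-row Series and a cut-down df1 (both are assumptions made for the demo, not the asker's exact code).

```python
import pandas as pd

df1 = pd.DataFrame({'TERR': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'TERR': [1, 2, 3, 4, 5], 'FSH': [0, .15, .25, .35, .20]})

# Selecting with a list keeps the original index label (here, label 1).
one_row = df2.loc[[1], 'FSH']

# Assignment aligns on index labels: only row 1 gets a value,
# every other row of the new column becomes NaN.
df1['% TERR'] = one_row
nan_count = int(df1['% TERR'].isna().sum())
print(df1)
```

Because each pass through the loop reassigns the whole column, only the value from the last assignment would survive.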
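Answer 0 (score: 1)
I don't know whether this is the best or fastest solution, but one option is to merge the two dataframes, filter on your conditions, and then update: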
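new = df1.merge(df2, on='TERR')  # bring the CHH/FSH rates onto each policy row
matches = new[(new['State'] == 'TX') & new['Policy Number'].str.contains('FSH')]
df1['Test'] = 0.0  # float default, so the rates fit the column dtype
df1.loc[matches.index, 'Test'] = matches['FSH']  # works here because the merged rows keep df1's order and index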
  Policy Number State  TERR  Test
0    FSH1235456    FL     1  0.00
1    FSH7643643    TX     2  0.15
2     CHH123124    GA     3  0.00
3  CHH123145252    TX     4  0.00
4     CHH124124    TX     5  0.00
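For reference, here is a self-contained sketch of the same merge-then-filter idea that does not depend on the merged rows sharing df1's index. It swaps the update step for a left merge plus `Series.where`; the names `merged`, `cond` and `result` are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Policy Number': ['FSH1235456', 'FSH7643643', 'CHH123124', 'CHH123145252', 'CHH124124'],
    'State': ['FL', 'TX', 'GA', 'TX', 'TX'],
    'TERR': [1, 2, 3, 4, 5],
})
df2 = pd.DataFrame({
    'TERR': [1, 2, 3, 4, 5],
    'CHH': [0, .15, .65, .35, .20],
    'FSH': [0, .15, .25, .35, .20],
})

# A left merge keeps every df1 row, in df1's order, matched on TERR.
merged = df1.merge(df2[['TERR', 'FSH']], on='TERR', how='left')

# Rows that satisfy both conditions from the question.
cond = (merged['State'] == 'TX') & merged['Policy Number'].str.contains('FSH')

# Keep the FSH rate where the condition holds, otherwise use 0.
merged['Test'] = merged['FSH'].where(cond, 0.0)
result = merged.drop(columns='FSH')
print(result)
```

With the sample data this should give the same Test column as df3 in the question.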
Answer 1 (score: 1)
You can use numpy and pass it the conditions:
import numpy as np

cond1 = df1['State'] == 'TX'
cond2 = df1['Policy Number'].str.contains('FSH')
cond3 = df1['TERR'] == df2['TERR']  # row-by-row comparison; assumes df1 and df2 rows line up
df1['Test'] = np.where(cond1 & cond2 & cond3, df2['FSH'], 0)
  Policy Number State  TERR  Test
0    FSH1235456    FL     1  0.00
1    FSH7643643    TX     2  0.15
2     CHH123124    GA     3  0.00
3  CHH123145252    TX     4  0.00
4     CHH124124    TX     5  0.00
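Note that comparing df1["TERR"] with df2['TERR'] works row by row, so it relies on both frames being in the same order. If that cannot be guaranteed, one possible variation is to look the rate up by key with `Series.map` before calling `np.where`. In this sketch df2 is deliberately written in reverse order (an assumption for illustration), and `rate` and `cond` are made-up names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Policy Number': ['FSH1235456', 'FSH7643643', 'CHH123124', 'CHH123145252', 'CHH124124'],
    'State': ['FL', 'TX', 'GA', 'TX', 'TX'],
    'TERR': [1, 2, 3, 4, 5],
})
# Same rates as in the question, but listed in reverse TERR order.
df2 = pd.DataFrame({
    'TERR': [5, 4, 3, 2, 1],
    'CHH': [.20, .35, .65, .15, 0],
    'FSH': [.20, .35, .25, .15, 0],
})

# Look up each policy's FSH rate by its TERR value, not by row position.
rate = df1['TERR'].map(df2.set_index('TERR')['FSH'])

cond = (df1['State'] == 'TX') & df1['Policy Number'].str.contains('FSH')
df1['Test'] = np.where(cond, rate, 0)
print(df1)
```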
Answer 2 (score: 0)
Are you just trying to bring the data from df2 into df1? If so, you can reshape df2 with melt and then do a merge.
df1['policy_prefix'] = df1['Policy Number'].str[:3]
df2 = df2.melt(id_vars='TERR', value_vars=['CHH', 'FSH'],
               value_name='Test',
               var_name='policy_prefix')
df1 = df1.merge(df2, on=['policy_prefix', 'TERR'])
If you only want this applied to rows where State is 'TX', you can set the other values to null after the merge:
import numpy as np
df1.loc[df1.State!='TX', 'Test'] = np.nan
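Put together as one runnable sketch on the question's sample data, the melt-and-merge approach looks roughly like this. The `long_rates` and `merged` names and the `how='left'` argument are additions for illustration. Be aware that this fills in the CHH rate for CHH policies too, so TX rows with a CHH prefix get a value here, unlike df3 in the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Policy Number': ['FSH1235456', 'FSH7643643', 'CHH123124', 'CHH123145252', 'CHH124124'],
    'State': ['FL', 'TX', 'GA', 'TX', 'TX'],
    'TERR': [1, 2, 3, 4, 5],
})
df2 = pd.DataFrame({
    'TERR': [1, 2, 3, 4, 5],
    'CHH': [0, .15, .65, .35, .20],
    'FSH': [0, .15, .25, .35, .20],
})

# The first three characters of the policy number select the rate column.
df1['policy_prefix'] = df1['Policy Number'].str[:3]

# Wide to long: one row per (TERR, policy_prefix) pair with its rate in 'Test'.
long_rates = df2.melt(id_vars='TERR', value_vars=['CHH', 'FSH'],
                      var_name='policy_prefix', value_name='Test')

# The left merge keeps all df1 rows in their original order.
merged = df1.merge(long_rates, on=['policy_prefix', 'TERR'], how='left')

# Blank out the rate for every state other than TX.
merged.loc[merged['State'] != 'TX', 'Test'] = np.nan
print(merged)
```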
Answer 3 (score: 0)
Here is your solution:
# ... initialize df1 and df2 here
df3 = df1.join(df2.FSH)  # Join df2's FSH column onto df1 by row index
df3 = df3.rename({"FSH": "TEST"}, axis=1)  # Change column name

def set_tx_fsh(row):
    # Keep the FSH rate only for TX policies whose number contains "FSH"
    if row.State == "TX" and "FSH" in row["Policy Number"]:
        return row.TEST
    else:
        return 0

df3.TEST = df3.apply(set_tx_fsh, axis=1)  # Set values in "TEST" column based on your condition
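Since df1.join(df2.FSH) matches rows by index rather than by TERR, it only lines up when both frames are in the same order. A possible variant, sketched below, joins on the TERR key instead by passing `on='TERR'` and indexing the right-hand side by TERR. The row function is unchanged apart from using bracket access and a float default.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Policy Number': ['FSH1235456', 'FSH7643643', 'CHH123124', 'CHH123145252', 'CHH124124'],
    'State': ['FL', 'TX', 'GA', 'TX', 'TX'],
    'TERR': [1, 2, 3, 4, 5],
})
df2 = pd.DataFrame({
    'TERR': [1, 2, 3, 4, 5],
    'CHH': [0, .15, .65, .35, .20],
    'FSH': [0, .15, .25, .35, .20],
})

# on='TERR' matches df1's TERR column against the index of the right-hand
# Series, so the join follows the key instead of the row position.
df3 = df1.join(df2.set_index('TERR')['FSH'].rename('TEST'), on='TERR')

def set_tx_fsh(row):
    # Keep the FSH rate only for TX policies whose number contains "FSH".
    if row['State'] == 'TX' and 'FSH' in row['Policy Number']:
        return row['TEST']
    return 0.0

df3['TEST'] = df3.apply(set_tx_fsh, axis=1)
print(df3)
```

Row-wise apply is easy to read but tends to be slow on large frames, so a vectorized condition may be preferable there.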