I have the following dataframes:
df1
name phone duration(m)
Luisa 443442 1
Jack 442334 6
Matt 442212 2
Jenny 453224 1
df2
prefix charge rate
443 0.8 0.3
446 0.8 0.4
442 0.6 0.1
476 0.8 0.3
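The output I want matches each phone number to its prefix (there are more prefixes than phone numbers) and computes the bill for each call as the call duration multiplied by the prefix's rate, plus the prefix's charge.
Example output: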
df1
name phone duration(m) bill
Luisa 443442 1 (example: 1x0.3+0.8)
Jack 442334 6 (example: 6x0.1+0.6)
Matt 442212 2
Jenny 453224 1
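Applying that formula by hand to the sample rows (with d the duration in minutes) gives the values a working solution should reproduce. Jenny's prefix 453 has no row in df2, so she gets no bill:

```latex
\begin{aligned}
\text{bill} &= d \times \text{rate} + \text{charge} \\
\text{Luisa (443)}: &\quad 1 \times 0.3 + 0.8 = 1.1 \\
\text{Jack (442)}: &\quad 6 \times 0.1 + 0.6 = 1.2 \\
\text{Matt (442)}: &\quad 2 \times 0.1 + 0.6 = 0.8
\end{aligned}
```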
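My idea was to convert df2 into a dictionary, such as dict = {'443':[0.3,0.8],'442':[0.1,0.6] ...}, so that I could match each number against the dict keys and then perform the operation using the values of the matching key. But it doesn't work, and I'd also like to know whether there is a better option.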
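For reference, here is a minimal sketch of how the dictionary idea can work. It assumes every prefix is exactly three digits (true for the sample data) and stores the keys as strings so they can be compared with slices of the phone number; the names tariffs and bill are illustrative, not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Luisa', 'Jack', 'Matt', 'Jenny'],
                    'phone': [443442, 442334, 442212, 453224],
                    'duration(m)': [1, 6, 2, 1]})
df2 = pd.DataFrame({'prefix': [443, 446, 442, 476],
                    'charge': [0.8, 0.8, 0.6, 0.8],
                    'rate': [0.3, 0.4, 0.1, 0.3]})

# {'443': (rate, charge), ...}; string keys so they match string slices of the phone
tariffs = {str(p): (r, c) for p, r, c in zip(df2['prefix'], df2['rate'], df2['charge'])}

def bill(phone, minutes):
    """Return minutes * rate + charge for the phone's prefix, or None if it has no tariff."""
    key = str(phone)[:3]  # assumption: all prefixes are exactly 3 digits
    if key not in tariffs:
        return None
    rate, charge = tariffs[key]
    return minutes * rate + charge

df1['bill'] = [bill(p, m) for p, m in zip(df1['phone'], df1['duration(m)'])]
print(df1)
```

Jenny's row ends up with a missing value because 453 is not a key in the dictionary.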
Answer 0 (score: 2)
To merge on prefixes of arbitrary length, you can do the following:
>>> df1['phone'] = df1.phone.astype(str)
>>> df2['prefix'] = df2.prefix.astype(str)
>>> df1['prefix_len'] = df1.phone.apply(
...     lambda h: max([len(p) for p in df2.prefix if h.startswith(p)] or [0]))
>>> df1['prefix'] = df1.apply(lambda s: s.phone[:s.prefix_len], axis=1)
>>> df1 = df1.merge(df2, on='prefix')
>>> df1['bill'] = df1['duration(m)'] * df1['rate'] + df1['charge']
>>> df1
duration(m) name phone prefix_len prefix charge rate bill
0 1 Luisa 443442 3 443 0.8 0.3 1.1
1 6 Jack 442334 3 442 0.6 0.1 1.2
2 2 Matt 442212 3 442 0.6 0.1 0.8
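Note that a phone with no matching prefix (such as Jenny's 453224) gets prefix_len 0, so s.phone[:s.prefix_len] produces an empty prefix, and the merge, which is an inner join by default, drops those phones from the result.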
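If you would rather keep the unmatched phones, one option is to pass how='left' to merge so they stay in the result with a missing bill. The sketch below keeps the longest-prefix idea from this answer; the helper name longest_prefix is hypothetical, and phones and prefixes are stored as strings from the start.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Luisa', 'Jack', 'Matt', 'Jenny'],
                    'phone': ['443442', '442334', '442212', '453224'],
                    'duration(m)': [1, 6, 2, 1]})
df2 = pd.DataFrame({'prefix': ['443', '446', '442', '476'],
                    'charge': [0.8, 0.8, 0.6, 0.8],
                    'rate': [0.3, 0.4, 0.1, 0.3]})

def longest_prefix(phone):
    # Longest prefix in df2 that starts the phone number, or '' if none does
    matches = [p for p in df2['prefix'] if phone.startswith(p)]
    return max(matches, key=len, default='')

df1['prefix'] = df1['phone'].map(longest_prefix)
# how='left' keeps every row of df1; phones without a tariff get NaN charge, rate and bill
out = df1.merge(df2, on='prefix', how='left')
out['bill'] = out['duration(m)'] * out['rate'] + out['charge']
print(out)
```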
Answer 1 (score: 1)
import pandas as pd

df1 = pd.DataFrame({'name':["Louisa","Jack","Matt","Jenny"],'phone':[443442,442334,442212,453224],'duration':[1,6,2,1]})
df2 = pd.DataFrame({'prefix':[443,446,442,476],'charge':[0.8,0.8,0.6,0.8],'rate':[0.3,0.4,0.1,0.3]})
# concat(axis=1) puts df1 and df2 side by side, pairing rows by position
df3 = pd.concat((df1, df2), axis=1)
# distinct 3-digit prefixes that occur in the phone numbers
df4 = df3["phone"].astype(str).str[:3].drop_duplicates()
df3["bill"] = None
for j in range(len(df4)):
    for i in range(len(df3["prefix"])):
        if df3.loc[i, "prefix"] == int(df4.iloc[j]):
            # bill = duration * rate + charge, as in the question
            df3.loc[i, "bill"] = df3.loc[i, "duration"] * df3.loc[i, "rate"] + df3.loc[i, "charge"]
print(df3)
duration name phone charge prefix rate bill
0 1 Louisa 443442 0.8 443 0.3 1.1
1 6 Jack 442334 0.8 446 0.4 None
2 2 Matt 442212 0.6 442 0.1 0.8
3 1 Jenny 453224 0.8 476 0.3 None
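The None values in the bill column occur because pd.concat(..., axis=1) pairs the rows of df1 and df2 by position: Jack's row ends up next to prefix 446 and Jenny's next to 476, and since no phone number in your example starts with 446 or 476, those prefixes are not in df4. The bill is computed with the formula given in the question.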
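Since this answer already slices the first three digits, the positional concat and the nested loops could be replaced by a merge on that slice, which matches each phone to its prefix by value. This is a sketch assuming every prefix is exactly three digits; with how='left', Jack is now billed and only Jenny, whose prefix 453 has no tariff, gets a missing bill.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ["Louisa", "Jack", "Matt", "Jenny"],
                    'phone': [443442, 442334, 442212, 453224],
                    'duration': [1, 6, 2, 1]})
df2 = pd.DataFrame({'prefix': [443, 446, 442, 476],
                    'charge': [0.8, 0.8, 0.6, 0.8],
                    'rate': [0.3, 0.4, 0.1, 0.3]})

# Assumption: every prefix has exactly 3 digits, so the phone's first 3 digits are its key
df1['prefix'] = df1['phone'].astype(str).str[:3].astype(int)
# Match on the prefix value, not on row position; how='left' keeps phones without a tariff
out = df1.merge(df2, on='prefix', how='left')
out['bill'] = out['duration'] * out['rate'] + out['charge']
print(out)
```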