I have an Excel file that I imported as a DataFrame. The dataset looks like this:
rule_id reqid1 reqid2 reqid3
53139 0 0 1
51181 1 1 0
50412 0 1 1
50356 0 0 1
50239 0 1 0
50238 1 1 0
50014 1 0 1
I have already set the rule_id column as the index. I would like the result to look like this:
rule_id reqid1 reqid2 reqid3 comparison1 comparison2 last_comp
53139 0 0 1 NaN NaN 100
51181 1 1 0 1.0 50.0 0
50412 0 1 1 NaN 1.0 50
50356 0 0 1 NaN NaN 100
50239 0 1 0 NaN 100.0 0
50238 1 1 0 1.0 50.0 0
50014 1 0 1 100.0 NaN 100
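The comparison1 column compares the values of reqid1 and reqid2, comparison2 compares reqid2 and reqid3, and last_comp would compare reqid3 and reqid4, but reqid4 is not available. The logic is as follows. When comparing two columns, if both values are 0, the new column should hold a null value. If the first column is 1 and the second is 0, it should hold 100. If both columns are 1, comparison1 should hold 1, but if the value in reqid3 is then 0, comparison2 should hold 100/2 = 50. For last_comp, if the value in reqid3 is 0 it should hold 0, and if it is 1 it should hold 100. However, if reqid2 and reqid3 are both 1, it should hold 50.
I have not been able to write the code for this. Any kind of help would be appreciated.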
Answer 0 (score: 0)
Here is some simple code to get you started:
# Compare function: receives a row holding the two values to compare
def compare_values(row):
    # Use positional access; the row is labeled by column name
    a = row.iloc[0]
    b = row.iloc[1]
    # One of the rules: first column is 1, second is 0
    if a == 1 and b == 0:
        return 100
    # TODO: implement other rules
    return None

# Apply `compare_values` to every row of ["reqid1", "reqid2"]
df["comparison1"] = df[["reqid1", "reqid2"]].apply(compare_values, axis=1)
# TODO: comparison2
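I have left some parts for you to implement in order to get the exact output you need, but with this structure you should be able to follow along.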
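As a rough sketch of how the remaining comparison1 rules might be filled in, the version below follows the rules stated in the question. Two things are assumptions rather than part of the answer: the sample DataFrame is rebuilt by hand from the question's table, and the (0, 1) case is mapped to NaN because that is what the expected output shows, even though the question never states that rule explicitly.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the columns needed here)
df = pd.DataFrame({
    "rule_id": [53139, 51181, 50412, 50356, 50239, 50238, 50014],
    "reqid1": [0, 1, 0, 0, 0, 1, 1],
    "reqid2": [0, 1, 1, 0, 1, 1, 0],
}).set_index("rule_id")

def compare_values(row):
    a, b = row.iloc[0], row.iloc[1]
    if a == 1 and b == 0:
        return 100.0
    if a == 1 and b == 1:
        return 1.0
    # a == 0: (0, 0) is null per the question; (0, 1) is NaN in the
    # expected table (an inferred rule, not one the question states)
    return np.nan

df["comparison1"] = df[["reqid1", "reqid2"]].apply(compare_values, axis=1)
print(df)
```

The comparison2 column cannot reuse this function unchanged, since in the expected table its value for (1, 0) depends on reqid1 as well (50 when reqid1 is 1, 100 when it is 0).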
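Answer 1 (score: 0)
You will need to work out the logic yourself. Based on what you wrote, the following should cover the first two comparison columns, using pandas for the DataFrame.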
import pandas as pd
# data
d = {'rule_id': [53139,51181,50412,50356,50239,50238,50014], 'reqid1':[0,1,0,0,0,1,1], 'reqid2':[0,1,1,0,1,1,0], 'reqid3':[1,0,1,1,0,0,1]}
df = pd.DataFrame(data=d)
# make sure rule_id is the first column, followed by the reqid columns
df = df[['rule_id', 'reqid1', 'reqid2', 'reqid3']]
DataFrame:
rule_id reqid1 reqid2 reqid3
0 53139 0 0 1
1 51181 1 1 0
2 50412 0 1 1
3 50356 0 0 1
4 50239 0 1 0
5 50238 1 1 0
6 50014 1 0 1
Then add the logic for the new columns:
c1 = list(map(lambda a,b: a if a==b else 100*a, df.reqid1, df.reqid2 ))
df['comp1']=c1
c2 = list(map(lambda b,c,c1: b if b==c else (b if b < c else 100/(b+c1)), df.reqid2, df.reqid3, df.comp1 ))
df['comp2']=c2
# convert your zeros to Nans with numpy:
import numpy as np
comps = ['comp1', 'comp2']
df[comps] = df[comps].replace({0:np.nan})
Output:
rule_id reqid1 reqid2 reqid3 comp1 comp2
0 53139 0 0 1 NaN NaN
1 51181 1 1 0 1.0 50.0
2 50412 0 1 1 NaN 1.0
3 50356 0 0 1 NaN NaN
4 50239 0 1 0 NaN 100.0
5 50238 1 1 0 1.0 50.0
6 50014 1 0 1 100.0 NaN
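The code above stops at the first two comparison columns. One possible way to add last_comp, sketched here with numpy.select, is shown below. The rules are taken from the question's description (0 when reqid3 is 0, 50 when reqid2 and reqid3 are both 1, otherwise 100); the sample data is rebuilt by hand, and treating 100 as the default case is an assumption that happens to match the expected table.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the columns needed here)
df = pd.DataFrame({
    "rule_id": [53139, 51181, 50412, 50356, 50239, 50238, 50014],
    "reqid2": [0, 1, 1, 0, 1, 1, 0],
    "reqid3": [1, 0, 1, 1, 0, 0, 1],
})

# Conditions are checked in order; the first match wins
conditions = [
    df["reqid3"] == 0,                            # reqid3 is 0 -> 0
    (df["reqid3"] == 1) & (df["reqid2"] == 1),    # both are 1  -> 50
]
choices = [0, 50]

# Remaining case (reqid3 is 1, reqid2 is 0) falls through to 100
df["last_comp"] = np.select(conditions, choices, default=100)
print(df)
```

Because the conditions are evaluated on whole columns, this avoids a row-by-row apply, which may matter for larger files.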