Question

Sample DF:

ID  Name    Match1  Random_Col     Match2  Price  Match3  Match4  Match5
1   Apple   Yes     Random Value   No      10     Yes     Yes     Yes
2   Apple   Yes     Random Value1  No      10     Yes     Yes     No
3   Apple   Yes     Random Value2  No      15     No      Yes     Yes
4   Orange  No      Random Value   Yes     12     Yes     Yes     No
5   Orange  No      Random Value   Yes     12     No      No      No
6   Banana  Yes     Random Value   No      15     Yes     No      No
7   Apple   Yes     Random Value   No      15     No      Yes     Yes

Expected DF:

ID  Name    Match1  Random_Col     Match2  Price  Match3  Match4  Match5  Final_Match
1   Apple   Yes     Random Value   No      10     Yes     Yes     Yes     Full
2   Apple   Yes     Random Value1  No      10     Yes     Yes     No      Partial
3   Apple   Yes     Random Value2  No      15     No      Yes     Yes     Partial
4   Orange  No      Random Value   Yes     12     Yes     Yes     No      Full
5   Orange  No      Random Value   Yes     12     No      No      No      Partial
6   Banana  Yes     Random Value   No      15     Yes     No      No      Full
7   Apple   Yes     Random Value   No      15     No      Yes     Yes     Partial

Problem statement:

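If the combination of Name and Price is not duplicated, simply put Full in the Final_Match column (example: ID 6).
If the combination of Name and Price is duplicated, count the Yes values in columns Match1 to Match5 for each ID in that group: the ID with more Yes values gets Full and the other gets Partial (examples: IDs 1, 2 and 4, 5).
If the combination of Name and Price is duplicated and the IDs have an equal number of Yes values in Match1 to Match5, put Partial for all of them (example: IDs 3, 7).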
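The rules above depend on two per-row quantities: whether the row's (Name, Price) pair occurs more than once, and how many Yes values the row has in Match1 to Match5. A minimal sketch computing both with pandas, assuming the sample DF is loaded as a DataFrame named df with the column names shown (the literal below is typed in from the sample; dup and yes_count are illustrative names):

```python
import pandas as pd

# Hand-typed copy of the sample DF from the question (assumption: same column names)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7],
    "Name": ["Apple", "Apple", "Apple", "Orange", "Orange", "Banana", "Apple"],
    "Match1": ["Yes", "Yes", "Yes", "No", "No", "Yes", "Yes"],
    "Random_Col": ["Random Value", "Random Value1", "Random Value2",
                   "Random Value", "Random Value", "Random Value", "Random Value"],
    "Match2": ["No", "No", "No", "Yes", "Yes", "No", "No"],
    "Price": [10, 10, 15, 12, 12, 15, 15],
    "Match3": ["Yes", "Yes", "No", "Yes", "No", "Yes", "No"],
    "Match4": ["Yes", "Yes", "Yes", "Yes", "No", "No", "Yes"],
    "Match5": ["Yes", "No", "Yes", "No", "No", "No", "Yes"],
})

# Rule 1 applies where dup is False. keep=False flags every member of a
# duplicated (Name, Price) pair, not just the repeats after the first one.
dup = df.duplicated(subset=["Name", "Price"], keep=False)

# Rules 2 and 3 compare this count: "Yes" values across the columns whose
# name contains "Match" (Match1..Match5 here).
yes_count = df.filter(like="Match").eq("Yes").sum(axis=1)

print(pd.DataFrame({"ID": df["ID"], "duplicated": dup, "yes_count": yes_count}))
```

Only ID 6 has dup equal to False, and IDs 3 and 7 both have a count of 3, which is why they fall under the tie rule.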

Code:

    import numpy as np

    s = (df.replace({'Yes': 1, 'No': 0})
           .iloc[:, 1:]
           .sum(1))
    df['final_match'] = np.where(
        s.groupby(df[['Price', 'Name']]).rank(ascending=False).eq(1),
        'Full', 'Partial')

The above code works when I group by only one column, but not when I group by the combination of columns.
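One likely reason the combined version fails is that df[['Price', 'Name']] is a two-dimensional DataFrame, while Series.groupby expects one-dimensional keys; passing a list of Series is the usual way to group a Series by several columns at once. A minimal sketch on a small hand-made Series (the Yes counts, names and prices of IDs 1, 2, 3 and 7 from the sample; score, name and price are illustrative names), which also shows how rank treats ties:

```python
import pandas as pd

# Yes counts, names and prices of IDs 1, 2, 3 and 7 from the sample DF
score = pd.Series([4, 3, 3, 3])
name = pd.Series(["Apple", "Apple", "Apple", "Apple"])
price = pd.Series([10, 10, 15, 15])

# A list of 1-D keys groups by the (name, price) combination
ranks = score.groupby([name, price]).rank(ascending=False)
print(ranks.tolist())  # [1.0, 2.0, 1.5, 1.5]
```

Because rank defaults to method="average", the two tied rows in the (Apple, 15) group share rank 1.5, so a test like .eq(1) leaves both of them as Partial.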

Any help is appreciated!

Answer 1

Use:
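A minimal sketch of one approach (not necessarily this answer's original code), assuming the sample DF is available as df with the column names shown; the DataFrame literal is typed in from the sample, and s and rank are illustrative names. It counts the Yes values per row, ranks the counts inside each (Name, Price) group, and marks only rank 1 as Full:

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the sample DF from the question (assumption: same column names)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7],
    "Name": ["Apple", "Apple", "Apple", "Orange", "Orange", "Banana", "Apple"],
    "Match1": ["Yes", "Yes", "Yes", "No", "No", "Yes", "Yes"],
    "Random_Col": ["Random Value", "Random Value1", "Random Value2",
                   "Random Value", "Random Value", "Random Value", "Random Value"],
    "Match2": ["No", "No", "No", "Yes", "Yes", "No", "No"],
    "Price": [10, 10, 15, 12, 12, 15, 15],
    "Match3": ["Yes", "Yes", "No", "Yes", "No", "Yes", "No"],
    "Match4": ["Yes", "Yes", "Yes", "Yes", "No", "No", "Yes"],
    "Match5": ["Yes", "No", "Yes", "No", "No", "No", "Yes"],
})

# Count "Yes" in Match1..Match5 only; .iloc[:, 1:] would also pick up
# non-Match columns such as Random_Col and Price.
s = df.filter(like="Match").eq("Yes").sum(axis=1)

# Group by the (Name, Price) combination with a list of Series, not a DataFrame.
# The highest count ranks 1, a lone row ranks 1, and tied rows share the
# average of their ranks (e.g. 1.5), so they never equal 1.
rank = s.groupby([df["Name"], df["Price"]]).rank(ascending=False)

df["Final_Match"] = np.where(rank.eq(1), "Full", "Partial")
print(df[["ID", "Name", "Price", "Final_Match"]])
```

This reproduces the expected Final_Match column for the sample. In groups of three or more rows, the sketch gives Full only to a unique maximum, which is an assumption the problem statement does not spell out.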

