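Conditionally remove duplicates

Time: 2019-12-13 05:56:07

Tags: excel powerquery

I have a list and I want to remove duplicates, keeping for each Id No the row whose "Date of Termination (Work)" column is the latest or is null.

Sample data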

Id No   Name    Surname Date of Employment (Work)   Date of Termination (Work)
12405   xxxx    yyy     10/26/2018                  2/6/2019
33418   mmm     nnnn    1/1/2018                    7/30/2018
33418   mmm     nnnn    1/13/2017                   12/31/2017
33616   rrrr    sssss   7/13/2018                   11/19/2018
33616   rrrr    sssss   7/13/2018                   null
48224   ttttt   kkkk    7/15/2018                   4/14/2019

The result should be

Id No   Name    Surname Date of Employment (Work)   Date of Termination (Work)
12405   xxxx    yyy     10/26/2018                  2/6/2019
33418   mmm     nnnn    1/1/2018                    7/30/2018
33616   rrrr    sssss   7/13/2018   
48224   ttttt   kkkk    7/15/2018                   4/14/2019

3 answers:

Answer 0: (score: 1)

I got the expected output using the following code:

let
    initialTable = Table.FromColumns({
        {12405, 33418, 33418, 33616, 33616, 48224},
        {"xxxx", "mmm", "mmm", "rrrr", "rrrr", "ttttt"},
        {"yyy", "nnnn", "nnnn", "sssss", "sssss", "kkkk"},
        {#date(2018, 10, 26), #date(2018, 01, 01), #date(2017, 1, 13), #date(2018, 7, 13), #date(2018, 7, 13), #date(2018, 7, 15)},
        {#date(2019, 02, 06), #date(2018, 7, 30), #date(2017, 12, 31), #date(2018, 11, 19), null, #date(2019, 4, 14)}
    }, type table [Id No = Int64.Type, Name = text, Surname = text, #"Date of Employment (Work)" = date, #"Date of Termination (Work)" = date]),
    nullElseMaxComparer = (x as record, y as record) =>
        let
            a = Record.Field(x, "Date of Termination (Work)"),
            b = Record.Field(y, "Date of Termination (Work)"),
            comparison = if a = null then 2 else if b = null then -2 else Value.Compare(a, b)
        in comparison,
    maxOrNullPerGroup = Table.Group(initialTable, "Id No", {"toCombine", each Table.Max(_, nullElseMaxComparer)}),
    combined = Table.FromRecords(maxOrNullPerGroup[toCombine])
in
    combined
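  • If I understand correctly, for each Id No you want the row with the most recent termination date.
  • In the expected output shown in the question, particularly for Id No 33616, null is preferred over 11/19/2018. So for the purposes of this question, I assume null ranks higher than any date.
  • nullElseMaxComparer is a custom comparison function that tries to give precedence to null values; it can be passed directly to Table.Max.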
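The "null ranks higher than any date" rule described above can also be written out explicitly, without a custom comparer. The following is only a minimal sketch under that assumption, not the answer's original method: the helper name PickRow and the trimmed sample table are hypothetical, and the resulting table does not carry over the column types.

```powerquery
let
    // Trimmed sample data from the question (only the columns needed here)
    Source = #table(
        type table [#"Id No" = Int64.Type, Name = text, #"Date of Termination (Work)" = nullable date],
        {
            {33418, "mmm", #date(2018, 7, 30)},
            {33418, "mmm", #date(2017, 12, 31)},
            {33616, "rrrr", #date(2018, 11, 19)},
            {33616, "rrrr", null}
        }
    ),
    TermCol = "Date of Termination (Work)",
    // Hypothetical helper: if the group has a row with a null termination date,
    // return that row; otherwise return the row with the latest termination date
    PickRow = (group as table) as record =>
        let
            openRows = Table.SelectRows(group, each Record.Field(_, TermCol) = null)
        in
            if Table.RowCount(openRows) > 0 then openRows{0} else Table.Max(group, TermCol),
    // One record per Id No, chosen by PickRow
    Grouped = Table.Group(Source, {"Id No"}, {{"Picked", PickRow, type record}}),
    Result = Table.FromRecords(Grouped[Picked])
in
    Result
```

Spelling the null check out separately makes the precedence rule visible at the cost of an extra pass over each group; the comparer version keeps everything inside a single Table.Max call.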

Answer 1: (score: 0)

I found a workaround (grouping by the maximum date), but I am sure there is a simpler and faster way.

let
    Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Id No", Int64.Type}, {"Name", type text}, {"Surname", type text}, {"Date of Employment (Work)", type datetime}, {"Date of Termination (Work)", type datetime}, {"Duty", type text}, {"Citizenship", type text}, {"National ID", type text}, {"Passport Serial/No", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Id No", Order.Ascending}, {"Date of Termination (Work)", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Id No"}, {{"all", each _, type table [Id No=number, Name=text, Surname=text, #"Date of Employment (Work)"=datetime, #"Date of Termination (Work)"=datetime, Duty=text, Citizenship=text, National ID=text, #"Passport Serial/No"=text]}, {"maxdate", each List.Max([#"Date of Termination (Work)"]), type datetime}}),
    #"Expanded all" = Table.ExpandTableColumn(#"Grouped Rows", "all", {"Name", "Surname", "Date of Employment (Work)", "Date of Termination (Work)", "Duty", "Citizenship", "National ID", "Passport Serial/No"}, {"Name", "Surname", "Date of Employment (Work)", "Date of Termination (Work)", "Duty", "Citizenship", "National ID", "Passport Serial/No"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded all", each ([#"Date of Termination (Work)"] = [maxdate])  ),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"maxdate"})
in
    #"Removed Columns"
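One point to check with the grouping approach above is how maxdate behaves when a group contains a null termination date. Assuming List.Max skips null values by default, the null row for Id No 33616 would not be selected. A possible adaptation is sketched below; the helper name PreferNullElseMax and the trimmed sample table are hypothetical and not part of the original answer.

```powerquery
let
    // Trimmed sample data from the question
    Source = #table(
        type table [#"Id No" = Int64.Type, #"Date of Termination (Work)" = nullable date],
        {
            {33418, #date(2018, 7, 30)},
            {33418, #date(2017, 12, 31)},
            {33616, #date(2018, 11, 19)},
            {33616, null}
        }
    ),
    // Hypothetical helper: null wins if present, otherwise take the latest date
    PreferNullElseMax = (dates as list) =>
        if List.Contains(dates, null) then null else List.Max(dates),
    // Keep all rows of each group and record the preferred termination date
    Grouped = Table.Group(
        Source,
        {"Id No"},
        {
            {"all", each _, type table},
            {"maxdate", each PreferNullElseMax([#"Date of Termination (Work)"]), type nullable date}
        }
    ),
    Expanded = Table.ExpandTableColumn(Grouped, "all", {"Date of Termination (Work)"}),
    // null = null evaluates to true in M, so the null row matches a null maxdate
    Filtered = Table.SelectRows(Expanded, each [#"Date of Termination (Work)"] = [maxdate]),
    Result = Table.RemoveColumns(Filtered, {"maxdate"})
in
    Result
```

If several rows in a group share the preferred date, this filter keeps all of them, just as the original query does.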

Answer 2: (score: 0)

Another way:

= Table.Distinct(Table.Sort(YourTable, {"Date of Termination (Work)", Order.Descending}), {"Id No"})
