I have two input tables. Input table 1 is the source data, and input table 2 is the criteria table.
+-------------+----------+
| TABLE 1 (Source data)  |
+-------------+----------+
| DESCRIPTION | VALUE    |
+-------------+----------+
| ID          | 0        |
| NAME        | JFMSC    |
| TYPE        | UHELQ    |
| DFRUL       | F4       |
| ADDR        | 10012002 |
| RRUL        | P1       |
| ADDR        | 723      |
| RRUL        | P1       |
| ID          | 2        |
| NAME        | PLLSJS   |
| TYPE        | UHELQ    |
| DFRUL       | P3       |
| ID          | 4        |
| NAME        | AAAARR   |
| TYPE        | UHELQ    |
| DFRUL       | T7       |
| ADDR        | 35531156 |
| RRUL        | P1       |
| ADDR        | 72358    |
| RRUL        | P1       |
| ADDR        | 86401    |
| RRUL        | K9       |
| ID          | 0        |
| NAME        | PPROOA   |
| TYPE        | RRHN     |
| DFRUL       | P1       |
| ADDR        | 43001    |
| RRUL        | T8       |
| ADDR        | 7459001  |
| RRUL        | D4       |
| ADDR        | 430457   |
| RRUL        | W2       |
| ADDR        | 745913   |
| RRUL        | P1       |
| ADDR        | 74598001 |
| RRUL        | Y5       |
+-------------+----------+

+----------+----------+
| TABLE 2 (Criterias) |
+----------+----------+
| PREFIX   | CODE     |
+----------+----------+
| 7235     | ABX1     |
| 3553     | POWQ     |
| 7459     | UWEER    |
| 10012    | ABX1     |
| 430      | ABX1     |
+----------+----------+
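My goal is to get an output table like the one shown below (it will be table #4) that shows, for each value of the field "ADDR", the most similar code according to the criteria in "Table 2". If an ID has repeated CODEs, I only want to show one of them (a unique list of codes).
I have explained this in detail in the sample file attached as SampleV1.xlsx.
I want to transform the data from input tables 1 and 2 to get an output table like this (desired output table #2 in the attachment):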
+----+--------+-------+-------+-------+------+
| ID | NAME   | TYPE  | DFRUL | CODE  | RRUL |
+----+--------+-------+-------+-------+------+
| 0  | JFMSC  | UHELQ | P1    | ABX1  | P1   |
| 2  | PLLSJS | UHELQ | P3    |       |      |
| 4  | AAAARR | UHELQ | T7    | POWQ  | P1   |
|    |        |       |       | ABX1  | P1   |
|    |        |       |       | 86401 | K9   |
| 0  | PPROOA | RRHN  | P1    | ABX1  | P1   |
|    |        |       |       | UWEER | P1   |
+----+--------+-------+-------+-------+------+
I hope someone can help me. Thanks in advance.
Answer 0: (score: 1)
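Below is the UPDATED solution.
Overall, I wrote the solution so that it is affected by as few data problems as possible.
The only restrictions on the data are:
Each set of fields must have an ID field, and it must be the first field of the set.
All RRUL and ADDR fields must come in pairs.
Repeated RRUL/ADDR pairs within one ID are acceptable, as is having no pairs at all.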
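The restrictions above can be checked before running the transformation. The following is only a minimal sketch: the inline `Source` table is a hypothetical miniature of the question's data, the step names are made up for illustration, and the ID check only looks at the very first row rather than at every set.

```powerquery
let
    // Hypothetical miniature of the "Source data" table from the question.
    Source = #table(
        type table [DESCRIPTION = text, VALUE = text],
        {
            {"ID", "0"}, {"NAME", "JFMSC"}, {"TYPE", "UHELQ"}, {"DFRUL", "F4"},
            {"ADDR", "10012002"}, {"RRUL", "P1"}, {"ADDR", "723"}, {"RRUL", "P1"},
            {"ID", "2"}, {"NAME", "PLLSJS"}, {"TYPE", "UHELQ"}, {"DFRUL", "P3"}
        }
    ),
    Descriptions = Source[DESCRIPTION],
    Positions = List.Positions(Descriptions),

    // The data must start with an ID row.
    StartsWithId = List.First(Descriptions) = "ID",

    // Every ADDR row must be immediately followed by an RRUL row.
    // Descriptions{i + 1}? yields null past the end of the list, and null = "RRUL" is false.
    AddrHasRrul = List.AllTrue(List.Transform(Positions, (i) =>
        Descriptions{i} <> "ADDR" or Descriptions{i + 1}? = "RRUL")),

    // Every RRUL row must be immediately preceded by an ADDR row.
    RrulHasAddr = List.AllTrue(List.Transform(Positions, (i) =>
        Descriptions{i} <> "RRUL" or (i > 0 and Descriptions{i - 1} = "ADDR"))),

    IsValid = StartsWithId and AddrHasRrul and RrulHasAddr
in
    IsValid
```

To try it on the real data, replace the inline table with `#"Source data"`.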
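I also wrote the solution so that it correctly finds the closest value among all possible combinations of ADDR and PREFIX. By the way, one case is not covered in the large sample: when PREFIX is shorter than ADDR but not equal to it. If such cases occur, my solution handles them correctly, at the cost of some performance overhead.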
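The matching rule can be looked at in isolation. This is a sketch under assumptions, not the exact code of the solution: `IsPrefixMatch` and `FindCode` are hypothetical helper names, the inline `Criterias` table is copied from the question, and PREFIX and ADDR are assumed to be text. A candidate matches when the shorter of the two strings is a prefix of the longer one, and the candidate closest in length wins.

```powerquery
let
    // Hypothetical stand-in for the Criterias table from the question.
    Criterias = #table(
        type table [PREFIX = text, CODE = text],
        {{"7235", "ABX1"}, {"3553", "POWQ"}, {"7459", "UWEER"}, {"10012", "ABX1"}, {"430", "ABX1"}}
    ),
    Indexed = Table.AddIndexColumn(Criterias, "Index", 0, 1),

    // True when the shorter string is a prefix of the longer one,
    // so "7235" matches both "723" and "72358".
    IsPrefixMatch = (prefix as text, addr as text) as logical =>
        if Text.Length(prefix) <= Text.Length(addr)
        then Text.StartsWith(addr, prefix)
        else Text.StartsWith(prefix, addr),

    // Returns the CODE of the matching row whose PREFIX length is closest
    // to the ADDR length, or null when no row matches.
    FindCode = (addr as text) as nullable text =>
        let
            Candidates = Table.SelectRows(Indexed, (row) => IsPrefixMatch(row[PREFIX], addr)),
            WithDistance = Table.AddColumn(Candidates, "Distance",
                (row) => Number.Abs(Text.Length(addr) - Text.Length(row[PREFIX])), Int64.Type),
            // Ties on distance go to the row that appears first in Criterias.
            Sorted = Table.Sort(WithDistance, {{"Distance", Order.Ascending}, {"Index", Order.Ascending}})
        in
            if Table.IsEmpty(Sorted) then null else Sorted{0}[CODE],

    Result = List.Transform({"10012002", "723", "72358", "86401"}, FindCode)
in
    Result
```

With the sample criteria, the first three addresses should resolve to ABX1 and the last one to null, because no PREFIX matches 86401.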
let
    Source = #"Source data",
    // Number the rows so that row positions can be turned into grouping keys.
    #"Added Index1" = Table.AddIndexColumn(Source, "Index", 0, 1),
    // "Main Key" is the index of the ID row that starts each set of fields.
    #"Added Custom" = Table.AddColumn(#"Added Index1", "Main Key", each if [DESCRIPTION] = "ID" then [Index] else null, type number),
    // "Last notADDR" is the index of the latest row that is neither ADDR nor RRUL.
    #"Added Custom10" = Table.AddColumn(#"Added Custom", "Last notADDR", each
        if [DESCRIPTION] <> "ADDR" and [DESCRIPTION] <> "RRUL" then [Index] else null),
    #"Filled Down" = Table.FillDown(#"Added Custom10", {"Main Key", "Last notADDR"}),
    // Give each ADDR/RRUL pair its own "Key" so that the pivot yields one row per pair.
    #"Added Custom2" = Table.AddColumn(#"Filled Down", "Key", each [Main Key] + (
        if [DESCRIPTION] = "RRUL" then [Index] - [Last notADDR] - 2
        else if [DESCRIPTION] = "ADDR" then [Index] - [Last notADDR] - 1 else 0)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom2", {"Index", "Main Key", "Last notADDR"}),
    // Turn the DESCRIPTION values into columns.
    #"Pivoted Column1" = Table.Pivot(#"Removed Columns",
        List.Distinct(#"Removed Columns"[DESCRIPTION]), "DESCRIPTION", "VALUE"),
    // Look up CODE: keep the Criterias rows where the shorter of PREFIX and ADDR is a prefix
    // of the longer one, then take the row closest in length (ties go to the earlier row).
    // Text.Length requires PREFIX and ADDR to be text values.
    #"Added Custom3" = Table.AddColumn(#"Pivoted Column1", "CODE", each if [ADDR] = null then null else
        let t = Table.AddIndexColumn(Table.SelectRows(Criterias, (x) =>
                let s = List.Sort({x[PREFIX], [ADDR]}, each Text.Length(_)) in Text.StartsWith(s{1}, s{0})), "Index")
        in if Table.RowCount(t) > 0
            then Table.First(Table.Sort(t, (y) => Number.BitwiseShiftLeft(Number.Abs(Text.Length([ADDR]) - Text.Length(y[PREFIX])), 16) + y[Index]))[CODE]
            else "Not Found"),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom3", {"Key", "ADDR"}),
    // Repeat the ID, NAME, TYPE and DFRUL values on the rows of the additional pairs.
    #"Filled Down1" = Table.FillDown(#"Removed Columns1", {"ID", "NAME", "TYPE", "DFRUL"})
in
    #"Filled Down1"
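The question also asks for a unique list of codes per ID, and the query above does not remove repeated CODE/RRUL rows. One possible follow-up step, offered only as a sketch, is `Table.Distinct` on the result. The inline `Result` table below is a hypothetical stand-in for the output of `#"Filled Down1"`, with the column order that query produces.

```powerquery
let
    // Hypothetical stand-in for the output of the query above (#"Filled Down1").
    Result = #table(
        type table [ID = text, NAME = text, TYPE = text, DFRUL = text,
                    RRUL = nullable text, CODE = nullable text],
        {
            {"0", "JFMSC", "UHELQ", "F4", "P1", "ABX1"},
            {"0", "JFMSC", "UHELQ", "F4", "P1", "ABX1"},
            {"2", "PLLSJS", "UHELQ", "P3", null, null}
        }
    ),
    // Drop rows that are identical in every column.
    Deduplicated = Table.Distinct(Result)
in
    Deduplicated
```

Note the assumption behind this: two different sets that happen to share the same ID, NAME, TYPE and DFRUL values would also be merged, so this step only suits data where that cannot happen.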