从包含CSV文件的文件夹中过滤Power Query中的多个条件的方法

时间:2019-11-17 22:17:22

标签: excel vba performance powerbi powerquery

我需要纠正或建议查询的帮助,我正在从CSV格式的文件夹中获取数据。 请提前警告:我不知道如何在短时间内写这篇文章。

首先要提供的信息是

  • 工具仅限于Power Query,Excel,VBA

  • 数据查询每月运行一次,因此加载时间较长不是一个大问题,尽管通常更短的时间更可取

  • 我选择了Power Query方法,因为必须在另一个Excel文件中使用源数据,但是要使用不同的规则集(这是我当前问题的一部分)。
  • 我的代码的基本问题是它可以运行很长时间,需要满足大量条件,并且出于其他原因/工具/文件,我必须使用类似的方法。我希望人们只需按“刷新”即可获取所需的信息。

说明:

我在一个文件夹中的CSV文件中有数据源。命名约定不存在,因为有多个人负责从系统中导出数据。因此,我在PQ中使用了文件夹选项。

数据大小当前约为400-600 MB。列的名称可能正在更改,这是M代码中要绕过的第一行。

我的主要奋斗是:

有几个条件需要实现。我不想编写多个if语句,因为代码会变得很丑陋,而且条件的数量只有十分之一,而且跨越多列。因此,我实现了一个转换表(我们称其为TT),该表具有所有可以使用过滤的列,该TT的最后一列是所有列的串联。如果在某种情况下我不在乎其中一列,则用通配符“ *”填充它。

所以TT可能看起来像这样:

| PC | CLIENT | FN  | TC | STRING      |

|----|--------|-----|----|-------------|

| 11 | *      | NEW | AC | 11*NEWAC    |

| 47 | 000001 | NEW | *  | 47000001NEW*|

等...

PC是PoC,FN是FUNCTION,TC是交易代码(在下面的代码中)。

然后在代码中,我将通配符替换为来自PQ的适当列的值,并检查TT中是否包含来自PQ中相同列的串联字符串(将最后一列制成列表)。 下面的代码可用于更简单的解决方案,但是它经过了硬编码,因为我想知道是否有可能。

数据更新后,我运行VBA宏以将数据附加到“数据库”表中(ofc检查是否存在现有值),以便可以最大程度地减少数据负载。因此,使用了第一部分代码。 基本上,我可以将代码分为三部分:

  1. 基本转换:从文件夹加载,删除非常规名称,并检查其他文件夹是否包含相同的命名文件以最小化加载。
  2. 过滤数据:包括将PQ表与TT表合并,用正确的列替换通配符,然后创建过滤字符串以检查串联的PQ表中的文本是否至少包含TT列表中的一个值。
  3. 对使用过的数据进行最终转换以获取我需要的信息(主要是关于从市场上延迟清算)

带有注释的整个M代码

let
    /*Here starts basic data transformation to limit errors in CSV files due to 
    different conventions */
    Source = Folder.Files(source),
    #"Uppercased Text1" = Table.TransformColumns(Source,{{"Name", Text.Upper, type text}}),
    #"Merged Queries2" = Table.NestedJoin(#"Uppercased Text1", {"Name"}, q_Archive, {"Name"}, "q_Archive", JoinKind.LeftAnti),
    #"Added Custom" = Table.AddColumn(#"Merged Queries2", "Data", each Csv.Document(File.Contents([Folder Path] & "\" & [Name]),[Delimiter=";", Encoding = 1252, QuoteStyle = QuoteStyle.None])),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Data"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Other Columns", "Table", each Table.PromoteHeaders([Data])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Table"}),
    #"Added Custom2" = Table.AddColumn(#"Removed Other Columns1", "Upper", each Table.TransformColumnNames([Table],Text.Upper)),
    #"Removed Other Columns2" = Table.SelectColumns(#"Added Custom2",{"Upper"}),
    #"Expanded Upper" = Table.ExpandTableColumn(#"Removed Other Columns2", "Upper", {"19A AMOUNT", "19A CURRENCY CODE", "35B ISIN", "CLIENT", "EXP.SETTL.DATE", "FUNCTION", "INSTR.ID", "MESSAGE FUNCTION", "POC", "RECEPTION DATE", "SETTL.AMOUNT", "SETTL.CUR.", "TRANSACTION CODE"}, {"19A AMOUNT", "19A CURRENCY CODE", "35B ISIN", "CLIENT", "EXP.SETTL.DATE", "FUNCTION", "INSTR.ID", "MESSAGE FUNCTION", "POC", "RECEPTION DATE", "SETTL.AMOUNT", "SETTL.CUR.", "TRANSACTION CODE"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Expanded Upper",{{"SETTL.AMOUNT", "SETTL.AMOUNT2"}, {"SETTL.CUR.", "SETTL.CUR.2"}, {"19A CURRENCY CODE", "19A CURRENCY CODE2"}, {"19A AMOUNT", "19A AMOUNT2"}}),
    #"Added Custom10" = Table.AddColumn(#"Renamed Columns1", "19A AMOUNT", each if[SETTL.AMOUNT2]=null then [19A AMOUNT2] else [SETTL.AMOUNT2]),
    #"Added Custom11" = Table.AddColumn(#"Added Custom10", "19A CURRENCY CODE", each if [SETTL.CUR.2] = null then [19A CURRENCY CODE2] else [SETTL.CUR.2]),
    #"Renamed Columns" = Table.RenameColumns(#"Added Custom11",{{"FUNCTION", "FUNCTION2"}}),
    #"Added Custom8" = Table.AddColumn(#"Renamed Columns", "FUNCTION", each if[FUNCTION2]=null then [MESSAGE FUNCTION] else[FUNCTION2]),
    #"Removed Other Columns3" = Table.SelectColumns(#"Added Custom8",{"35B ISIN", "CLIENT", "EXP.SETTL.DATE", "INSTR.ID", "POC", "RECEPTION DATE", "TRANSACTION CODE", "19A AMOUNT", "19A CURRENCY CODE", "FUNCTION"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns3",{"POC", "CLIENT", "FUNCTION", "TRANSACTION CODE", "EXP.SETTL.DATE", "RECEPTION DATE", "19A AMOUNT", "19A CURRENCY CODE"}),
    #"Replaced Value" = Table.ReplaceValue(#"Reordered Columns","""","",Replacer.ReplaceText,{"POC", "CLIENT", "INSTR.ID", "35B ISIN"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","=","",Replacer.ReplaceText,{"POC", "CLIENT", "INSTR.ID", "35B ISIN"}),
    #"Uppercased Text" = Table.TransformColumns(#"Replaced Value1",{{"POC", Text.Upper, type text}, {"CLIENT", Text.Upper, type text}, {"FUNCTION", Text.Upper, type text}, {"TRANSACTION CODE", Text.Upper, type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Uppercased Text", each ([FUNCTION] = "NEWM")),
    #"Merged Queries" = Table.NestedJoin(#"Filtered Rows", {"POC"}, tbl_setup_pocList, {"PocList"}, "tbl_setup_pocList", JoinKind.Inner),
        #"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"tbl_setup_pocList"}),


/* Here ends the data transformation part
   and the part for list transformations start*/
        #"Added condition" = Table.AddColumn(#"Removed Columns","COND", each (
            ((Table.FromRecords({
                [PC = List.ReplaceValue(Table.Column(tbl_filtering_string, "POC"),"*",[POC], Replacer.ReplaceText),
                CL = List.ReplaceValue(Table.Column(tbl_filtering_string, "CLIENT"),"*",[CLIENT], Replacer.ReplaceText),
                FN = List.ReplaceValue(Table.Column(tbl_filtering_string, "FUNCTION"),"*",[FUNCTION], Replacer.ReplaceText),
                TC = List.ReplaceValue(Table.Column(tbl_filtering_string, "TRANSACTION CODE"),"*",[TRANSACTION CODE], Replacer.ReplaceText)]}
            ))))),
        #"Expanded COND" = Table.ExpandTableColumn(#"Added condition", "COND", {"PC", "CL", "FN", "TC"}, {"PC", "CL", "FN", "TC"}),
        #"Added Custom3" = Table.AddColumn(#"Expanded COND", "Test",  each (List.Combine(
            {
                {_[PC]},{_[CL]},{_[FN]},{_[TC]}
            }
        ))),
        #"Expanded Test" = Table.AddColumn(#"Added Custom3", "Test2", each (Table.FromColumns(_[Test],null))),
        #"Removed Columns2" = Table.RemoveColumns(#"Expanded Test",{"PC", "CL", "FN", "TC", "Test"}),
        #"Added Custom4" = Table.AddColumn(#"Removed Columns2", "String", each Table.ToList([Test2],Combiner.CombineTextByDelimiter(""))),
        #"Removed Columns3" = Table.RemoveColumns(#"Added Custom4",{"Test2"}),
        #"Added Custom6" = Table.AddColumn(#"Removed Columns3", "CONTAIN_STR", each [POC]&[CLIENT]&[FUNCTION]&[TRANSACTION CODE]),
        #"Added Custom5" = Table.AddColumn(#"Added Custom6", "Cond", each List.Contains(_[String],[CONTAIN_STR])),
        #"Filtered Rows1" = Table.SelectRows(#"Added Custom5", each ([Cond] = false)),

        /*Here the code for filtering ends and final transformations occur */

        #"Removed Columns4" = Table.RemoveColumns(#"Filtered Rows1",{"String", "CONTAIN_STR", "Cond"}),
        #"Merged Queries1" = Table.NestedJoin(#"Removed Columns4", {"POC"}, tbl_setup_exotics, {"Exotic_PoC"}, "tbl_setup_exotics", JoinKind.LeftOuter),
        #"Expanded tbl_setup_exotics" = Table.ExpandTableColumn(#"Merged Queries1", "tbl_setup_exotics", {"Exotic_PoC"}, {"Exotic_PoC"}),
        #"Replaced Value2" = Table.ReplaceValue(#"Expanded tbl_setup_exotics",null, "Non Exotic",Replacer.ReplaceValue,{"Exotic_PoC"}),
        #"Removed Errors" = Table.RemoveRowsWithErrors(#"Replaced Value2", {"EXP.SETTL.DATE", "RECEPTION DATE"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Removed Errors",{{"EXP.SETTL.DATE", type date}, {"RECEPTION DATE", type date}}),
        #"Added Custom7" = Table.AddColumn(#"Changed Type", "RD", each (if [Exotic_PoC] <> "Non Exotic" then Date.AddDays([RECEPTION DATE],1)else [RECEPTION DATE])),
        #"Filtered Rows2" = Table.AddColumn(#"Added Custom7", "LB" , each if [RD]>=[EXP.SETTL.DATE] then "Late" else "Not"),
        #"Added Custom9" = Table.AddColumn(#"Filtered Rows2", "DAYS_LATE", each [RD]-[EXP.SETTL.DATE]),
        #"Inserted Year" = Table.AddColumn(#"Added Custom9", "Year", each Date.Year([EXP.SETTL.DATE]), Int64.Type),
        #"Inserted Month" = Table.AddColumn(#"Inserted Year", "Month", each Date.Month([EXP.SETTL.DATE]), Int64.Type),
        #"Changed Type1" = Table.TransformColumnTypes(#"Inserted Month",{{"19A AMOUNT", type number}}),
        #"Grouped Rows" = Table.Group(#"Changed Type1", {"Year", "Month", "POC", "19A CURRENCY CODE", "DAYS_LATE", "LB"}, {{"Count", each Table.RowCount(_), type number}, {"Countervalue", each List.Sum([19A AMOUNT]), type text}, {"ISIN", each Text.Combine([35B ISIN],";"), type text}, {"INSTR.ID", each Text.Combine([INSTR.ID], ";"), type text}}),
        #"Merged Queries3" = Table.NestedJoin(#"Grouped Rows", {"Year", "Month", "19A CURRENCY CODE"}, q_Xrates, {"Year", "Month", "Currency"}, "q_Xrates", JoinKind.LeftOuter),
        #"Expanded q_Xrates" = Table.ExpandTableColumn(#"Merged Queries3", "q_Xrates", {"Rate"}, {"Rate"}),
        #"Replaced Value3" = Table.ReplaceValue(#"Expanded q_Xrates",null,1,Replacer.ReplaceValue,{"Rate"}),
        #"Added Col" = Table.AddColumn(#"Replaced Value3", "CV", each [Countervalue]/[Rate]),
        #"Remove Countervalue" = Table.RemoveColumns(#"Added Col", {"Countervalue"})
    in
        #"Remove Countervalue"

问题

  1. 我知道这种方法听起来过于复杂,但可以使其正常工作(不幸的是,刷新需要很长时间)。但是真的好吗?考虑到开始提到的有限的工具使用方法,还有其他选择吗?
  2. 如何使这段代码更好?我相信它可能会部分重现功能,但是由于我是PQ的初学者,所以我无法想象如何。
  3. 对于相同的源数据,如何使用相同的方法却具有更大的复杂性?您可以理解,随着更多列添加到过滤字符串中。
  4. 您还有其他建议吗?

结束评论

  • 我现在非常绝望,有时我的书面文字可能令人困惑。
  • 提供任何一种Visio图表来以更图形化的方式显示我的逻辑(我对此比较熟悉)以及关系概述,我没有任何问题。
  • 我也没有提供匿名数据的问题(因为它可能是部分机密的)。如果您需要该服务,请参阅首选服务。
  • 如果我被推向正确的方向,我不介意编写代码。对于该问题,#1是优先事项。因此,从根本上讲,这种好方法是否可以针对另一个相同但更复杂的目的轻松调整?

非常感谢您的光临。

*/ MK */

1 个答案:

答案 0 :(得分:0)

如果要执行此操作,我将编写一个将过滤条件表编译为函数的函数,然后将其与Table.SelectRows一起应用。

// Compile the condition table into a function that can be applied in row filtering.
filterCondition = compileFilterConditionTable(tbl_filtering_string),

#"Filtered Rows" = Table.SelectRows(#"Table after Preceding Steps", filterCondition)

这看起来更容易追踪步骤吗?

下面是将条件表编译成逻辑函数的函数示例代码。我不确定这对您的情况是否正确,因为我不完全了解该要求。

compileFilterConditionTable =

    let compileFilterConditionTable = (filterConditionTable as table) as function =>
            let recordConditions = List.Transform(
                    Table.ToRecords(filterConditionTable),
                    compileFilterConditionRecord)
            in applyCombine(recordConditions, List.AnyTrue),

        compileFilterConditionRecord = (cond as record) as function =>
            let fieldNameValues = List.Transform(
                    Record.FieldNames(cond),
                    each [Name = _, Value = Record.Field(cond, Name)]
                ),
                fieldConditions = List.Transform(fieldNameValues, compileFieldCondition)
            in applyCombine(fieldConditions, List.AllTrue),

        compileFieldCondition = (fieldNameValue as record) as function =>
            let name = fieldNameValue[Name],
                value = fieldNameValue[Value]
            in
                if value = "*" then (record as record) as logical => true
                else (record as record) as logical => Record.Field(record, name) = value,

        applyCombine = (functions as list, combiner as function) as function =>
            (value) => combiner(List.Transform(functions, (f) => f(value)))

    in compileFilterConditionTable

无论如何,M是一种功能编程语言,因此以功能方式思考和编码它会有所帮助。将整个逻辑分为几个小部分,这样每个小部分都将很容易理解。将您的代码编写为可重用的小函数,然后将它们组合以构建整体。