Is there a formula to convert the data in a single cell into a table?

Time: 2019-09-02 18:21:13

Tags: excel csv excel-formula

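I am using Amazon Mechanical Turk to transcribe receipt data. The CSV that Amazon returns is unreadable at first glance. URL of the CSV file: https://drive.google.com/file/d/1QR4cgdVrkYwRni3YM5Dc_umIKFGiX_0k/view?usp=sharing

However, when you import it into Excel with the delimiter set to comma, it is at least readable. Here is the URL of the Excel file (please download it and open it in Excel, which works better): https://drive.google.com/file/d/1Noj4UUMd-p1iYKIWDgKURQUzCdhu5Ck1/view?usp=sharing

But Excel then puts all of the transcriber's answers into a single cell called "Answer.taskAnswers".

Desired result: the transcriber's values in a table like this one (please check this URL: https://i.ibb.co/vjf0t0c/Prefered-formatting-of-cell-Answer-task-Answers-2.png)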

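Possible solution 1: a way of formatting the CSV file so that it looks like the table in "Desired result".

Possible solution 2: a formula that generates another table from "Answer.taskAnswers" (possibly on another sheet) that looks like the table in "Desired result".

Does anyone know a solution for this?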

1 answer:

Answer 0: (score: 1)

Edit: The M code has been changed to allow for a varying number of columns (products) in the CSV's JSON string.

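From the look of your output, I am guessing you used Power Query (aka Get & Transform) to import the data.

If that is the case, you can edit the query to get your desired output. (If not, you can use it for the whole process.)

The column you want to parse is in JSON format, and PQ has a built-in parser.

I used the original CSV file you provided.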

We remove the irrelevant columns and the blank rows, parse the JSON string, and then rearrange the data.
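The "rearrange" part is an unpivot, a split on the trailing number, and a pivot. A minimal sketch of the split and pivot on a small hand-made table may make this easier to follow. The table contents are hypothetical and only mimic the shape of the real data after unpivoting:

```powerquery
let
    // Hypothetical rows, shaped like the data after the unpivot step
    unpivoted = #table(
        {"storeName", "Attribute", "Value"},
        {
            {"A", "product-1", "Milk"}, {"A", "price-1", "1.20"},
            {"A", "product-2", "Eggs"}, {"A", "price-2", "2.50"}
        }),
    // Split "product-1" into "product" and "1" at the last hyphen
    split = Table.SplitColumn(unpivoted, "Attribute",
        Splitter.SplitTextByEachDelimiter({"-"}, QuoteStyle.Csv, true),
        {"Attribute.1", "Attribute.2"}),
    // Turn the distinct names ("product", "price") back into columns;
    // storeName and Attribute.2 stay as row keys, giving one row per product number
    pivoted = Table.Pivot(split, List.Distinct(split[Attribute.1]), "Attribute.1", "Value")
in
    pivoted
```

Because Attribute.2 is kept during the pivot, each numbered product ends up on its own row; that helper column can be removed afterwards.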

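All of the steps except the custom column formula can be done in the GUI.

The custom column formula extracts the elements from the JSON string in the relevant column: =Json.Document([Answer.taskAnswers])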
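To see what Json.Document returns for one cell, here is a small sketch that can be pasted into a blank query. The sample string is hypothetical; it only assumes that each cell holds a JSON list containing one record:

```powerquery
let
    // Hypothetical cell content: a JSON list with a single record
    sample = "[{""storeName"":""Example Store"",""product-1"":""Milk"",""price-1"":""1.20""}]",
    // Json.Document turns the text into an M list of records
    parsed = Json.Document(sample),
    // Take the first (and only) record in the list
    firstRecord = parsed{0},
    // Field names containing a hyphen are read with Record.Field
    product = Record.Field(firstRecord, "product-1")
in
    product
```

This shape is why the query needs one step to expand the list and another to expand the record.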

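You can simply paste the M code into the PQ Advanced Editor and then examine the steps in the GUI to see what is happening.
You will also have to edit the Source line to reflect where your source data actually comes from (it can be a URL instead of a file).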
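If the CSV is read from a URL, one option is to swap File.Contents for Web.Contents and leave the other options unchanged. This is only a sketch: the address below is a placeholder, and it assumes a direct link to the raw CSV rather than a sharing page:

```powerquery
let
    // Placeholder URL (an assumption); replace with a direct link to the raw CSV file
    Source = Csv.Document(
        Web.Contents("https://example.com/mturk-results.csv"),
        [Delimiter=",", Columns=31, Encoding=1252, QuoteStyle=QuoteStyle.None])
in
    Source
```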

M Code

let
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\Stackoverflow data for question about cell formating (1).csv"),[Delimiter=",", Columns=31, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers",{"Answer.taskAnswers"}),
    #"Removed Blank Rows" = Table.SelectRows(#"Removed Other Columns", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Added Custom" = Table.AddColumn(#"Removed Blank Rows", "strJSON", each Json.Document([Answer.taskAnswers])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Answer.taskAnswers"}),
    #"Expanded strJSON" = Table.ExpandListColumn(#"Removed Columns", "strJSON"),
    #"Expanded strJSON1" = Table.ExpandRecordColumn(#"Expanded strJSON", "strJSON", List.Union(List.Transform(#"Expanded strJSON"[strJSON], each Record.FieldNames(_)))),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded strJSON1", {"purchaseTime", "purchaseDate", "storeName"}, "Attribute", "Value"),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByEachDelimiter({"-"}, QuoteStyle.Csv, true), {"Attribute.1", "Attribute.2"}),
    #"Sorted Rows" = Table.Sort(#"Split Column by Delimiter",{{"Attribute.2", Order.Ascending}}),
    #"Pivoted Column" = Table.Pivot(#"Sorted Rows", List.Distinct(#"Sorted Rows"[Attribute.1]), "Attribute.1", "Value"),
    #"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column",{"Attribute.2"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns1",{"storeName", "purchaseDate", "purchaseTime", "product", "price", "weight", "quantity"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns",{{"purchaseDate", type date}, {"purchaseTime", type time}, {"price", Currency.Type}, {"quantity", Int64.Type}})
in
    #"Changed Type"

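The original GUI-generated M code had this line, which names the JSON columns explicitly. It would not adapt to a change in the number of products.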

#"Expanded strJSON1" = Table.ExpandRecordColumn(#"Expanded strJSON", "strJSON", {"price-1", "price-2", "price-3", "price-4", "price-5", "product-1", "product-2", "product-3", "product-4", "product-5", "purchaseDate", "purchaseTime", "quantity-1", "quantity-2", "quantity-3", "quantity-4", "quantity-5", "storeName", "weight-1", "weight-5", "weight-3"}, {"price-1", "price-2", "price-3", "price-4", "price-5", "product-1", "product-2", "product-3", "product-4", "product-5", "purchaseDate", "purchaseTime", "quantity-1", "quantity-2", "quantity-3", "quantity-4", "quantity-5", "storeName", "weight-1", "weight-5", "weight-3"}),

So I changed that line in the M code above to handle this.
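The idea behind the changed line can be tried in isolation. In this sketch the two records are hypothetical and deliberately have a different number of products; the field names are collected from the data instead of being hard-coded:

```powerquery
let
    // Hypothetical records with differing numbers of products
    records = {
        [storeName = "A", #"product-1" = "Milk"],
        [storeName = "B", #"product-1" = "Bread", #"product-2" = "Eggs"]
    },
    // One-column table holding the records, like the strJSON column in the query
    tbl = Table.FromColumns({records}, {"strJSON"}),
    // Union of every field name that occurs in any record
    allNames = List.Union(List.Transform(tbl[strJSON], each Record.FieldNames(_))),
    // Expand using the collected names; fields missing from a record come out as null
    expanded = Table.ExpandRecordColumn(tbl, "strJSON", allNames)
in
    expanded
```

Here the first row gets null in the "product-2" column, so receipts with fewer products do not break the expansion.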

Output

[Image: screenshot of the resulting table]

GUI Steps

[Image: screenshot of the applied steps in the Power Query editor]