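I am using Amazon Mechanical Turk to transcribe receipt data. The CSV that Amazon returns is unreadable at first glance. URL of the CSV file: https://drive.google.com/file/d/1QR4cgdVrkYwRni3YM5Dc_umIKFGiX_0k/view?usp=sharing
However, when you import it into Excel with the delimiter set to comma, it is at least readable. Here is the URL of the Excel file (please download it and open it in Excel, which works better): https://drive.google.com/file/d/1Noj4UUMd-p1iYKIWDgKURQUzCdhu5Ck1/view?usp=sharing
But Excel then puts all of the transcriber's answers into a single cell in the column called "Answer.taskAnswers".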
Desired result: the transcriber's values laid out in a table like this one (please check this URL: https://i.ibb.co/vjf0t0c/Prefered-formatting-of-cell-Answer-task-Answers-2.png)
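Possible solution 1: a way to format the CSV file so that it looks like the table in "Desired result".
Possible solution 2: a formula that generates another table from "Answer.taskAnswers" (possibly on another sheet) that looks like the table in "Desired result".
Does anyone know a solution for this?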
Answer 0 (score: 1)
Edit: M code changed to allow for a varying number of columns (products) in the JSON strings in the CSV
From the appearance of your output, I am guessing you used Power Query (aka Get & Transform) to input the data.
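If that is the case, you can edit the query to get your desired output. (If not, you can use Power Query for the whole process.)
The column whose output you want to parse is in JSON format, and PQ has a built-in parser.
I used the original CSV file you provided.
We remove the irrelevant columns and the blank rows, parse the JSON string, and then rearrange the data.
All of the steps except the custom column formula can be done from the GUI.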
The custom column formula extracts the elements from the JSON string in the relevant column: =Json.Document([Answer.taskAnswers])
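To see what that formula returns, here is a minimal sketch you could paste into a blank query. The sample string is hypothetical (it is not taken from your file); I am assuming each cell holds a JSON array containing a single object, which is what the list expansion followed by the record expansion in the M code suggests.

```
let
    // Hypothetical sample of one Answer.taskAnswers cell (quotes are doubled inside M text)
    SampleJson = "[{""storeName"":""Example Store"",""product-1"":""Milk"",""price-1"":""1.99""}]",
    // Json.Document accepts text and returns M values: here, a list holding one record
    Parsed = Json.Document(SampleJson),
    // Take the first (and only) record from the list
    FirstRecord = Parsed{0},
    // Field names containing a hyphen need the #"..." quoted-identifier syntax
    FirstProduct = FirstRecord[#"product-1"]
in
    FirstProduct
```

Because the parsed value is a list of records, the query needs two expansion steps: one for the list and one for the record.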
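You can simply paste the M code from the "M Code" section below into PQ's Advanced Editor, then step through the Applied Steps in the GUI to see what is going on.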
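You will also have to edit the Source line to reflect where you are actually getting the source data from (this can be a URL instead of a file).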
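If the data comes from a URL, the Source step might look something like the sketch below. The address is a hypothetical placeholder, not a real location, and it would need to point directly at the raw CSV. A share-page link such as the Google Drive one will probably not work as-is.

```
let
    // Hypothetical URL; replace with a direct link to the raw CSV file
    Source = Csv.Document(
        Web.Contents("https://example.com/mturk-results.csv"),
        [Delimiter=",", Columns=31, Encoding=1252, QuoteStyle=QuoteStyle.None]
    ),
    // Same header promotion as in the file-based query
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true])
in
    #"Promoted Headers"
```

The only change from the file-based version is that Web.Contents replaces File.Contents; the remaining steps should not need to change.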
M Code
    let
        // Load the raw CSV; edit this path (or use a URL) to match your own data source
        Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\Stackoverflow data for question about cell formating (1).csv"),[Delimiter=",", Columns=31, Encoding=1252, QuoteStyle=QuoteStyle.None]),
        #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        // Keep only the JSON column and drop blank rows
        #"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers",{"Answer.taskAnswers"}),
        #"Removed Blank Rows" = Table.SelectRows(#"Removed Other Columns", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
        // Parse the JSON text, then expand the resulting list and records
        #"Added Custom" = Table.AddColumn(#"Removed Blank Rows", "strJSON", each Json.Document([Answer.taskAnswers])),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Answer.taskAnswers"}),
        #"Expanded strJSON" = Table.ExpandListColumn(#"Removed Columns", "strJSON"),
        // Use the union of all field names so a varying number of products is handled
        #"Expanded strJSON1" = Table.ExpandRecordColumn(#"Expanded strJSON", "strJSON", List.Union(List.Transform(#"Expanded strJSON"[strJSON], each Record.FieldNames(_)))),
        // Unpivot the per-product fields, split e.g. "price-1" at its last hyphen, then pivot back
        #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded strJSON1", {"purchaseTime", "purchaseDate", "storeName"}, "Attribute", "Value"),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByEachDelimiter({"-"}, QuoteStyle.Csv, true), {"Attribute.1", "Attribute.2"}),
        #"Sorted Rows" = Table.Sort(#"Split Column by Delimiter",{{"Attribute.2", Order.Ascending}}),
        #"Pivoted Column" = Table.Pivot(#"Sorted Rows", List.Distinct(#"Sorted Rows"[Attribute.1]), "Attribute.1", "Value"),
        #"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column",{"Attribute.2"}),
        // Final column order and data types
        #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns1",{"storeName", "purchaseDate", "purchaseTime", "product", "price", "weight", "quantity"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns",{{"purchaseDate", type date}, {"purchaseTime", type time}, {"price", Currency.Type}, {"quantity", Int64.Type}})
    in
        #"Changed Type"
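The "rearrange the data" part (unpivot, split, pivot) is easier to follow on a tiny table. The sketch below uses made-up values and shortened step names (Wide, Long, Split, Pivoted) that are mine, not the ones the GUI generates. It is only meant to illustrate the idea.

```
let
    // Hypothetical one-receipt table with two numbered products
    Wide = Table.FromRecords({
        [storeName = "Example Store", #"product-1" = "Milk", #"price-1" = "1.99", #"product-2" = "Eggs", #"price-2" = "3.00"]
    }),
    // Turn every column except storeName into Attribute/Value rows
    Long = Table.UnpivotOtherColumns(Wide, {"storeName"}, "Attribute", "Value"),
    // Split "price-1" at its last hyphen into a field name and an item number
    Split = Table.SplitColumn(Long, "Attribute", Splitter.SplitTextByEachDelimiter({"-"}, QuoteStyle.Csv, true), {"Field", "Item"}),
    // Pivot the field names back into columns, giving one row per store and item number
    Pivoted = Table.Pivot(Split, List.Distinct(Split[Field]), "Field", "Value")
in
    Pivoted
```

The item number column is what keeps each product's price on the same row as its name. That is why the full query only removes Attribute.2 after the pivot.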
The original GUI-generated M code had the following line, which names the JSON columns explicitly. It does not adapt to a varying number of products.
    #"Expanded strJSON1" = Table.ExpandRecordColumn(#"Expanded strJSON", "strJSON", {"price-1", "price-2", "price-3", "price-4", "price-5", "product-1", "product-2", "product-3", "product-4", "product-5", "purchaseDate", "purchaseTime", "quantity-1", "quantity-2", "quantity-3", "quantity-4", "quantity-5", "storeName", "weight-1", "weight-5", "weight-3"}, {"price-1", "price-2", "price-3", "price-4", "price-5", "product-1", "product-2", "product-3", "product-4", "product-5", "purchaseDate", "purchaseTime", "quantity-1", "quantity-2", "quantity-3", "quantity-4", "quantity-5", "storeName", "weight-1", "weight-5", "weight-3"}),
So I replaced that line in the M code above with one that builds the list of field names from the data itself.
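Here is a small sketch of how that dynamic field list behaves, using two hypothetical records with different numbers of products (the values and the step names Records, T, FieldNames, and Expanded are made up for illustration).

```
let
    // Hypothetical parsed receipts: the second one has an extra product
    Records = {
        [storeName = "Store A", #"product-1" = "Milk", #"price-1" = "1.99"],
        [storeName = "Store B", #"product-1" = "Bread", #"price-1" = "2.50", #"product-2" = "Eggs", #"price-2" = "3.00"]
    },
    // One-column table of records, standing in for the strJSON column
    T = Table.FromColumns({Records}, {"strJSON"}),
    // Collect every field name that occurs in any record
    FieldNames = List.Union(List.Transform(T[strJSON], each Record.FieldNames(_))),
    // Expand using that list; fields a record lacks should come through as null
    Expanded = Table.ExpandRecordColumn(T, "strJSON", FieldNames)
in
    Expanded
```

Because the list of names is computed from the data, adding a sixth or seventh product to a receipt should not require any change to the query.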
Output
GUI Steps