Question

My JSON file has the following format:

    "Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}]
    "Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]

I want to use Pig to extract the data from the JSON format above into a table format.

Expected format: [image not included]

Note: in the first record, column C should be null or empty, because the first record has no value for C.

I tried JsonLoader and the elephant-bird jar but did not get the expected output. Please suggest a correct way to get it.

Answer 1

Can you try this custom UDF?

Sample input 1 (input.json):

    {"Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}]}
    {"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]}

PigScript:

    REGISTER jsonparse.jar;
    A = LOAD 'input.json' USING JsonLoader('Properties2:{(K:chararray,T:chararray,V:chararray)}');
    B = FOREACH A GENERATE FLATTEN(STRSPLIT(mypackage.JSONPARSE(BagToString(Properties2)),'_',4));
    STORE B INTO 'output' USING PigStorage();

Output (tab-separated; column C is empty in the first row):

    M   N       O
    W   X   Y   Z
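The script hands the UDF a single string rather than the bag itself. As a rough illustration of that intermediate value, the sketch below joins every field of every (K, T, V) tuple with "_", which is what BagToString is assumed to do with its default delimiter. The class BagToStringSketch and the helper flatten are hypothetical names used only for this illustration; they are not part of Pig.

```java
import java.util.ArrayList;
import java.util.List;

class BagToStringSketch {
    // Hypothetical helper: joins all fields of all tuples with "_",
    // mimicking the string that BagToString(Properties2) is assumed to produce.
    static String flatten(List<String[]> bag) {
        List<String> fields = new ArrayList<>();
        for (String[] tuple : bag) {
            for (String field : tuple) {
                fields.add(field);
            }
        }
        return String.join("_", fields);
    }

    public static void main(String[] args) {
        // First record of sample input 1, as (K, T, V) tuples.
        List<String[]> firstRecord = new ArrayList<>();
        firstRecord.add(new String[] {"A", "String", "M "});
        firstRecord.add(new String[] {"B", "String", "N"});
        firstRecord.add(new String[] {"D", "String", "O"});

        // prints A_String_M _B_String_N_D_String_O
        System.out.println(flatten(firstRecord));
    }
}
```

Every third field, starting at index 0, is a key, and the field two positions later is its value. The UDF relies on that layout, so a value that itself contains "_" would break it.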

Sample input 2:

    {"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]}
    {"Properties2":[{"K":"A","T":"String","V":"M"},{"K":"B","T":"String","V":"N"},{"K":"D","T":"String","V":"O"}]}
    {"Properties2":[{"K":"A","T":"String","V":"J"}]}
    {"Properties2":[{"K":"B","T":"String","V":"X"}]}
    {"Properties2":[{"K":"C","T":"String","V":"Y"}]}
    {"Properties2":[{"K":"D","T":"String","V":"Z"}]}

Output 2:

    W   X   Y   Z
    M   N       O
    J
        X
            Y
                Z

UDF code: the Java file below is compiled and packaged as jsonparse.jar. (This is only throwaway Java code; optimize or modify it as needed.)

JSONPARSE.java:

    package mypackage;

    import java.io.IOException;
    import java.util.LinkedHashMap;

    import org.apache.commons.lang.StringUtils;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class JSONPARSE extends EvalFunc<String> {
        @Override
        public String exec(Tuple arg0) throws IOException {
            try {
                // Get the input: the bag flattened to K_T_V_K_T_V_...
                String input = (String) arg0.get(0);

                // Split the input on the "_" delimiter
                String[] parts = input.split("_");

                // Init the map with keys (A, B, C, D) and empty values;
                // LinkedHashMap keeps the column order stable
                LinkedHashMap<String, String> mymap = new LinkedHashMap<String, String>();
                mymap.put("A", "");
                mymap.put("B", "");
                mymap.put("C", "");
                mymap.put("D", "");

                // parts[i] is a key and parts[j] its value; the j bound skips
                // an incomplete trailing triple (e.g. an empty last value)
                for (int i = 0, j = 2; j < parts.length; i = i + 3, j = j + 3) {
                    if (mymap.containsKey(parts[i])) {
                        mymap.put(parts[i], parts[j]);
                    }
                }

                // Build the output with "_" as the delimiter
                String output = "";
                for (String key : mymap.keySet()) {
                    output = output + mymap.get(key) + "_";
                }

                // Remove the extra trailing "_" from the output
                return StringUtils.removeEnd(output, "_");
            } catch (Exception e) {
                throw new IOException("Caught exception while processing the input row ", e);
            }
        }
    }
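To check the mapping logic without a Pig installation, the UDF's core can be sketched as a plain Java method. This is only an approximation under stated assumptions: JsonParseSketch and toRow are hypothetical names, the input is assumed to be the "_"-joined string that BagToString passes to the UDF, and the final split assumes that STRSPLIT follows the semantics of Java's String.split(regex, limit).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JsonParseSketch {
    // Same idea as JSONPARSE.exec, minus the Pig and commons-lang dependencies:
    // map each known key (A, B, C, D) to its value, leaving missing keys empty.
    static String toRow(String flattened) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String key : new String[] {"A", "B", "C", "D"}) {
            columns.put(key, "");
        }
        String[] parts = flattened.split("_");
        // parts[i] is a key, parts[i + 2] its value; incomplete triples are skipped.
        for (int i = 0; i + 2 < parts.length; i += 3) {
            if (columns.containsKey(parts[i])) {
                columns.put(parts[i], parts[i + 2]);
            }
        }
        return String.join("_", columns.values());
    }

    public static void main(String[] args) {
        String row = toRow("A_String_M_B_String_N_D_String_O");
        System.out.println(row); // prints M_N__O

        // A limit of 4 keeps empty columns, including trailing ones.
        String[] cells = toRow("A_String_J").split("_", 4);
        System.out.println(cells.length); // prints 4
    }
}
```

The positive limit matters: a plain split("_") drops trailing empty strings, so a record that only contains key A would otherwise collapse to a single column instead of four.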

How to compile and build the jar file:

1. Download two jar files (apache-commons-lang.jar and piggybank.jar) from the links below:
   http://www.java2s.com/Code/Jar/a/Downloadapachecommonslangjar.htm
   http://www.java2s.com/Code/Jar/p/Downloadpiggybankjar.htm
2. Add both jar files to your classpath:
   >> export CLASSPATH=/tmp/piggybank.jar:/tmp/apache-commons-lang.jar
3. Create a directory named mypackage:
   >> mkdir mypackage
4. Compile JSONPARSE.java (make sure both jars are on the classpath, otherwise compilation will fail):
   >> javac JSONPARSE.java
5. Move the class file into the mypackage folder:
   >> mv JSONPARSE.class mypackage/
6. Create the jar file jsonparse.jar:
   >> jar -cvf jsonparse.jar mypackage/
7. The file jsonparse.jar is created; include it in your Pig script with the REGISTER command.

Example from the command line:

    $ ls
    JSONPARSE.java  input.json
    $ javac JSONPARSE.java
    $ mkdir mypackage
    $ mv JSONPARSE.class mypackage/
    $ jar -cvf jsonparse.jar mypackage/
    $ ls
    JSONPARSE.java  input.json  jsonparse.jar  mypackage

Convert JSON data to a specific table format using Pig
