I have the following input XML:
<data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" generationTimestamp="2015-08-07T15:04:01.550+02:00" schemaVersion="1.7" xsi:noNamespaceSchemaLocation="http://schemas.unfccc.int/inventoryreporting/simple1_7.xsd">
<party name="AUS"/>
<submission uid="F928563A-471D-40FB-B1E0-022401746319" version="3" name="AUS_2015_3_Inventory"/>
<variables>
<variable name="[Enteric Fermentation][Other Sheep.Sheep][Emissions][CH4][kt][no source][no method][no target][no option][no type]" uid="39F4C4B0-ADA2-44D0-BBC0-B99A2B917FAA" userCreated="true" type="NUMBER">
<years>
<year name="2011" uid="6CE3F5C5-D464-48A5-9F2F-CDE450108F5F">
<record>
<value>492.74734020836445</value>
<comments/>
</record>
</year>
<year name="2010" uid="F18F7BF5-8AF7-47C1-A19F-584E84D2A7A4">
<record>
<value>469.78235318968376</value>
<comments/>
</record>
</year>
<year name="1994" uid="943A31F2-CDDD-49E3-99BF-C9CA082EB057">
<record>
<value>920.00059365049015</value>
<comments/>
</record>
</year>
</years>
</variable>
</variables>
</data>
I want the final output to be (submission name, variable uid, year name, year value):
(AUS_2015_3_Inventory,39F4C4B0-ADA2-44D0-BBC0-B99A2B917FAA,2011,492.74734020836445)
(AUS_2015_3_Inventory,39F4C4B0-ADA2-44D0-BBC0-B99A2B917FAA,2010,469.78235318968376)
(AUS_2015_3_Inventory,39F4C4B0-ADA2-44D0-BBC0-B99A2B917FAA,1994,920.00059365049015)
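To make the target shape concrete, here is a minimal sketch of the same extraction in plain Python rather than Pig. It is only a reference for the expected rows, not a Pig solution. `SAMPLE_XML` is a trimmed copy of the input above (namespace, variable name and year uid attributes dropped), and `extract_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the input XML: same structure, fewer attributes (assumption
# made for brevity; the full document parses the same way).
SAMPLE_XML = """<data>
<party name="AUS"/>
<submission uid="F928563A-471D-40FB-B1E0-022401746319" version="3" name="AUS_2015_3_Inventory"/>
<variables>
<variable uid="39F4C4B0-ADA2-44D0-BBC0-B99A2B917FAA" userCreated="true" type="NUMBER">
<years>
<year name="2011"><record><value>492.74734020836445</value><comments/></record></year>
<year name="2010"><record><value>469.78235318968376</value><comments/></record></year>
<year name="1994"><record><value>920.00059365049015</value><comments/></record></year>
</years>
</variable>
</variables>
</data>"""


def extract_rows(xml_text):
    """Return one (submission name, variable uid, year name, year value) tuple per <year>."""
    root = ET.fromstring(xml_text)
    sub_name = root.find("submission").get("name")
    rows = []
    for variable in root.iter("variable"):
        for year in variable.iter("year"):
            # Values are kept as strings so no digits are lost to float rounding.
            value = year.findtext("record/value")
            rows.append((sub_name, variable.get("uid"), year.get("name"), value))
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_XML):
        print(row)
```

The point is that every `<year>` element should become its own row, with the submission name and variable uid repeated on each one.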
I tried this Pig script, but it does not work as expected:
-- register piggybank jar
register piggybank.jar;
-- load xml
xmldata = load 'uidXML1.xml' using org.apache.pig.piggybank.storage.XMLLoader('data') as (xmldata_content:chararray);
-- fetch submission name, variable uid and all year tags
common_data = foreach xmldata generate FLATTEN(REGEX_EXTRACT_ALL(xmldata_content, '[\\s*\\S*]*<submission[\\s*\\S*]*name="(.*?)"[\\s*\\S*]*/>[\\s*\\S*]*<variable[\\s*\\S*]*uid="(.*?)"[\\s*\\S*]*>\\s*<years>(.*?)</years>[\\s*\\S*]*')) as (sub_name:chararray,var_uid:chararray,years_data:chararray);
-- split data on the basis of year
years_split_up = foreach common_data generate sub_name, var_uid, FLATTEN(STRSPLIT(years_data,'</year>\\s*',0)) as (year_wise_xml:chararray);
-- fetch submission name, variable uid, year name and year value
parsed_data = foreach years_split_up generate sub_name, var_uid, FLATTEN(REGEX_EXTRACT_ALL(year_wise_xml,'\\s*<year[\\s*\\S*]*name="(.*?)"[\\s*\\S*]*>[\\s*\\S*]*<value>(.*?)</value>[\\s*\\S*]*')) as (year_name:chararray, year_value:chararray);
The output of the above Pig script is:
(AUS_2015_3_Inventory,39F4C4B0-ADA2-44D0-BBC0-B99A2B917FAA,2011,492.74734020836445)
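I only get the first year tag instead of all three, and I don't know what I am doing wrong.
I can't use the STRSPLITTOBAG function, because I am on Pig 0.12 and STRSPLITTOBAG was only introduced in Pig 0.14.
Please help.
Thanks.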
Output of dump common_data:
(AUS_2015_3_Inventory, 39F4C4B0-ADA2-44D0-BBC0-B99A2B917FAA, <year name="2011" uid="6CE3F5C5-D464-48A5-9F2F-CDE450108F5F"> <record> <value>492.74734020836445</value> <comments/> </record> <year name="2010" uid="F18F7BF5-8AF7-47C1-A19F-584E84D2A7A4"> <record> <value>469.78235318968376</value> <comments/> </record> <year name="1994" uid="943A31F2-CDDD-49E3-99BF-C9CA082EB057"> <record> <value>920.00059365049015</value> <comments/> </record>)
Output of dump years_split_up:
(AUS_2015_3_Inventory,39F4C4B0-ADA2-44D0-BBC0-B99A2B917FAA, <year name="2011" uid="6CE3F5C5-D464-48A5-9F2F-CDE450108F5F"> <record> <value>492.74734020836445</value> <comments/> </record> </year>, <year name="2010" uid="F18F7BF5-8AF7-47C1-A19F-584E84D2A7A4"> <record> <value>469.78235318968376</value> <comments/> </record> </year>, <year name="1994" uid="943A31F2-CDDD-49E3-99BF-C9CA082EB057"> <record> <value>920.00059365049015</value> <comments/> </record> </year>)
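Note that the years_split_up dump is a single row with five fields, not three rows. The sketch below imitates that behaviour in plain Python, assuming Pig's documented FLATTEN semantics: a flattened tuple is spliced into the row as extra columns, while a flattened bag produces one row per element. `flatten_tuple`, `flatten_bag` and the shortened strings in `parts` are hypothetical stand-ins for illustration, not Pig APIs or real data.

```python
def flatten_tuple(prefix, fields):
    """Imitate FLATTEN on a tuple: the fields become extra columns of ONE row."""
    return [tuple(prefix) + tuple(fields)]


def flatten_bag(prefix, items):
    """Imitate FLATTEN on a bag: each item produces its OWN row."""
    return [tuple(prefix) + (item,) for item in items]


# Leading columns carried through the foreach (values taken from the dumps above).
prefix = ("AUS_2015_3_Inventory", "39F4C4B0-ADA2-44D0-BBC0-B99A2B917FAA")

# Placeholders standing in for the three per-year XML fragments from the split.
parts = ["<year 2011 fragment>", "<year 2010 fragment>", "<year 1994 fragment>"]

as_tuple = flatten_tuple(prefix, parts)  # shape of the years_split_up dump
as_bag = flatten_bag(prefix, parts)      # shape the final regex step needs

if __name__ == "__main__":
    print(len(as_tuple), "row with", len(as_tuple[0]), "fields")
    print(len(as_bag), "rows with", len(as_bag[0]), "fields each")
```

With the tuple shape, the single alias year_wise_xml can only name the first of the split fields, which would be consistent with parsed_data containing only the 2011 row. A split that yields a bag, as STRSPLITTOBAG does in later Pig versions, gives the one-row-per-year shape instead.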