Extracting fields from a text file of XML data

Posted: 2018-09-21 21:20:49

Tags: xml sas extract

I have a text file that I read into SAS and cleaned up so that every line contains something like this:

<xsd:element name="ReportingUnit" type="reportingunit:ReportingUnit_def" minOccurs="1" maxOccurs="1"/>

I need to extract the value of name and the value of type. So in this case I need to get ReportingUnit and ReportingUnit_def.

Any help would be greatly appreciated. Thanks.

2 answers:

Answer 0 (score: 3)

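An XSD is XML. Unfortunately, it is not XML in a layout that the XMLV2 libname engine supports. If your XSD is as clean as you describe, the INPUT statement's column pointer control @'character-string', which moves the input pointer to just past the next occurrence of that string, will extract the data you want.

Example code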

filename myxsd temp;

* example xsd from https://docs.microsoft.com/en-us/visualstudio/xml-tools/sample-xsd-file-simple-schema?view=vs-2015;
data _null_;
  file myxsd;
  input;
  put _infile_;
datalines;
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"   
           xmlns:tns="http://tempuri.org/PurchaseOrderSchema.xsd"   
           targetNamespace="http://tempuri.org/PurchaseOrderSchema.xsd"   
           elementFormDefault="qualified">  
 <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>  
 <xsd:complexType name="PurchaseOrderType">  
  <xsd:sequence>  
   <xsd:element name="ShipTo" type="tns:USAddress" maxOccurs="2"/>  
   <xsd:element name="BillTo" type="tns:USAddress"/>  
  </xsd:sequence>  
  <xsd:attribute name="OrderDate" type="xsd:date"/>  
 </xsd:complexType>  

 <xsd:complexType name="USAddress">  
  <xsd:sequence>  
   <xsd:element name="name"   type="xsd:string"/>  
   <xsd:element name="street" type="xsd:string"/>  
   <xsd:element name="city"   type="xsd:string"/>  
   <xsd:element name="state"  type="xsd:string"/>  
   <xsd:element name="zip"    type="xsd:integer"/>  
  </xsd:sequence>  
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>  
 </xsd:complexType>  
</xsd:schema>  
run;

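* demonstrates the problem: XMLV2 cannot read this XSD without an XMLMap, so PROC COPY fails;
libname myxsd xmlv2;

proc copy in=myxsd out=work;
run;

* parse the same file as plain text instead;
data weak_parse;
  * DSD removes the double quotes around values; blank, slash and greater-than are delimiters;
  infile myxsd dsd dlm=" />" missover;
  length name type $100;
  * each @ string moves the input pointer to just past the next occurrence of that string;
  input @"name=" name @"type=" type;
run;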

PROC COPY writes an error to the log when it tries to read the XSD through the libname, but the INPUT statement runs just fine:

536  libname myxsd xmlv2;
NOTE: Libref MYXSD was successfully assigned as follows:
      Engine:        XMLV2
      Physical Name: C:\Users\Richard\AppData\Local\Temp\SAS Temporary
      Files\_TD2764_HELIUM_\#LN00053
537
538  proc copy in=myxsd out=work;
539  run;

ERROR: XML data is not in a format supported natively by the XML libname engine. Files of this
       type may require an XMLMap to be input properly.
NOTE: Statements not processed because of errors noted above.
NOTE: PROCEDURE COPY used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

NOTE: The SAS System stopped processing this step because of errors.
540


541  data weak_parse;
542    infile myxsd dsd dlm=" />" missover;
543    length name type $100;
544    input @"name=" name @"type=" type;
545  run;

NOTE: The infile MYXSD is:
      Filename=C:\Users\Richard\AppData\Local\Temp\SAS Temporary Files\_TD2764_HELIUM_\#LN00053,
      RECFM=V,LRECL=32767,File Size (bytes)=1968,
      Last Modified=21Sep2018:22:51:56,
      Create Time=21Sep2018:22:51:56

NOTE: 24 records were read from the infile MYXSD.
      The minimum record length was 80.
      The maximum record length was 80.
NOTE: The data set WORK.WEAK_PARSE has 24 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

The data that are read in look like this:

The SAS System

Obs    name                 type

  1
  2
  3
  4
  5    PurchaseOrder        tns:PurchaseOrderType
  6    PurchaseOrderType
  7
  8    ShipTo               tns:USAddress
  9    BillTo               tns:USAddress
 10
 11    OrderDate            xsd:date
 12
 13
 14    USAddress
 15
 16    name                 xsd:string
 17    street               xsd:string
 18    city                 xsd:string
 19    state                xsd:string
 20    zip                  xsd:integer
 21
 22    country              xsd:NMTOKEN
 23
 24
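The TYPE values above keep their namespace prefix (tns:, xsd:), while the question asks for ReportingUnit_def rather than reportingunit:ReportingUnit_def. A minimal sketch, assuming the same one-element-per-line layout as the question: it writes the question's line to a temporary file (QLINE and QUESTION_PARSE are hypothetical names used only for this demo), reads it with the same INPUT statement as above, and then keeps only the text after the last colon with SCAN.

```sas
/* QLINE is a hypothetical temporary fileref used only for this demo */
filename qline temp;

/* write the sample line from the question to the temp file */
data _null_;
  file qline lrecl=500;
  put '<xsd:element name="ReportingUnit" type="reportingunit:ReportingUnit_def" minOccurs="1" maxOccurs="1"/>';
run;

data question_parse;
  /* DSD strips the double quotes around each value */
  infile qline dsd dlm=" />" truncover lrecl=500;
  length name type $100;
  /* same pointer-control INPUT statement as in the answer */
  input @"name=" name @"type=" type;
  /* keep the part after the last ':'; a value without ':' is returned unchanged */
  type = scan(type, -1, ':');
run;
```

On the full XSD from the example, the same SCAN call would turn tns:PurchaseOrderType into PurchaseOrderType; adding a WHERE on NOT MISSING(NAME) would also drop the blank rows.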

Answer 1 (score: 0)

I ran into the same problem. Try this (it worked for me):

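data want;
  * read each raw line of the file as-is;
  infile "M:\some\path\ihave\favorites.xml";
  * the line in the question is longer than 100 characters, so $100 would truncate it;
  length line $1000;
  input;
  line = _infile_;
run;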
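The step above only copies each raw line into LINE. It does not pull out name and type. One possible follow-up, sketched here with the SAS Perl regular expression functions PRXPARSE, PRXMATCH and PRXPOSN instead of pointer control. To keep it self-contained, a literal assignment stands in for a record read with INPUT and _INFILE_; WANT_PARSED, RE_NAME and RE_TYPE are illustrative names, and the patterns assume attribute values are wrapped in double quotes.

```sas
data want_parsed;
  length line $500 name type $100;
  /* stand-in for one record read with INPUT; LINE = _INFILE_; */
  line = '<xsd:element name="ReportingUnit" type="reportingunit:ReportingUnit_def" minOccurs="1" maxOccurs="1"/>';
  /* compile each pattern once and keep its id across iterations */
  retain re_name re_type;
  if _n_ = 1 then do;
    /* capture everything between the quotes after name= */
    re_name = prxparse('/\bname="([^"]*)"/');
    /* an optional "prefix:" part is matched but not captured */
    re_type = prxparse('/\btype="(?:[^":]*:)?([^"]*)"/');
  end;
  if prxmatch(re_name, line) > 0 then name = prxposn(re_name, 1, line);
  if prxmatch(re_type, line) > 0 then type = prxposn(re_type, 1, line);
  drop re_name re_type;
run;
```

A regular expression does not depend on delimiter settings, so it keeps working if the attributes are separated by tabs or appear in a different order.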