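Correct syntax for using Perl to parse SGML into an XML file?

Time: 2012-02-27 22:11:29

Tags: xml perl parsing sgml

I am a Perl newbie trying to read an SGML file, parse it, and convert it to XML so that I can get key/value pairs for all of the elements. I found the SGML::DTDParse and XML::Simple modules, which I think are what I need for this task. My problem is that I cannot find any documentation or code examples for DTDParse.

My code is as follows: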

# use modules
use SGML::DTDParse;
use XML::Simple;
use Data::Dumper;

use warnings;
use strict;

my $xml;
my $data;
my $convert;

$/ = undef;
open FILE, "C:/..." or die $!;
my $file = <FILE>;

# Convert the DTD file to XML
dtdParse $file;

# Create the XML object
$xml = new XML::Simple;

# Read the XML file
$data = $xml->XMLin($file);

# print the output
print Dumper($data);

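I get an error on the line dtdParse $file, as follows: Can't call method "dtdParse" without a package or object reference at "my script name"

Any ideas on the correct syntax here, and is this a valid approach?

I reworked the code and was able to get the DTD parsed with the following: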

$dtd = SGML::DTDParse::DTD->new();
$dtd->parse($file);
print $dtd;

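I don't believe the parsed result can be considered XML, so the right way to get all of the elements out of it is probably a for loop.

3 answers:

Answer 0 (score: 2)

There is no dtdParse function.

dtdparse is a program that ships with the SGML::DTDParse module.

You can use it to dump XML from a DTD file. A simple example of how to use dtdparse: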

use strict;
use warnings;

use SGML::DTDParse;
use XML::Simple;
use Data::Dumper;

# Convert the DTD file to XML
my $result = qx{dtdparse test.dtd};

# Create the XML object
my $xml = XML::Simple->new;

# Read the XML file
$result = $xml->XMLin($result);

# print the output
$Data::Dumper::Indent = 1;
print Dumper($result);

where test.dtd looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT DatabaseInventory (DatabaseName+)>
<!ELEMENT DatabaseName (   GlobalDatabaseName
                         , OracleSID
                         , DatabaseDomain
                         , Administrator+
                         , DatabaseAttributes
                         , Comments)
>
<!ELEMENT GlobalDatabaseName (#PCDATA)>
<!ELEMENT OracleSID          (#PCDATA)>
<!ELEMENT DatabaseDomain     (#PCDATA)>
<!ELEMENT Administrator      (#PCDATA)>
<!ELEMENT DatabaseAttributes EMPTY>
<!ELEMENT Comments           (#PCDATA)>

<!ATTLIST Administrator       EmailAlias CDATA #REQUIRED>
<!ATTLIST Administrator       Extension  CDATA #IMPLIED>
<!ATTLIST DatabaseAttributes  Type       (Production|Development|Testing) #REQUIRED>
<!ATTLIST DatabaseAttributes  Version    (7|8|8i|9i) "9i">

<!ENTITY AUTHOR "Jeffrey Hunter">
<!ENTITY WEB    "www.iDevelopment.info">
<!ENTITY EMAIL  "jhunter@iDevelopment.info">

This prints something like the following:

$VAR1 = {
  'namecase-entity' => '0',
  'created-by' => 'DTDParse V2.00',
  'public-id' => '',
  'version' => '1.0',
  'attlist' => {
    'DatabaseAttributes' => {
      'attribute' => {
        'Type' => {
          'value' => 'Production Development Testing',
          'type' => '#REQUIRED',
          'default' => '',
          'enumeration' => 'yes'
        },
        'Version' => {
          'value' => '7 8 8i 9i',
          'type' => '',
          'default' => '9i',
          'enumeration' => 'yes'
        }
      },
      'attdecl' => '  Type       (Production|Development|Testing) #REQUIRED'
    },
    'Administrator' => {
      'attribute' => {
        'EmailAlias' => {
          'value' => 'CDATA',
          'type' => '#REQUIRED',
          'default' => ''
        },
        'Extension' => {
          'value' => 'CDATA',
          'type' => '#IMPLIED',
          'default' => ''
        }
      },
      'attdecl' => '       EmailAlias CDATA #REQUIRED'
    }
  },
  'element' => {
    'OracleSID' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'Comments' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'DatabaseAttributes' => {
      'content-type' => 'element',
      'content-model-expanded' => {
        'empty' => {}
      },
      'content-model' => {
        'empty' => {}
      }
    },
    'GlobalDatabaseName' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'Administrator' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'DatabaseInventory' => {
      'content-type' => 'element',
      'content-model-expanded' => {
        'sequence-group' => {
          'element-name' => {
            'occurrence' => '+',
            'name' => 'DatabaseName'
          }
        }
      },
      'content-model' => {
        'sequence-group' => {
          'element-name' => {
            'occurrence' => '+',
            'name' => 'DatabaseName'
          }
        }
      }
    },
    'DatabaseDomain' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'DatabaseName' => {
      'content-type' => 'element',
      'content-model-expanded' => {
        'sequence-group' => {
          'element-name' => {
            'Comments' => {},
            'OracleSID' => {},
            'DatabaseAttributes' => {},
            'DatabaseDomain' => {},
            'GlobalDatabaseName' => {},
            'Administrator' => {
              'occurrence' => '+'
            }
          }
        }
      },
      'content-model' => {
        'sequence-group' => {
          'element-name' => {
            'Comments' => {},
            'OracleSID' => {},
            'DatabaseAttributes' => {},
            'DatabaseDomain' => {},
            'GlobalDatabaseName' => {},
            'Administrator' => {
              'occurrence' => '+'
            }
          }
        }
      }
    }
  },
  'entity' => {
    'WEB' => {
      'text-expanded' => 'www.iDevelopment.info',
      'text' => 'www.iDevelopment.info',
      'type' => 'gen'
    },
    'AUTHOR' => {
      'text-expanded' => 'Jeffrey Hunter',
      'text' => 'Jeffrey Hunter',
      'type' => 'gen'
    },
    'EMAIL' => {
      'text-expanded' => 'jhunter@iDevelopment.info',
      'text' => 'jhunter@iDevelopment.info',
      'type' => 'gen'
    }
  },
  'system-id' => 'test.dtd',
  'unexpanded' => '1',
  'created-on' => 'Tue Feb 28 00:44:52 2012',
  'declaration' => '',
  'xml' => '0',
  'title' => '?untitled?',
  'namecase-general' => '1'
};
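Once the DTD is available as a nested hash like the dump above, the "for loop" the question asks about could look roughly like the sketch below. It is only an illustration: the $result structure is a hard-coded excerpt of the dump, summarize_elements is a hypothetical helper name, and the exact shape XML::Simple produces depends on its options and on the DTD, so the key names should be checked against your own Dumper output.

```perl
use strict;
use warnings;

# A small excerpt of the structure shown above, hard-coded for illustration.
my $result = {
    element => {
        DatabaseInventory  => { 'content-type' => 'element' },
        DatabaseAttributes => { 'content-type' => 'element' },
        OracleSID          => { 'content-type' => 'mixed' },
    },
    attlist => {
        DatabaseAttributes => {
            attribute => {
                Type    => { type => '#REQUIRED', default => '' },
                Version => { type => '',          default => '9i' },
            },
        },
    },
};

# Hypothetical helper: build one summary line per element, giving its
# name, its content type, and the names of its declared attributes.
sub summarize_elements {
    my ($dtd) = @_;
    my @lines;
    for my $name (sort keys %{ $dtd->{element} || {} }) {
        my $content_type = $dtd->{element}{$name}{'content-type'};
        my $attlist      = ($dtd->{attlist} || {})->{$name};
        my $attributes   = $attlist ? $attlist->{attribute} : {};
        my $attr_names   = join(', ', sort keys %$attributes) || '(none)';
        push @lines, "$name [$content_type] attributes: $attr_names";
    }
    return @lines;
}

print "$_\n" for summarize_elements($result);
```

Sorting the keys keeps the listing stable between runs, since Perl does not guarantee any hash ordering.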

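Answer 1 (score: 2)

dtdparse is not a Perl function; it is a script that processes SGML DTDs from the command line. The documentation for the script is here.

Since you want to do the parsing inside your own Perl script, you can use the dtdparse source as an example if you like.

Answer 2 (score: 2)

For SGML, use James Clark's SP, which includes an SGML to XML converter called SX. It is a professional-grade system, and it does have documentation. If you need Perl, use system or open to invoke SP/SX as an external program.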
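A minimal sketch of the open approach is shown below. It assumes the converter is installed and on the PATH; the command name sx and the file name document.sgml are assumptions (OpenSP installs the converter as osx), and read_from_command is a hypothetical helper, not part of SP. The list form of a piped open avoids shell quoting, but on Windows it needs a reasonably recent Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: run an external command and return the lines it
# writes to standard output. The command is passed as a list, so no shell
# is involved and arguments need no quoting.
sub read_from_command {
    my (@command) = @_;

    open(my $pipe, '-|', @command)
        or die "Cannot start '@command': $!\n";
    my @lines = <$pipe>;

    # close() on a pipe returns false if the command exited with a
    # non-zero status; in that case $! is 0 and $? holds the wait status.
    close($pipe)
        or die "'@command' failed: "
             . ($! ? "error closing pipe: $!" : "wait status $?") . "\n";

    return @lines;
}

# Usage (assumed command and file names, adjust to your installation):
# my @xml_lines = read_from_command('sx', 'document.sgml');
# print @xml_lines;
```

The captured lines can then be joined and handed to an XML parser in the same way as the dtdparse output in the first answer.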