Parsing a string into another string in PHP?

Time: 2015-12-03 07:41:58

Tags: php regex parsing simple-html-dom

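I'm working with a web service running on an old IIS6 server. All it returns is HTML, with no JSON or XML. Once I get the HTML, I need to parse the data correctly. The only problem is that the HTML is very messy and not properly formatted.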

Here is the service I'm calling, using GET:

http://www.zefix.ch/WebServices/Zefix/Zefix.asmx/SearchFirm?name=Dedal
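The GET request above can be made from PHP roughly as follows. This is a minimal sketch that assumes `allow_url_fopen` is enabled and that the response body really is ISO-8859-1, as its META tag declares. `zefixSearchUrl` and `fetchZefixSearch` are hypothetical helper names. Converting to UTF-8 up front matters because `htmlspecialchars()` with the `'UTF-8'` charset returns an empty string for invalid UTF-8, which would silently drop names such as "Genève".

```php
<?php
// Sketch: build the SearchFirm GET URL and fetch the result page as UTF-8.
// Assumes allow_url_fopen is enabled and that the body is ISO-8859-1, as the
// page's META tag declares. Function names are hypothetical.

function zefixSearchUrl(string $name): string
{
    // http_build_query() URL-encodes the search term (spaces become "+").
    return 'http://www.zefix.ch/WebServices/Zefix/Zefix.asmx/SearchFirm?'
        . http_build_query(['name' => $name]);
}

function fetchZefixSearch(string $name): string
{
    $raw = file_get_contents(zefixSearchUrl($name));
    if ($raw === false) {
        throw new RuntimeException("GET request failed for '$name'");
    }
    // Re-encode so later htmlspecialchars()/html_entity_decode() calls with
    // the 'UTF-8' charset get valid input.
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException('Could not convert the response to UTF-8');
    }
    return $utf8;
}
```

Usage: `$html = fetchZefixSearch('Dedal');`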

The service returns HTML like this:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:ino="http://namespaces.softwareag.com/tamino/response2" xmlns:xql="http://metalab.unc.edu/xql/" xmlns:xq="http://namespaces.softwareag.com/tamino/XQuery/result">

<head>
    <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>Firmenname=dedal , suche_nach=-, Rechtsform=, Sitz=, Sitz Gemeinde=, Firmennummer=, language=1, phonetisch=no</title>
</head>

<body>
    <font face="arial" size="2">
      <b>Suche nach Firma: <i>dedal </i></b>
      <br />
      <b>(10 Suchresultate am 03.12.2015 um 08:30) [Stand: 03.12.2015 235/2015]</b>
      <br />Zentraler Firmenindex - Eidgenössisches Amt für das Handelsregister<hr /><b>DEDAL FILMS, Albrecht</b><i> in <a target="_top" href="/info/ger/VS626.htm">Lens</a></i>, Einzelunt., <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=1058678&amp;parChnr=CH-626.1.014.253-3&amp;language=1">+</a>, <a target="_blank" href="http://vs.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGHTML?chnr=6261014253&amp;amt=626&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">CHE-150.481.375</a><p />DEDAL TRADING SA in liquidazione<i> in <a target="_top" href="/info/ger/TI501.htm">Mendrisio</a></i>, AG, gelöscht: Publ.Dat.  29.07.2005,
         <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=537570&amp;parChnr=CH-524.3.009.149-2&amp;language=1">+</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGHTML?chnr=5243009149&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">CHE-101.054.476</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGPDF?chnr=5243009149&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">PDF</a><p /><b>DEDALE SA</b><i> in <a target="_top" href="/info/ger/VS626.htm">Chermignon</a></i>, AG, <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=1139492&amp;parChnr=CH-626.3.014.970-6&amp;language=1">+</a>, <a target="_blank" href="http://vs.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGHTML?chnr=6263014970&amp;amt=626&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">CHE-196.615.628</a><p /><b>Dedale Solutions, Putallaz &amp; Co</b><i> in <a target="_top" href="/info/ger/GE660.htm">Genève</a></i>, Kommanditgesell., <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=1049329&amp;parChnr=CH-660.0.412.012-4&amp;language=1">+</a>, <a target="_blank" href="http://ge.ch/hrcintapp/externalCompanyReport.action?companyOfrcId13=CH-660-0412012-4&amp;ofrcLanguage=1">CHE-416.967.677</a><p />Dedalo Promotion Limited Liability Company, Cheyenne, Wyoming USA, succursale di Paradiso<i> in <a target="_top" href="/info/ger/TI501.htm">Paradiso</a></i>, Ausl. ZN, gelöscht: Publ.Dat.  25.05.2010,
         <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=349506&amp;parChnr=CH-514.9.009.263-7&amp;language=1">+</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGHTML?chnr=5149009263&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">CHE-104.147.677</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGPDF?chnr=5149009263&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">PDF</a><p /><b>Dedalo SA</b><i> in <a target="_top" href="/info/ger/TI501.htm">Chiasso</a></i>, AG, <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=1144906&amp;parChnr=CH-501.3.017.898-0&amp;language=1">+++</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGHTML?chnr=5013017898&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">CHE-226.878.749</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGPDF?chnr=5013017898&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">PDF</a><p /><b>Dedalos R&amp;D</b><i> in <a target="_top" href="/info/ger/TI501.htm">Bellinzona</a></i>, Verein, <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=431256&amp;parChnr=CH-500.6.004.353-6&amp;language=1">+</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGHTML?chnr=5006004353&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">CHE-104.771.605</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGPDF?chnr=5006004353&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">PDF</a><p /><b>DEDALUS DIVERS Sagl</b><i> in <a target="_top" href="/info/ger/TI501.htm">Gordola</a></i>, GmbH, <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=1107221&amp;parChnr=CH-501.4.016.642-1&amp;language=1">+</a>, <a target="_blank" 
href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGHTML?chnr=5014016642&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">CHE-167.108.200</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGPDF?chnr=5014016642&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">PDF</a><p /><b>Dedalus SA</b><i> in <a target="_top" href="/info/ger/TI501.htm">Breggia</a></i>, AG, <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=404462&amp;parChnr=CH-524.3.006.007-5&amp;language=1">+++</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGHTML?chnr=5243006007&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">CHE-106.145.979</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGPDF?chnr=5243006007&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">PDF</a><p /><b>EDIL DEDALO S.A.G.L.</b><i> in <a target="_top" href="/info/ger/TI501.htm">Balerna</a></i>, GmbH, <a target="result" href="/WebServices/Zefix/Zefix.asmx/ShowFirm?parId=1150282&amp;parChnr=CH-501.4.017.854-1&amp;language=1">+++</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGHTML?chnr=5014017854&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">CHE-232.905.567</a>, <a target="_blank" href="http://ti.powernet.ch/webservices/inet/HRG/HRG.asmx/getHRGPDF?chnr=5014017854&amp;amt=501&amp;toBeModified=0&amp;validOnly=0&amp;lang=1&amp;sort=0">PDF</a><p /><hr size="5" /></font>
    <script type="text/javascript">
        var _paq = _paq || [];
        _paq.push(['trackPageView']);
        _paq.push(['enableLinkTracking']);

        (function() {
            var u = (("https:" == document.location.protocol) ? "https" : "http") + "://www.e-service.admin.ch/analytics/";
            _paq.push(['setTrackerUrl', u + 'piwik.php']);
            _paq.push(['setSiteId', 4]);
            var d = document,
                g = d.createElement('script'),
                s = d.getElementsByTagName('script')[0];
            g.type = 'text/javascript';
            g.defer = true;
            g.async = true;
            g.src = u + 'piwik.js';
            s.parentNode.insertBefore(g, s);
        })();
    </script>
    <noscript>
        <p>
            <img src="http://www.e-service.admin.ch/analytics/piwik.php?idsite=4" style="border:0;" alt="" />
        </p>
    </noscript>
</body>

</html>

But I don't need all of this data. I'm using Simple_html_dom (https://github.com/samacs/simple_html_dom).

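I've got everything else working. The only problem is parsing that string: for each company, I need to turn it into HTML with values like this: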

<p>COMPANY NAME</p>
<a class="che" href="LINK CHE">CHE</a>
<a class="pdf" href="PDF LINK">PDF</a>

The problem is that sometimes there is no PDF link, and then I don't know what to parse. :(
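One way to handle the optional PDF link is to treat each `<p />`-separated entry on its own and emit the PDF anchor only when one was actually found. The sketch below swaps simple_html_dom for plain string splitting and `preg_match`. It relies entirely on the markup in the sample response: the entries sit between the first `<hr />` and the last `<hr>`, each name comes before the `<i> in City</i>` part, the register link's anchor text is the CHE number, and the PDF link's anchor text is `PDF`. `parseZefixResults` and `renderEntry` are hypothetical names. I've also assumed the `che` anchor should display the CHE number, and that the input is already UTF-8. Regexes like these will break if the service changes its markup.

```php
<?php
// Sketch only: regex + string splitting instead of simple_html_dom.
// Assumes the markup of the sample SearchFirm response: result entries sit
// between the first <hr /> and the last <hr ...>, are separated by <p />,
// the register link's text is the CHE number, the PDF link's text is "PDF",
// and the input has already been converted to UTF-8.

function parseZefixResults(string $html): array
{
    $first = stripos($html, '<hr');
    $last = strripos($html, '<hr');
    $open = $first === false ? false : strpos($html, '>', $first);
    if ($open === false || $last === $first) {
        return []; // markup does not look like the sample
    }
    // Everything between the first <hr /> and the closing <hr size="5" />.
    $body = substr($html, $open + 1, $last - $open - 1);

    $decode = fn(string $s): string =>
        html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    $results = [];
    foreach ((preg_split('~<p\s*/?>~i', $body) ?: []) as $chunk) {
        if (trim($chunk) === '') {
            continue; // e.g. the empty piece after the final <p />
        }
        // Name = text before the first <i> or <a> tag. Active firms wrap it
        // in <b>, deleted ones do not, so strip tags in both cases.
        $namePart = preg_split('~<(?:i|a)\b~i', $chunk, 2)[0];
        $name = trim($decode(strip_tags($namePart)));

        $che = null;
        if (preg_match('~<a\b[^>]*\bhref="([^"]*)"[^>]*>\s*(CHE-[0-9.]+)\s*</a>~i', $chunk, $m)) {
            $che = ['href' => $decode($m[1]), 'text' => $m[2]];
        }
        // The PDF link is optional; keep null when the entry has none.
        $pdf = null;
        if (preg_match('~<a\b[^>]*\bhref="([^"]*)"[^>]*>\s*PDF\s*</a>~i', $chunk, $m)) {
            $pdf = $decode($m[1]);
        }
        $results[] = ['name' => $name, 'che' => $che, 'pdf' => $pdf];
    }
    return $results;
}

// Builds the target snippet; the PDF anchor is only emitted when present.
function renderEntry(array $entry): string
{
    $h = fn(string $s): string =>
        htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
    $out = '<p>' . $h($entry['name']) . "</p>\n";
    if ($entry['che'] !== null) {
        $out .= '<a class="che" href="' . $h($entry['che']['href']) . '">'
            . $h($entry['che']['text']) . "</a>\n";
    }
    if ($entry['pdf'] !== null) {
        $out .= '<a class="pdf" href="' . $h($entry['pdf']) . "\">PDF</a>\n";
    }
    return $out;
}
```

Usage, given the UTF-8 response body in `$html`: `foreach (parseZefixResults($html) as $entry) { echo renderEntry($entry); }`. Entries without a PDF (for example "DEDAL FILMS, Albrecht" or "DEDALE SA" in the sample) simply get no `class="pdf"` link. Because the output is built from decoded values and then re-escaped with `htmlspecialchars()`, names like `Dedalos R&amp;D` come out correctly encoded.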

0 answers