cURL stops at CDATA

Time: 2016-06-16 16:49:53

Tags: php xml curl cdata

G'day everyone!

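This is not yet another CDATA question about LIBXML_NOCDATA. The problem is actually at the source, in my cURL (multi) request. If the XML sent back by the API contains CDATA, cURL stops and nothing from the CDATA to the end is retrieved. That means LIBXML_NOCDATA is of no use, because there is no CDATA in the retrieved result.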

        $endpoint = 'http://theapiservice.com?sentence=';

        $urls = array();
        foreach( $sentences as $key => $value ) {
            $urls[] = $endpoint . $value;
        }

        $multi = curl_multi_init();
        $channels = array();
        $results = array();

        // Loop through the URLs, create curl-handles
        // and attach the handles to our multi-request
        foreach ( $urls as $url ) {
            $ch = curl_init();
            curl_setopt( $ch, CURLOPT_URL, $url );
            curl_setopt( $ch, CURLOPT_HEADER, false );
            curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );

            curl_multi_add_handle( $multi, $ch );

            $channels[$url] = $ch;
        }

        // While we're still active, execute curl
        $active = null;
        do {
            $mrc = curl_multi_exec( $multi, $active );
        } while ( $mrc == CURLM_CALL_MULTI_PERFORM );

        while ( $active && $mrc == CURLM_OK ) {
            // Wait for activity on any curl-connection
            if ( curl_multi_select( $multi ) == -1 ) {
                continue;
            }

            // Continue to exec until curl is ready to
            // give us more data
            do {
                $mrc = curl_multi_exec( $multi, $active );
            } while ( $mrc == CURLM_CALL_MULTI_PERFORM );
        }

        // Loop through the channels and retrieve the received
        // content, then remove the handle from the multi-handle
        foreach ( $channels as $channel ) {
            $results[] = curl_multi_getcontent( $channel );
            curl_multi_remove_handle( $multi, $channel );
        }

        // Close the multi-handle and return our results
        curl_multi_close( $multi );
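
One thing worth ruling out in the code above is that each sentence is appended to the endpoint without URL encoding. Whether that is related to the truncation is only a guess. The sketch below shows the URL-building step with `rawurlencode()`. The helper name `build_urls` and the sample sentence are made up for illustration.

```php
<?php
// Hypothetical helper (not in the original code): builds one request URL
// per sentence, percent-encoding the sentence so that spaces, brackets and
// other reserved characters cannot break the query string.
function build_urls($endpoint, array $sentences)
{
    $urls = array();
    foreach ($sentences as $sentence) {
        $urls[] = $endpoint . rawurlencode($sentence);
    }
    return $urls;
}

// The sentence below is an invented sample containing a space and a bracket.
$urls = build_urls('http://theapiservice.com?sentence=', array('DATA1 ) DATA2'));
print_r($urls);
```

The resulting array can be fed to the `foreach ( $urls as $url )` loop unchanged.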

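Below is the XML from the API that should be retrieved. It contains a CDATA section, which means I only get what comes before it and nothing after. It is also the only part that contains line breaks.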

<Word>
    <Surface>DATA1</Surface>
    <Sub>DATA1_Sub</Sub>
</Word>
<Word>
    <Surface>)</Surface>
</Word>
<Word>
    <Surface>
    <![CDATA[ ]]>
    </Surface>
</Word>
<Word>
    <Surface>DATA2</Surface>
    <Sub>DATA2_Sub</Sub>
</Word>

What I get (as you can see, nothing from the CDATA onward is retrieved):

[23] => SimpleXMLElement Object ( [Surface] => DATA1 [Sub] => DATA1_Sub ) [24] => SimpleXMLElement Object ( [Surface] => ) ) ) ) ) )
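
A dump like the one above is not always a reliable view of what SimpleXML holds, because `print_r()` may show an element whose only content is a CDATA section as empty. As a minimal sketch, assuming the response is wrapped in a root element (called `<Result>` here, which is an assumption), casting each `Surface` to a string shows whether the CDATA and the words after it are actually present:

```php
<?php
// Sample document modelled on the XML in the question.
// The <Result> root element is an assumption, not taken from the API.
$body = '<Result>'
      . '<Word><Surface>DATA1</Surface><Sub>DATA1_Sub</Sub></Word>'
      . '<Word><Surface><![CDATA[ ]]></Surface></Word>'
      . '<Word><Surface>DATA2</Surface><Sub>DATA2_Sub</Sub></Word>'
      . '</Result>';

$xml = simplexml_load_string($body);

// Read every Surface value through an explicit string cast,
// which returns CDATA content as ordinary text.
$surfaces = array();
foreach ($xml->Word as $word) {
    $surfaces[] = (string) $word->Surface;
}

var_dump($surfaces);
var_dump(count($xml->Word));
```

If the word after the CDATA shows up here, the data was received and only the dump was misleading.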

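When the request is fine and no CDATA is sent back, everything works perfectly and I can parse the XML normally. Would anyone know how to get 100% of the XML rather than only the part before the CDATA?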
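
To tell a truncated transfer apart from a parsing or display problem, it may help to inspect each raw string returned by `curl_multi_getcontent()` before parsing it. The sketch below is one way to do that. The helper name `inspect_body` and the `<Result>` root element in the samples are assumptions for illustration.

```php
<?php
// Hypothetical helper: summarises one raw response body.
function inspect_body($body)
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body, 'SimpleXMLElement', LIBXML_NOCDATA);

    $errors = array();
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return array(
        'bytes'     => strlen($body),                          // size actually received
        'has_cdata' => strpos($body, '<![CDATA[') !== false,   // did the CDATA arrive?
        'parsed'    => $xml !== false,                         // well-formed XML?
        'errors'    => $errors,                                // parser messages, if any
    );
}

// Invented sample bodies: one complete, one cut off before the CDATA.
$complete = '<Result><Word><Surface>DATA1</Surface></Word>'
          . '<Word><Surface><![CDATA[ ]]></Surface></Word>'
          . '<Word><Surface>DATA2</Surface></Word></Result>';
$truncated = '<Result><Word><Surface>DATA1</Surface></Word><Word><Surface>';

print_r(inspect_body($complete));
print_r(inspect_body($truncated));
```

If `has_cdata` is true for a real response, cURL did receive the CDATA and the loss happens later. If the body really is cut short, calling `curl_error()` and `curl_getinfo()` on the handle before removing it is a reasonable next step.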

Thanks

0 answers:

No answers yet