Removing strange characters from XML output

Time: 2014-08-11 11:59:54

Tags: xml character-encoding html-entities

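I am trying to remove strange characters from my XML output. Here are the code and the output:

There seems to be an encoding problem. I have already tried adding this while converting from iCal to XML: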

http://flourishhosting.co.uk/test.php

    <xml version="1.0" encoding="UTF-8">
            <html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <body style="font-family:Arial;font-size:12pt;background-color:#EEEEEE">
            <xsl:for-each select="VCALENDAR">
              <div style="background-color:teal;color:white;padding:4px">
                <span style="font-weight:bold"><xsl:value-of select="URL"/> - </span>
                <xsl:value-of select="DTSTART"/>
                </div>
              <div style="margin-left:20px;margin-bottom:1em;font-size:10pt">
                <p>
                <xsl:value-of select="SUMMARY"/>
                <span style="font-style:italic"> (<xsl:value-of select="calories"/> calories per serving)</span>
                </p>
              </div>
            </xsl:for-each>
            <?php

            function iCalendarToXML($icalendarData) {

                // Detecting line endings
                if (strpos($icalendarData,"\r\n")) $lb = "\r\n";
                elseif (strpos($icalendarData,"\n")) $lb = "\n";
                else $lb = "\r\n";

                // Splitting up items per line
                $lines = explode($lb,$icalendarData);

                // Properties can be folded over 2 lines. In this case the second
                // line will be preceded by a space or tab.
                $lines2 = array();
                foreach($lines as $line) {

                    if ($line[0]==" " || $line[0]=="\t") {
                        $lines2[count($lines2)-1].=substr($line,1);
                        continue;
                    }

                    $lines2[]=$line;

                }

                $xml = '<?xml version="1.0"?>' . "\n";

                $spaces = 0;
                foreach($lines2 as $line) {

                    $matches = array();
                    // This matches PROPERTYNAME;ATTRIBUTES:VALUE
                    if (preg_match('/^([^:^;]*)(?:;([^:]*))?:(.*)$/',$line,$matches)) {
                        $propertyName = strtoupper($matches[1]);
                        $attributes = $matches[2];
                        $value = $matches[3];

                        // If the line was in the format BEGIN:COMPONENT or END:COMPONENT, we need to special case it.
                        if ($propertyName == 'BEGIN') {
                            $xml.=str_repeat(" ",$spaces);
                            $xml.='<' . strtoupper($value) . ">\n";
                            $spaces+=2;
                            continue;
                        } elseif ($propertyName == 'END') {
                            $spaces-=2;
                            $xml.=str_repeat(" ",$spaces);
                            $xml.='</' . strtoupper($value) . ">\n";
                            continue;
                        }

                        $xml.=str_repeat(" ",$spaces);
                        $xml.='<' . $propertyName;
                        if ($attributes) {
                            // There can be multiple attributes
                            $attributes = explode(';',$attributes);
                            foreach($attributes as $att) {

                                list($attName,$attValue) = explode('=',$att,2);
                                $xml.=' ' . $attName . '="' . htmlspecialchars($attValue) . '"';

                            }
                        }

                        $xml.='>'. htmlspecialchars($value) . '</' . $propertyName . ">\n";

                    }

                }

                return $xml;

            }
            // read in the artist from the form
            $a = urlencode($_GET["VEVENT"]);
            $var = htmlentities($var,ENT_QUOTES, "Windows-1252");
            $connection = curl_init();


            // Specify the URL to connect to
            curl_setopt($connection, CURLOPT_URL, "http://mosaic-church.onthecity.org/plaza/events/ical_feed");


            // This option ensures that the HTTP response is *returned* from curl_exec(),
            // (see below) rather than being output to screen.  
            curl_setopt($connection,CURLOPT_RETURNTRANSFER,1);

            // Do not include the HTTP header in the response.
            curl_setopt($connection,CURLOPT_HEADER, 0);

            // Actually connect to the remote URL. The response is 
            // returned from curl_exec() and placed in $response.
            $response = curl_exec($connection);

            $xml_output = iCalendarToXML($response);
            echo "XML Output <pre>".$xml_output."</pre>";

            // Close the connection.
            curl_close($connection);

            //parse code:
            $xml = simplexml_load_string($xml_output);
            for($index=0; $index < count($xml->VEVENT); $index++)
            {
              echo $xml->VEVENT[$index]-> SUMMARY . "<br />";
              echo $xml->VEVENT[$index]-> DESCRIPTION . "<br />";
            }
            ?>
            </body>
            </html>

1 answer:

Answer 0 (score: 1):

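Your HTML document is incomplete. You are missing the entire head tag, which should contain a Content-Type meta tag specifying the content type and the encoding. As it stands, the browser has to guess the encoding, and it guesses wrong.

A proper HTML document has a head tag containing a title tag, and that is where you can add the meta tag: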

    <head>
      <title>Page name</title>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    </head>
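
The meta tag only helps if the bytes you send really are UTF-8. Since you fetch the feed with cURL, it may be worth normalizing the response before printing it, and declaring the charset in the HTTP header as well (the header takes precedence over the meta tag). Below is a minimal sketch, not a drop-in fix: `toUtf8()` is a hypothetical helper name, the Windows-1252 fallback is only a guess based on the `htmlentities()` call in your code, and the sample string is made up. It assumes the mbstring extension is enabled.

```php
<?php
// Hypothetical helper (not part of the original code): return $text as UTF-8.
// Assumption: anything that is not already valid UTF-8 is Windows-1252.
function toUtf8(string $text, string $fallback = 'Windows-1252'): string
{
    // mb_check_encoding() returns true when $text is valid in the given encoding.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Re-encode from the assumed source encoding to UTF-8.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

// Declare the encoding in the HTTP header; this must happen before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Made-up sample input: "Café" encoded as Windows-1252 (0xE9 is "é").
$sample = "Caf\xE9";

// Escape for HTML after converting, and tell htmlspecialchars() the encoding.
echo htmlspecialchars(toUtf8($sample), ENT_QUOTES, 'UTF-8'), "\n";
```

In your script this would mean calling `toUtf8($response)` before passing the response to `iCalendarToXML()`, and moving the `header()` call above the first line of HTML output. If the feed is already UTF-8, the helper returns it unchanged.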