Windows-1251 file inside a UTF-8 website?

Date: 2009-11-03 20:37:24

Tags: php encoding utf-8

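Hello, web development gurus :) I have a PHP script that fetches the last 10 songs played in my Winamp. The script lives in a separate file (let's call it "lastplayed.php") that is included in my site inside a "div" with PHP's include function. My site uses UTF-8 encoding. The problem is that some song titles are in Windows-1251 encoding, and on my site they are displayed as " "... Is there a known way to tell this div, which contains "lastplayed.php", to use the windows-1251 encoding? Or any other suggestions?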

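P.S.: The file with the script, a.k.a. "lastplayed.php", has been converted to UTF-8, but if it is ANSI the result is the same. I tried adding a meta tag with windows-1251 between the head tags, but nothing changed.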
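Since the page is served as UTF-8, the browser decodes every byte of the included fragment as UTF-8, and Windows-1251 bytes for Cyrillic letters are usually not valid UTF-8, which is why they render as replacement characters. A quick way to confirm this, as a sketch: `isValidUtf8` is a hypothetical helper name and the sample bytes are illustrative. It relies on `preg_match()` returning `false` when a `/u` pattern is applied to malformed UTF-8.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false on malformed UTF-8 input.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// "Привет" encoded as Windows-1251 bytes (illustrative sample).
$cp1251Title = "\xCF\xF0\xE8\xE2\xE5\xF2";
// The same word encoded as UTF-8 (this file is saved as UTF-8).
$utf8Title = "Привет";

var_dump(isValidUtf8($cp1251Title)); // bool(false)
var_dump(isValidUtf8($utf8Title));   // bool(true)
```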

P.P.S.: The script that fetches the Winamp data (lastplayed.php):

<?php
/******
* You may use and/or modify this script as long as you:
* 1. Keep my name & webpage mentioned
* 2. Don't use it for commercial purposes
*
* If you want to use this script without complying to the rules above, please contact me first at: marty@excudo.net
* 
* Author: Martijn Korse
* Website: http://devshed.excudo.net
*
* Date:  08-05-2006
***/

/**
 * version 2.0
 */
class Radio
{
    var $fields = array();
    var $fieldsDefaults = array("Server Status", "Stream Status", "Listener Peak", "Average Listen Time", "Stream Title", "Content Type", "Stream Genre", "Stream URL", "Current Song");
    var $very_first_str;
    var $domain, $port, $path;
    var $errno, $errstr;
    var $trackLists = array();
    var $isShoutcast;
    var $nonShoutcastData = array(
                    "Server Status"     => "n/a",
                    "Stream Status"     => "n/a",
                    "Listener Peak"     => "n/a",
                    "Average Listen Time"   => "n/a",
                    "Stream Title"      => "n/a",
                    "Content Type"      => "n/a",
                    "Stream Genre"      => "n/a",
                    "Stream URL"        => "n/a",
                    "Stream AIM"        => "n/a",
                    "Stream IRC"        => "n/a",
                    "Current Song"      => "n/a"
                    );
    var $altServer = False;

    function Radio($url)
    {
        $parsed_url = parse_url($url);
        $this->domain   = isset($parsed_url['host']) ? $parsed_url['host'] : "";
        $this->port = !isset($parsed_url['port']) || empty($parsed_url['port']) ? "80" : $parsed_url['port'];
        $this->path = empty($parsed_url['path']) ? "/" : $parsed_url['path'];

        if (empty($this->domain))
        {
            $this->domain = $this->path;
            $this->path = "";
        }

        $this->setOffset("Current Stream Information");
        $this->setFields();     // setting default fields

        $this->setTableStart("<table border=0 cellpadding=2 cellspacing=2>");
        $this->setTableEnd("</table>");
    }

    function setFields($array=False)
    {
        if (!$array)
            $this->fields = $this->fieldsDefaults;
        else
            $this->fields = $array;
    }
    function setOffset($string)
    {
        $this->very_first_str = $string;
    }
    function setTableStart($string)
    {
        $this->tableStart = $string;
    }
    function setTableEnd($string)
    {
        $this->tableEnd = $string;
    }

    function getHTML($page=False)
    {
        if (!$page)
            $page = $this->path;
        $contents = "";
        $domain = (substr($this->domain, 0, 7) == "http://") ? substr($this->domain, 7) : $this->domain;


        if (@$fp = fsockopen($domain, $this->port, $this->errno, $this->errstr, 2))
        {
            fputs($fp, "GET ".$page." HTTP/1.1\r\n".
                "User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)\r\n".
                "Accept: */*\r\n".
                "Host: ".$domain."\r\n\r\n");

            $c = 0;
            while (!feof($fp) && $c <= 20)
            {
                $contents .= fgets($fp, 4096);
                $c++;
            }

            fclose ($fp);

            preg_match("/(Content-Type:)(.*)/i", $contents, $matches);
            if (count($matches) > 0)
            {
                $contentType = trim($matches[2]);
                if ($contentType == "text/html")
                {
                    $this->isShoutcast = True;
                    return $contents;
                }
                else
                {
                    $this->isShoutcast = False;

                    $htmlContent = substr($contents, 0, strpos($contents, "\r\n\r\n"));

                    $dataStr = str_replace("\r", "\n", str_replace("\r\n", "\n", $contents));
                    $lines = explode("\n", $dataStr);
                    foreach ($lines AS $line)
                    {
                        if ($dp = strpos($line, ":"))
                        {
                            $key = substr($line, 0, $dp);
                            $value = trim(substr($line, ($dp+1)));
                            if (preg_match("/genre/i", $key))
                                $this->nonShoutcastData['Stream Genre'] = $value;
                            if (preg_match("/name/i", $key))
                                $this->nonShoutcastData['Stream Title'] = $value;
                            if (preg_match("/url/i", $key))
                                $this->nonShoutcastData['Stream URL'] = $value;
                            if (preg_match("/content-type/i", $key))
                                $this->nonShoutcastData['Content Type'] = $value;
                            if (preg_match("/icy-br/i", $key))
                                $this->nonShoutcastData['Stream Status'] = "Stream is up at ".$value."kbps";
                            if (preg_match("/icy-notice2/i", $key))
                            {
                                $this->nonShoutcastData['Server Status'] = "This is <span style=\"color: red;\">not</span> a Shoutcast server!";
                                if (preg_match("/ultravox/i", $value))
                                    $this->nonShoutcastData['Server Status'] .= " But an <a href=\"http://ultravox.aol.com/\" target=\"_blank\">Ultravox</a> Server";
                                $this->altServer = $value;
                            }
                        }
                    }
                    return nl2br($htmlContent);
                }
            }
            else
                return $contents;
        }
        else
        {
            return False;
        }
    }

    function getServerInfo($display_array=null, $very_first_str=null)
    {
        if (!isset($display_array))
            $display_array = $this->fields;
        if (!isset($very_first_str))
            $very_first_str = $this->very_first_str;

        if ($html = $this->getHTML())
        {
             // parsing the contents
            $data = array();
            foreach ($display_array AS $key => $item)
            {
                if ($this->isShoutcast)
                {
                    $very_first_pos = stripos($html, $very_first_str);
                    $first_pos  = stripos($html, $item, $very_first_pos);
                    $line_start = strpos($html, "<td>", $first_pos);
                    $line_end   = strpos($html, "</td>", $line_start) + 4;
                    $difference = $line_end - $line_start;
                    $line       = substr($html, $line_start, $difference);
                    $data[$key] = strip_tags($line);
                }
                else
                {
                    $data[$key] = $this->nonShoutcastData[$item];
                }
            }
            return $data;
        }
        else
        {
            return $this->errstr." (".$this->errno.")";
        }
    }

    function createHistoryArray($page)
    {
        if (!in_array($page, $this->trackLists))
        {
            $this->trackLists[] = $page;
            if ($html = $this->getHTML($page))
            {
                $fromPos    = stripos($html, $this->tableStart);
                $toPos      = stripos($html, $this->tableEnd, $fromPos);
                $tableData  = substr($html, $fromPos, ($toPos-$fromPos));
                $lines      = explode("</tr><tr>", $tableData);
                $tracks = array();
                $c = 0;
                foreach ($lines AS $line)
                {
                    $info = explode ("</td><td>", $line);
                    $time = trim(strip_tags($info[0]));
                    if (substr($time, 0, 9) != "Copyright" && !preg_match("/Tag Loomis, Tom Pepper and Justin Frankel/i", $info[1]))
                    {
                        $this->tracks[$c]['time'] = $time;
                        $this->tracks[$c++]['track'] = trim(strip_tags($info[1]));
                    }
                }
                if (count($this->tracks) > 0)
                {
                    unset($this->tracks[0]);
                    if (isset($this->tracks[1]))
                        $this->tracks[1]['track'] = str_replace("Current Song", "", $this->tracks[1]['track']);
                }
            }
            else
            {
                $this->tracks[0] = array("time"=>$this->errno, "track"=>$this->errstr);
            }
        }
    }
    function getHistoryArray($page="/played.html")
    {
        if (!in_array($page, $this->trackLists))
            $this->createHistoryArray($page);
        return $this->tracks;
    }
    function getHistoryTable($page="/played.html", $trackColText=False, $class=False)
    {
        $title_utf8 = mb_convert_encoding($trackArr ,"utf-8" ,"auto");

        if (!in_array($page, $this->trackLists))
            $this->createHistoryArray($page);
        if ($trackColText)
            $output .= "
            <div class='lastplayed_top'></div>
            <div".($class ? " class=\"".$class."\"" : "").">";
        foreach ($this->tracks AS $title_utf8)
            $output .= "<div style='padding:2px 0;'>".$title_utf8['track']."</div>";
        $output .= "</div><div class='lastplayed_bottom'></div>
        <div class='lastplayed_title'>".$trackColText."</div>
        \n";
        return $output;
    }
}

 // this is needed for those with a php version < 5
 // the function is copied from the user comments @ php.net (http://nl3.php.net/stripos)
if (!function_exists("stripos"))
{
    function stripos($haystack, $needle, $offset=0)
    {
        return strpos(strtoupper($haystack), strtoupper($needle), $offset);
    }
}
?>

The calling script, outside lastplayed.php:

include "lastplayed.php";
$radio = new Radio($ip.":".$port);
echo $radio->getHistoryTable("/played.html", "<b>Last played:</b>", "lastplayed_content");

2 answers:

Answer 0 (score: 5)

If all of your source data is in Windows-1251, you can use something like:

$title_utf8 = mb_convert_encoding($title, "utf-8", "Windows-1251");

and put the converted data into the HTML stream.
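For illustration, a minimal sketch of that call on a sample title, assuming the mbstring extension is installed; the byte string is the word "Привет" as Windows-1251 and stands in for a real song title.

```php
<?php
// Requires the mbstring extension.
// "Привет" as raw Windows-1251 bytes, standing in for a song title.
$title = "\xCF\xF0\xE8\xE2\xE5\xF2";

// Convert from Windows-1251 to UTF-8 before writing it into the UTF-8 page.
$title_utf8 = mb_convert_encoding($title, "UTF-8", "Windows-1251");

echo $title_utf8, "\n"; // Привет
```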

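Since I am going only by the documentation, I am not 100% sure the alias for the source encoding is correct; if Windows-1251 does not work, you may want to try CP1251.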
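The alias question can also be checked at runtime. A small sketch, assuming mbstring: `mb_encoding_aliases()` lists the alternative names mbstring accepts for an encoding, and converting the same bytes with both spellings should give identical output.

```php
<?php
// Requires mbstring. Lists the other names mbstring accepts for this encoding.
print_r(mb_encoding_aliases("Windows-1251"));

$raw = "\xCF\xF0\xE8\xE2\xE5\xF2"; // "Привет" in Windows-1251

// Same bytes, two spellings of the source encoding.
$a = mb_convert_encoding($raw, "UTF-8", "Windows-1251");
$b = mb_convert_encoding($raw, "UTF-8", "CP1251");

var_dump($a === $b); // bool(true)
```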

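If your source data is not reliably in 1251, you will have to come up with a heuristic guess and then use the same conversion method. mb_detect_encoding can help with that.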
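A minimal sketch of such a heuristic, assuming every title is either UTF-8 or Windows-1251 and nothing else. It uses `mb_check_encoding()` rather than `mb_detect_encoding()`: a string that is already valid UTF-8 (including plain ASCII) is kept, anything else is treated as Windows-1251. `toUtf8` is a hypothetical helper name. The guess can be wrong, because some Windows-1251 byte sequences also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper; assumes the input is either UTF-8 or Windows-1251.
// Requires mbstring.
function toUtf8(string $s): string
{
    // Already valid UTF-8 (plain ASCII also passes): leave it alone.
    if (mb_check_encoding($s, "UTF-8")) {
        return $s;
    }
    // Otherwise assume Windows-1251 and convert.
    return mb_convert_encoding($s, "UTF-8", "Windows-1251");
}

echo toUtf8("\xCF\xF0\xE8\xE2\xE5\xF2"), "\n"; // Привет (converted)
echo toUtf8("Привет"), "\n";                   // Привет (unchanged)
echo toUtf8("Winamp"), "\n";                   // Winamp (ASCII)
```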

You cannot change the encoding of just part of an HTML document, but you can easily convert everything to UTF-8.
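If every title really is Windows-1251, one hedged way to convert everything is to convert the whole HTML fragment at once. This works because the markup is plain ASCII, which is byte-identical in Windows-1251 and UTF-8, so only the titles change. In the sketch below `$fragment` is a hard-coded stand-in for the string `getHistoryTable()` returns; it requires mbstring.

```php
<?php
// Stand-in for $radio->getHistoryTable(...); assumes every title is Windows-1251.
$fragment = "<div class='lastplayed_content'>"
          . "<div style='padding:2px 0;'>\xCF\xF0\xE8\xE2\xE5\xF2</div>"
          . "</div>";

// ASCII markup is unchanged by the conversion; only the title bytes are re-encoded.
$fragment_utf8 = mb_convert_encoding($fragment, "UTF-8", "Windows-1251");

echo $fragment_utf8, "\n";
```

If some titles are already UTF-8, this double-converts them into garbage, so a per-title check is safer in that case.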

Newer ID3 implementations have an encoding marker in their text frames:

$00 – ISO-8859-1 (ASCII)
$01 – UCS-2 in ID3v2.2 and ID3v2.3, UTF-16 encoded Unicode with BOM.
$02 – UTF-16BE encoded Unicode without BOM in ID3v2.4 only.
$03 – UTF-8 encoded Unicode in ID3v2.4 only.

Could your content be UTF-16?
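To make the encoding-marker idea concrete, here is a sketch that decodes the body of an ID3v2 text frame (the bytes after the frame header) according to its first byte. `decodeId3Text` is a hypothetical helper, not part of the posted script or of any ID3 library; it does not parse tags or frame headers, and it assumes mbstring.

```php
<?php
// Hypothetical helper. $frameData = one encoding-marker byte + encoded text.
// Requires the mbstring extension.
function decodeId3Text(string $frameData): string
{
    if ($frameData === "") {
        return "";
    }
    $marker = ord($frameData[0]);
    $text   = substr($frameData, 1);

    switch ($marker) {
        case 0x00: // ISO-8859-1
            $from = "ISO-8859-1";
            break;
        case 0x01: // UCS-2 (v2.2/v2.3) or UTF-16 (v2.4), preceded by a BOM
            $bom = substr($text, 0, 2);
            if ($bom === "\xFF\xFE") {
                $from = "UTF-16LE";
                $text = substr($text, 2);
            } elseif ($bom === "\xFE\xFF") {
                $from = "UTF-16BE";
                $text = substr($text, 2);
            } else {
                $from = "UTF-16BE"; // BOM missing: assume big-endian
            }
            break;
        case 0x02: // UTF-16BE without BOM (v2.4 only)
            $from = "UTF-16BE";
            break;
        case 0x03: // UTF-8 (v2.4 only)
            $from = "UTF-8";
            break;
        default:
            throw new InvalidArgumentException("Unknown ID3 text encoding marker: $marker");
    }

    $utf8 = ($from === "UTF-8") ? $text : mb_convert_encoding($text, "UTF-8", $from);
    // Strip NUL terminators; after conversion U+0000 is a single 0x00 byte.
    return rtrim($utf8, "\x00");
}

echo decodeId3Text("\x00Caf\xE9"), "\n";           // Café
echo decodeId3Text("\x01\xFF\xFEH\x00i\x00"), "\n"; // Hi
```

Cyrillic titles in older tags are often Windows-1251 bytes stored under the $00 marker; for those the ISO-8859-1 branch produces mojibake, and treating the bytes as Windows-1251 is the better guess.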

From the code you posted it is unclear how $trackArr is defined, since it is not referenced anywhere else. It looks like you have a couple of problems.

$title_utf8 = mb_convert_encoding($trackArr ,"utf-8" ,"auto");

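"auto" expands to a list of encodings that does not include Windows-1251 (with the default mbstring.language setting), so I am not sure why you are using it; you really should use "Windows-1251". I tried "Windows-1251,utf-16" with PHP on a Mac, but auto-detection could not find a suitable encoding for these relatively short strings, so it looks like you will have to be the one doing the guessing.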
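Instead of "auto", the candidate list can be spelled out. A sketch assuming mbstring: with the third argument (`strict`) set to `true`, a candidate in which the string is invalid is rejected, so the Windows-1251 sample bytes below cannot be reported as UTF-8. For short strings this is still a guess.

```php
<?php
$raw = "\xCF\xF0\xE8\xE2\xE5\xF2"; // "Привет" in Windows-1251

// Explicit candidates instead of "auto"; strict mode rejects invalid candidates.
$detected = mb_detect_encoding($raw, ["UTF-8", "Windows-1251"], true);
echo $detected, "\n"; // Windows-1251

if ($detected !== false) {
    echo mb_convert_encoding($raw, "UTF-8", $detected), "\n"; // Привет
}
```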

However, that line does not seem to serve any purpose, because the loop overwrites the value:

    foreach ($this->tracks AS $title_utf8)
            $output .= "<div style='padding:2px 0;'>".$title_utf8['track']."</div>";

On each iteration the variable $title_utf8 is assigned the current track. What you probably want is something more like:

    foreach ($this->tracks AS $current_track)
            $output .= "<div style='padding:2px 0;'>".mb_convert_encoding($current_track['track'], "utf-8", "Windows-1251")."</div>";

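mb_convert_encoding takes a string as its first argument, not an array or object (PHP 7.2 and later also accept an array, but then return an array, which cannot be concatenated into the HTML), so you need to apply the conversion to each string that is not UTF-8.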
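Putting those fixes together, a sketch of the loop as a stand-alone function. `renderTrackList` is a hypothetical name; it assumes, like `createHistoryArray()`, that each track is an array with the title under the `'track'` key, and that every title is Windows-1251. `$output` is initialised before `.=` is used, and the title is escaped with `double_encode` set to `false` because text taken from the Shoutcast page with `strip_tags()` may already contain entities such as `&amp;`.

```php
<?php
// Hypothetical stand-alone version of the loop in getHistoryTable().
// Requires mbstring.
function renderTrackList(array $tracks): string
{
    $output = ""; // initialise before appending with .=
    foreach ($tracks as $current_track) {
        // Convert the title string, not the array that holds it.
        $title = mb_convert_encoding($current_track['track'], "UTF-8", "Windows-1251");
        // Escape for HTML without re-encoding existing entities.
        $output .= "<div style='padding:2px 0;'>"
                 . htmlspecialchars($title, ENT_QUOTES, "UTF-8", false)
                 . "</div>";
    }
    return $output;
}

// Sample data in the shape createHistoryArray() builds (values are made up).
$tracks = [
    1 => ['time' => '20:31:05', 'track' => "\xCF\xF0\xE8\xE2\xE5\xF2"],
    2 => ['time' => '20:27:40', 'track' => "Artist - Song"],
];
echo renderTrackList($tracks), "\n";
```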

Answer 1 (score: 0)

Just so you know, the latest version supports character encoding/decoding :-)