Question

I am trying to decode text that I believe is encoded in WINDOWS-1251. The string looks like this:

&#1040&#1075&#1077&#1085&#1090

which represents "Agent" in Russian. Here is the problem:

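I cannot convert this string unless I add a semicolon after each number.
I cannot do that manually, because I have 10,000 lines of text to convert.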

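So the question is: what is this encoding (without the semicolons), and how can I automatically add the semicolons to every line (with a regex?) without breaking the code?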

So far I have been trying to achieve this with the following code:

App Logic

/**
 * Parse Sentence
 * Appends ';' to the character references in $sentence and stores the
 * result in $sentences[$i].
 */
public function parseSentence(array $sentences, $sentence, $i) {
    if (strstr($sentence, '-')) {
        $sentences[$i] = $this->explodeAndSplit('-', $sentence);
    } else if (strstr($sentence, "'")) {
        $sentences[$i] = $this->explodeAndSplit("'", $sentence);
    } else if (strstr($sentence, "(")) {
        $sentences[$i] = $this->explodeAndSplit("(", $sentence);
    } else if (strstr($sentence, ")")) {
        $sentences[$i] = $this->explodeAndSplit(")", $sentence);
    } else {
        if (strstr($sentence, '#')) {
            // Inserts ';' every 6 characters, i.e. assumes "&#" + 4 digits.
            $sentences[$i] = chunk_split($sentence, 6, ';');
        }
    }
    return $sentences;
}

/**
 * Explode and Split
 * @param string $explodeBy
 * @param string $string
 *
 * @return string
 */
private function explodeAndSplit($explodeBy, $string) {
    $exp = explode($explodeBy, $string);
    for ($j = 0; $j < count($exp); $j++) {
        $exp[$j] = chunk_split($exp[$j], 6, ';');
    }
    return implode($explodeBy, $exp);
}

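But obviously this approach is somewhat incorrect (well, completely incorrect), because it does not take into account many other special characters. How can it be fixed?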
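To illustrate why the chunk_split() approach breaks, here is a minimal sketch (the sample strings are made up for illustration). chunk_split() counts characters, not references, so it only lines up when every reference is exactly six characters long, i.e. has four digits. A shorter reference such as &#39 (the apostrophe) shifts every following cut.

```php
<?php
// chunk_split() cuts the string every 6 characters, which only matches
// references with exactly 4 digits ("&#" + 4 digits = 6 characters).
$fourDigit = '&#1040&#1075';
$mixed     = '&#39&#1040'; // "&#39" is only 4 characters long

echo chunk_split($fourDigit, 6, ';'), PHP_EOL; // &#1040;&#1075;
echo chunk_split($mixed, 6, ';'), PHP_EOL;     // &#39&#;1040; (broken)
```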

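Update
I am using Lumen as the backend and AngularJS as the frontend. All data (database, text files, etc.) is fetched and parsed in Lumen, which exposes API routes that AngularJS uses to access and retrieve the data. The thing is, this semicolon-less encoding works fine in any browser when accessed directly, but it cannot be displayed in Angular because the semicolons are missing.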

Answer 1

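These are Russian HTML codes (Cyrillic), i.e. HTML numeric character references. To make sure they are displayed correctly, you need to apply the appropriate content-type: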

<meta http-equiv="content-type" content="text/html;charset=utf-8" />
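A small sketch showing what the numbers actually are, assuming PHP 7+ (for the "\u{...}" string escape). Each number is a decimal Unicode code point, so a numeric reference means the same character whatever the page's encoding is; the content-type governs how the page's own bytes are read.

```php
<?php
// The numbers in "&#1040" are decimal Unicode code points, not
// Windows-1251 byte values: 1040 = 0x0410, CYRILLIC CAPITAL LETTER A.
printf("U+%04X\n", 1040); // U+0410

// With the terminating semicolon, PHP decodes the reference to UTF-8 text.
$letter = html_entity_decode('&#1040;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($letter === "\u{0410}"); // bool(true)
```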

Now, to do this properly, split the HTML code string above with preg_split(), like this:

array_filter(preg_split("/[&#]+/", $str));

(array_filter() simply removes all the empty values. You could also use explode() to do the same thing.)
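For the sample string from the question, the intermediate results look like this. One thing worth knowing: array_filter() without a callback also drops the string "0", which is harmless here since no reference uses that number.

```php
<?php
$str = '&#1040&#1075&#1077&#1085&#1090';

// Splitting on runs of '&' and '#' leaves an empty string before the
// first number, followed by the five numbers.
$parts = preg_split('/[&#]+/', $str);

// array_filter() drops the empty string but keeps the original keys (1..5).
$numbers = array_filter($parts);

// explode() on the two-character delimiter "&#" yields the same pieces.
$exploded = explode('&#', $str);

print_r(array_values($numbers));
```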

This returns an array of the numbers. From there, a simple implode() that adds the required &# prefix and the trailing ; is straightforward:

echo '&#' .implode( ";&#", array_filter(preg_split("/[&#]+/", $str) )) . ';';

Returns:

&#1040;&#1075;&#1077;&#1085;&#1090;
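The preg_split()/implode() approach assumes the string contains nothing but references. For lines that mix references with other characters (hyphens, apostrophes, parentheses, spaces), as the question's parseSentence() code has to handle, a single preg_replace() can add the missing semicolons in place instead. This is a different technique from the one above, and terminateNumericRefs() is a hypothetical helper name:

```php
<?php
/**
 * Hypothetical helper: append ';' to every decimal numeric character
 * reference that lacks one, leaving all other text untouched.
 */
function terminateNumericRefs(string $line): string
{
    // "&#" + one or more digits, optionally followed by an existing ';'.
    // \d+ is greedy, so the whole number is captured before the next "&#".
    return preg_replace('/&#(\d+);?/', '&#$1;', $line);
}

echo terminateNumericRefs('&#1040&#1075&#1077&#1085&#1090'), PHP_EOL;
echo terminateNumericRefs('&#1040&#1075-&#1077 (&#1085&#39;)'), PHP_EOL;
```

For a large file, the helper could be applied line by line, e.g. array_map('terminateNumericRefs', file($path)), where $path is a placeholder. One caveat: if a reference is directly followed by a literal digit in the text, the boundary between the code point and that digit cannot be recovered from the unterminated form.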

Now, when the HTML is rendered correctly, it displays the following Russian text:

Агент

which translates directly from Russian as Agent.
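Given the Lumen/AngularJS setup from the question's update, another option is to decode the references to plain UTF-8 text on the server, so the frontend receives ordinary characters rather than entity strings. A minimal sketch, assuming the data passes through PHP before it is returned from the API route; decodeBareRefs() is a hypothetical helper:

```php
<?php
/**
 * Hypothetical helper: terminate the bare references, then decode them
 * into plain UTF-8 text.
 */
function decodeBareRefs(string $line): string
{
    $terminated = preg_replace('/&#(\d+);?/', '&#$1;', $line);
    return html_entity_decode($terminated, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$text = decodeBareRefs('&#1040&#1075&#1077&#1085&#1090');
echo $text, PHP_EOL; // Агент

// JSON_UNESCAPED_UNICODE keeps the Cyrillic readable in the JSON payload
// instead of emitting \u0410-style escapes.
echo json_encode(['word' => $text], JSON_UNESCAPED_UNICODE), PHP_EOL; // {"word":"Агент"}
```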
