Question

I am working for Wikipedia. I try to retrieve the page https://de.wikipedia.org/wiki/Spezial:Linkliste/Hans_Jansen_(Arabist) with file_get_contents. Then I extract all list items by finding the list and exploding it at \n.

After that I want to retrieve the text of the articles named by the list items. For that I do

 file_get_contents("https://de.wikipedia.org/w/index.php?action=raw&title=".urlencode($article));

Everything goes fine until an article named Ka'b ibn As'ad causes this URL to be retrieved:

https://de.wikipedia.org/w/index.php?action=raw&title=Ka

When I enter the article name by hand as plain text, everything goes fine:

 $article = "Ka'b ibn As'ad";
 $page = "https://".$server."/w/index.php?action=raw&title=".urlencode($article);

Comparing the urlencode output for the manually entered $article with the one retrieved from the website shows a difference:

  manually: Ka%27b+ibn+As%27ad
  website:  Ka%26%23039%3Bb%20ibn%20As%26%23039%3Bad

Comparing the output of htmlspecialchars() is even more striking:

  manually: Ka'b ibn As'ad
  website:  Ka&#039;b ibn As&#039;ad

How do I get rid of those &#039; special characters? Apparently htmlspecialchars_decode() doesn't work.

Answer 1

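htmlspecialchars_decode() only converts a handful of special entities (&amp;, &quot;, &lt;, &gt;, and &#039; only when ENT_QUOTES is set), not HTML entities in general. You need to use html_entity_decode()! Pass ENT_QUOTES as the second argument, because before PHP 8.1 the default flags leave single-quote entities untouched.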
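
A minimal sketch of how that could look with the article name from the question. The variable name $scraped is made up for illustration, and ENT_HTML5 with UTF-8 is an assumption about the page being parsed, not something stated in the question:

```php
<?php
// Article title as it comes out of the scraped HTML, with the
// apostrophes still encoded as numeric entities.
$scraped = "Ka&#039;b ibn As&#039;ad";

// ENT_QUOTES makes sure single-quote entities are decoded too.
// ENT_HTML5 and 'UTF-8' are assumptions about the source page.
$article = html_entity_decode($scraped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Same URL pattern as in the question, built from the decoded title.
$url = "https://de.wikipedia.org/w/index.php?action=raw&title=" . urlencode($article);

echo $article, "\n"; // Ka'b ibn As'ad
echo $url, "\n";     // ends in title=Ka%27b+ibn+As%27ad
```

Decoding has to happen before urlencode(); otherwise the & and # of the entity get percent-encoded themselves, which is what the "website" line in the question shows.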