Convert ASCII to plain text in PHP

Time: 2012-05-15 07:00:50

Tags: php html ascii plaintext

I'm scraping some websites and want to convert ASCII text to plain text so I can store it in a database. For example, I want

I have got to tell anyone who will listen that this is
one of THE best adventure movies I've ever seen.
It's almost impossible to convey how pumped I am
now that I've seen it.

converted to

I have got to tell anyone who will listen that this is
one of THE best adventure movies I've ever seen. It's
almost impossible to convey how pumped I am now that
I've seen it.

I've searched until my fingers bled. Any help?

1 answer:

Answer 0 (score: 21)

You can use html_entity_decode:

echo html_entity_decode('...', ENT_QUOTES, 'UTF-8');
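A minimal runnable sketch of the call above. The input string is hypothetical: the markup from the question did not survive, so the `&#39;` entities here are only an assumed example of what a scraper might pick up.

```php
<?php
// Hypothetical scraped fragment; the entities are illustrative only.
$scraped = 'one of THE best adventure movies I&#39;ve ever seen. It&#39;s almost impossible';

// ENT_QUOTES decodes both single- and double-quote entities;
// passing 'UTF-8' explicitly fixes the output charset.
$plain = html_entity_decode($scraped, ENT_QUOTES, 'UTF-8');

echo $plain, PHP_EOL;
// prints: one of THE best adventure movies I've ever seen. It's almost impossible
```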

A few notes:

  • Note that what you actually want is to convert an HTML-encoded string (one containing character entities) into ASCII, a.k.a. plain text.

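  • This example converts to UTF-8, which is ASCII-compatible: every ASCII character (i.e., character code below 128) is encoded exactly as in ASCII. If you really want pure ASCII (and thus lose all accented and foreign-language characters), you have to strip the offending characters separately.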

  • The last argument ('UTF-8') is needed to get the same behavior across PHP versions, because the default changed in PHP 5.4.0.
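For the "pure ASCII" case in the notes above, one possible sketch is to decode first and then remove every byte outside the 7-bit range. The function name `to_pure_ascii` is hypothetical, and this is only one way to strip the offending characters; it is deliberately lossy.

```php
<?php
// Hypothetical helper: decode entities to UTF-8, then drop non-ASCII.
function to_pure_ascii(string $html): string
{
    $utf8 = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // Every byte of a multi-byte UTF-8 sequence is >= 0x80, so removing
    // bytes outside 0x00-0x7F drops whole non-ASCII characters.
    return preg_replace('/[^\x00-\x7F]/', '', $utf8);
}

echo to_pure_ascii('Caf&eacute; &amp; cr&egrave;me'), PHP_EOL;
// prints: Caf & crme
```

Matching bytes rather than using the `/u` modifier means the pattern cannot fail on malformed UTF-8 input; for valid UTF-8 the result is the same.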

Update: Example with your text on ideone

Update 2: Changed ENT_COMPAT to ENT_QUOTES, as suggested by @Daan.
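A small sketch of why the flag matters for text like this: ENT_COMPAT decodes only double-quote entities and leaves single-quote entities such as `&#39;` untouched, while ENT_QUOTES decodes both. The sample string is illustrative.

```php
<?php
$s = 'I&#39;ve seen &quot;it&quot;';

$compat = html_entity_decode($s, ENT_COMPAT, 'UTF-8'); // single quotes left encoded
$quotes = html_entity_decode($s, ENT_QUOTES, 'UTF-8'); // both quote types decoded

echo $compat, PHP_EOL; // prints: I&#39;ve seen "it"
echo $quotes, PHP_EOL; // prints: I've seen "it"
```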