I'm scraping some websites, and I want to convert ASCII text to plain text so I can store it in a database. For example, I want
I have got to tell anyone who will listen that this is
one of THE best adventure movies I've ever seen.
It's almost impossible to convey how pumped I am
now that I've seen it.
to be converted to
I have got to tell anyone who will listen that this is
one of THE best adventure movies I've ever seen. It's
almost impossible to convey how pumped I am now that
I've seen it.
I've googled until my fingers bled. Any help?
Answer 0 (score: 21)
You can use html_entity_decode:
echo html_entity_decode('...', ENT_QUOTES, 'UTF-8');
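As a minimal sketch of the call above: the input string here is hypothetical (the entities actually present in the scraped page are an assumption), and it also shows how ENT_QUOTES differs from ENT_COMPAT for single quotes.

```php
<?php
// Hypothetical scraped fragment; the entities shown are an assumption.
$scraped = 'It&#039;s one of THE best &quot;adventure&quot; movies.';

// ENT_QUOTES decodes both double- and single-quote entities.
$plain = html_entity_decode($scraped, ENT_QUOTES, 'UTF-8');

// ENT_COMPAT decodes double quotes but leaves &#039; untouched.
$compat = html_entity_decode($scraped, ENT_COMPAT, 'UTF-8');

echo $plain, PHP_EOL;   // It's one of THE best "adventure" movies.
echo $compat, PHP_EOL;  // It&#039;s one of THE best "adventure" movies.
```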
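A few notes:
Note that what you actually want is to convert an HTML-encoded string (one containing HTML entities) to ASCII, AKA plain text.
This example converts to UTF-8, which is an ASCII-compatible encoding: every ASCII character (that is, every character with a code below 128) is encoded the same way. If you really want pure ASCII (and so lose all accented and foreign-language characters), you have to strip the offending characters separately.
The last parameter ('UTF-8') is needed to stay compatible across PHP versions, because the default value changed in PHP 5.4.0.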
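If pure ASCII really is the goal, one possible way to strip the remaining characters is a byte-level regular expression after decoding. This is only a sketch under assumptions: the input string is made up, and simply dropping non-ASCII bytes is one choice among several (transliteration would be another).

```php
<?php
// Hypothetical input with named entities for accented letters.
$decoded = html_entity_decode('Caf&eacute; &amp; cr&egrave;me', ENT_QUOTES, 'UTF-8');
// $decoded is now the UTF-8 string "Café & crème".

// Remove every byte outside the ASCII range 0x00-0x7F.
// In UTF-8, all bytes of a multi-byte character are >= 0x80,
// so each accented character disappears completely.
$asciiOnly = preg_replace('/[^\x00-\x7F]/', '', $decoded);

echo $asciiOnly, PHP_EOL; // Caf & crme
```

Dropping the characters loses information, so it may be worth storing the UTF-8 text instead if the database supports it.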
Update: Example with your text on ideone.
Update 2: Changed ENT_COMPAT to ENT_QUOTES, as suggested by @Daan.