Text processing in Python: how to handle invalid characters in a string

Time: 2018-10-18 02:09:19

Tags: python dataframe encoding decoding

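I am working on text classification. I am seeing invalid characters in my data, as shown below. Can someone help me decode these characters into their actual values? Any pointers would also help.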

"It wouldn\'t take much to do for **Ã\x86sop**,\n\n\n\n\n            would it?**â\x80\x9d** whispered Ivan to Alyosha.\n\n\n\n\n\n\n\n\n\n            **â\x80\x9c**God forbid!**â\x80\x9d** cried Alyosha.\n\n\n\n\n\n\n\n\n\n            **â\x80\x9c**Why should He forbid?**â\x80\x9d** Ivan went on in the\n\n\n\n\n            same whisper, with a malignant grimace. **â\x80\x9c**One reptile will devour the other., And serve them\n\n\n\n\n            both right, too.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha\n\n\n\n\n            shuddered.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cOf course I won\'t let him be murdered as I didn\'t\n\n\n\n\n            just now., Stay here, Alyosha, I\'ll go for a turn in the yard., My\n\n\n\n\n            head\'s begun to ache.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha went\n\n\n\n\n            to his father\'s bedroom and sat by his bedside behind the screen\n\n\n\n\n            for about an hour., The old man suddenly opened his eyes and gazed\n\n\n\n\n            for a long while at Alyosha, evidently remembering and\n\n\n\n\n            meditating., All at once his face betrayed extraordinary\n\n\n\n\n            excitement.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cAlyosha,â\x80\x9d he whispered apprehensively,\n\n\n\n\n            â\x80\x9cwhere\'s Ivan?â\x80\x9d\n\n\n\n\n\n\n\n\n\n            â\x80\x9cIn the yard., He\'s got a headache., He\'s on the\n\n\n\n\n            watch.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            â\x80\x9cGive me that looking-glass., It stands over there.\n\n\n\n\n            Give it me.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha gave\n\n\n\n\n            him a little round folding looking-glass which stood on the chest\n\n\n\n\n            of drawers., The old man looked at himself in it; his nose was\n\n\n\n\n            considerably swollen, and on the left side of his forehead there\n\n\n\n\n            was a rather large crimson bruise.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cWhat does Ivan 
say?

1 answer:

Answer 0 (score: 1)

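The data appears to have been double-encoded (are you using Python 2?): sequences such as â\x80\x9d are the UTF-8 bytes of a curly quote that were at some point decoded as latin-1. You can fix this by encoding the string as latin-1 and then decoding it as UTF-8.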
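If only some of the strings are affected, the same round trip can be wrapped in a small helper that leaves other text alone. The following is a minimal sketch, not part of the original answer: the name `fix_mojibake` and the `sample` string are hypothetical, and it assumes Python 3 and that the damage is exactly "UTF-8 bytes decoded as latin-1".

```python
def fix_mojibake(text):
    """Undo UTF-8 bytes that were mis-decoded as latin-1.

    Returns the text unchanged if the round trip is not possible,
    which usually means the string was not damaged in this way.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside latin-1 (already
        # proper Unicode) or the bytes are not valid UTF-8.
        return text


# Hypothetical sample modelled on the text in the question.
sample = "\u00e2\u0080\u009cGod forbid!\u00e2\u0080\u009d cried Alyosha."
fixed = fix_mojibake(sample)
print(fixed)
```

With a pandas DataFrame, the helper could be applied per cell, for example `df["text"] = df["text"].map(fix_mojibake)`, where `text` is a hypothetical column name. It is better still to fix the source of the problem by reading the file with the correct encoding in the first place.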

>>> data.encode('latin-1').decode('utf-8')
"It wouldn't take much to do for **Æsop**,\n\n\n\n\n            would it?**”** whispered Ivan to Alyosha.\n\n\n\n\n\n\n\n\n\n            **“**God forbid!**”** cried Alyosha.\n\n\n\n\n\n\n\n\n\n            **“**Why should He forbid?**”** Ivan went on in the\n\n\n\n\n            same whisper, with a malignant grimace. **“**One reptile will devour the other., And serve them\n\n\n\n\n            both right, too.”\n\n\n\n\n\n\n\n\n\n            Alyosha\n\n\n\n\n            shuddered.\n\n\n\n\n\n\n\n\n\n            “Of course I won't let him be murdered as I didn't\n\n\n\n\n            just now., Stay here, Alyosha, I'll go for a turn in the yard., My\n\n\n\n\n            head's begun to ache.”\n\n\n\n\n\n\n\n\n\n            Alyosha went\n\n\n\n\n            to his father's bedroom and sat by his bedside behind the screen\n\n\n\n\n            for about an hour., The old man suddenly opened his eyes and gazed\n\n\n\n\n            for a long while at Alyosha, evidently remembering and\n\n\n\n\n            meditating., All at once his face betrayed extraordinary\n\n\n\n\n            excitement.\n\n\n\n\n\n\n\n\n\n            “Alyosha,” he whispered apprehensively,\n\n\n\n\n            “where's Ivan?”\n\n\n\n\n\n\n\n\n\n            “In the yard., He's got a headache., He's on the\n\n\n\n\n            watch.”\n\n\n\n\n\n\n\n\n\n            “Give me that looking-glass., It stands over there.\n\n\n\n\n            Give it me.”\n\n\n\n\n\n\n\n\n\n            Alyosha gave\n\n\n\n\n            him a little round folding looking-glass which stood on the chest\n\n\n\n\n            of drawers., The old man looked at himself in it; his nose was\n\n\n\n\n            considerably swollen, and on the left side of his forehead there\n\n\n\n\n            was a rather large crimson bruise.\n\n\n\n\n\n\n\n\n\n            “What does Ivan say?"