python ast SyntaxError: invalid syntax for unknown reason

Time: 2019-05-11 17:57:12

Tags: python python-3.x parsing dictionary abstract-syntax-tree

I have a string that I did not write and cannot control, but I need to parse it with ast. ast cannot handle it. Why does it fail, and how can I fix it?

Here is my code:

import ast

mystring = "https://111.com<xx>{'Server': 'openresty', 'Date': 'Fri, 19 Apr 2019 07:31:18 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding, Accept-Encoding', 'X-Rid': '5cbcdcf186159173e59ed3463f0b6ff3', 'P3p': 'CP="Tumblr's privacy policy is available here: https://www.tumblr.com/policy/en/privacy"', 'X-Xss-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'X-Tumblr-User': 'the-absolute-best-posts', 'X-Tumblr-Pixel-0': 'https://px.srvcs.tumblr.com/impixu?T=1555659077&J=eyJ0eXBlIjoidXJsIiwidXJsIjoiaHR0cDovLzEwMDBub3Rlcy5jb20vIiwicmVxdHlwZSI6MCwicm91dGUiOiIvIn0=&U=EJEPFDDMDN&K=36533553ca6c98c3ffa40d15855478b3c1f427a30be7a5eb4cd09256b4cd31a7--https://px.srvcs.tumblr.com/impixu?T=1555659077&J=eyJ0eXBlIjoicG9zdCIsInVybCI6Imh0dHA6Ly8xMDAwbm90ZXMuY29tLyIsInJlcXR5cGUiOjAsInJvdXRlIjoiLyIsInBvc3RzIjpbeyJyb290X2Jsb2dpZCI6IjE4NjcxODA4Iiwicm9vdF9wb3N0aWQiOiIxMjMzNTczMjcxMSIsInBvc3RpZCI6IjE4NDI5MDIxOTE4OCIsImJsb2dp', 'X-Tumblr-Pixel-1': 'ZCI6IjE5MzQzMzciLCJzb3VyY2UiOjMzfSx7InJvb3RfYmxvZ2lkIjoiMjE0ODQzNTciLCJyb290X3Bvc3RpZCI6OTIyNTc2MTQxMTUsInBvc3RpZCI6IjE4NDI5MDAxNDkyOCIsImJsb2dpZCI6IjE5MzQzMzciLCJzb3VyY2UiOjMzfSx7InJvb3RfYmxvZ2lkIjoiNzMyMzk2NSIsInJvb3RfcG9zdGlkIjoiMTEwMjk5NjAyNDU2IiwicG9zdGlkIjoiMTg0Mjg5Nzk5OTY4IiwiYmxvZ2lkIjoiMTkzNDMzNyIsInNvdXJjZSI6MzN9LHsicm9vdF9ibG9naWQiOiI5NDg5NzU4Iiwicm9vdF9wb3N0aWQiOjQ1MzczNTgwMDc0LCJwb3N0aWQiOiIxODQyODk1NjkzNDgiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6Ij', 'X-Tumblr-Pixel-2': 
'ExMTY2NTU5Iiwicm9vdF9wb3N0aWQiOiIxMjgyMDQ1MzczMTUiLCJwb3N0aWQiOiIxODQyODkzMjcyNTMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6IjU2NjI0Mzg2Iiwicm9vdF9wb3N0aWQiOjQyMzY1MDI4NDM1LCJwb3N0aWQiOiIxODQyODkwNjgzNjMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6IjcyMjc2MzI5Iiwicm9vdF9wb3N0aWQiOiIxMzc0ODA5NjA2NjciLCJwb3N0aWQiOiIxODQyODg4MDczNDMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM31dfQ==&U=LFCMCFJHJM&K=600602e5cc72d6cd698d22736e67fbcabad4b929587949814298', 'X-Tumblr-Pixel-3': '57d2040fa3bb', 'X-Tumblr-Pixel': '4', 'Link': '\<https://66.media.tumblr.com/avatar_6d0dd0685eab_128.pnj>; rel=icon', 'X-UA-Compatible': 'IE=Edge,chrome=1', 'Content-Encoding': 'gzip'}"

h = mystring.split("<xx>",1)[1]
h = ast.literal_eval(h.strip())

I get an invalid syntax error:

>   File "test.py", line 3
>     mystring = "https://111.com<xx>{'Server': 'openresty', 'Date': 'Fri, 19 Apr 2019 07:31:18 GMT', 'Content-Type': 'text/html;
> charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection':
> 'keep-alive', 'Vary': 'Accept-Encoding, Accept-Encoding', 'X-Rid':
> '5cbcdcf186159173e59ed3463f0b6ff3', 'P3p': 'CP="Tumblr's privacy
> policy is available here: https://www.tumblr.com/policy/en/privacy"',
> 'X-Xss-Protection': '1; mode=block', 'X-Content-Type-Options':
> 'nosniff', 'X-Tumblr-User': 'the-absolute-best-posts',
> 'X-Tumblr-Pixel-0':
> 'https://px.srvcs.tumblr.com/impixu?T=1555659077&J=eyJ0eXBlIjoidXJsIiwidXJsIjoiaHR0cDovLzEwMDBub3Rlcy5jb20vIiwicmVxdHlwZSI6MCwicm91dGUiOiIvIn0=&U=EJEPFDDMDN&K=36533553ca6c98c3ffa40d15855478b3c1f427a30be7a5eb4cd09256b4cd31a7--https://px.srvcs.tumblr.com/impixu?T=1555659077&J=eyJ0eXBlIjoicG9zdCIsInVybCI6Imh0dHA6Ly8xMDAwbm90ZXMuY29tLyIsInJlcXR5cGUiOjAsInJvdXRlIjoiLyIsInBvc3RzIjpbeyJyb290X2Jsb2dpZCI6IjE4NjcxODA4Iiwicm9vdF9wb3N0aWQiOiIxMjMzNTczMjcxMSIsInBvc3RpZCI6IjE4NDI5MDIxOTE4OCIsImJsb2dp',
> 'X-Tumblr-Pixel-1':
> 'ZCI6IjE5MzQzMzciLCJzb3VyY2UiOjMzfSx7InJvb3RfYmxvZ2lkIjoiMjE0ODQzNTciLCJyb290X3Bvc3RpZCI6OTIyNTc2MTQxMTUsInBvc3RpZCI6IjE4NDI5MDAxNDkyOCIsImJsb2dpZCI6IjE5MzQzMzciLCJzb3VyY2UiOjMzfSx7InJvb3RfYmxvZ2lkIjoiNzMyMzk2NSIsInJvb3RfcG9zdGlkIjoiMTEwMjk5NjAyNDU2IiwicG9zdGlkIjoiMTg0Mjg5Nzk5OTY4IiwiYmxvZ2lkIjoiMTkzNDMzNyIsInNvdXJjZSI6MzN9LHsicm9vdF9ibG9naWQiOiI5NDg5NzU4Iiwicm9vdF9wb3N0aWQiOjQ1MzczNTgwMDc0LCJwb3N0aWQiOiIxODQyODk1NjkzNDgiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6Ij',
> 'X-Tumblr-Pixel-2':
> 'ExMTY2NTU5Iiwicm9vdF9wb3N0aWQiOiIxMjgyMDQ1MzczMTUiLCJwb3N0aWQiOiIxODQyODkzMjcyNTMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6IjU2NjI0Mzg2Iiwicm9vdF9wb3N0aWQiOjQyMzY1MDI4NDM1LCJwb3N0aWQiOiIxODQyODkwNjgzNjMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6IjcyMjc2MzI5Iiwicm9vdF9wb3N0aWQiOiIxMzc0ODA5NjA2NjciLCJwb3N0aWQiOiIxODQyODg4MDczNDMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM31dfQ==&U=LFCMCFJHJM&K=600602e5cc72d6cd698d22736e67fbcabad4b929587949814298',
> 'X-Tumblr-Pixel-3': '57d2040fa3bb', 'X-Tumblr-Pixel': '4', 'Link':
> '\<https://66.media.tumblr.com/avatar_6d0dd0685eab_128.pnj>;
> rel=icon', 'X-UA-Compatible': 'IE=Edge,chrome=1', 'Content-Encoding':
> 'gzip'}"
> 
> 
> 
>     ^ SyntaxError: invalid syntax

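3 answers:

Answer 0 (score: 0)

You get a syntax error because the string/JSON contains a double quote, which ends the string literal prematurely. As a result, Python tries to interpret the rest of the line as Python code, which it is not.

This part appears to be the problem: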

'P3p': 'CP="Tumblr's privacy policy is available...
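
The effect can be reproduced in isolation. The sketch below is only an illustration: it compiles a shortened, hypothetical version of the assignment (only the P3p entry is kept), and the helper name is_valid_python is made up for this example.

```python
# Shortened stand-in for the assignment in the question (hypothetical: only
# the P3p entry is kept). The source text is held in a triple-quoted string
# so that it can be handed to compile() unchanged.
source = '''mystring = "{'P3p': 'CP="Tumblr's privacy policy"'}"'''

# Same assignment with the inner double quotes and the apostrophe escaped.
# The r prefix keeps the backslashes exactly as typed.
fixed_source = r'''mystring = "{'P3p': 'CP=\"Tumblr\\'s privacy policy\"'}"'''


def is_valid_python(src):
    """Return True if src compiles as Python source, False on SyntaxError."""
    try:
        compile(src, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# The literal in `source` ends at the double quote after CP=, so the
# following word is parsed as code and compilation fails.
broken = is_valid_python(source)
print(broken, is_valid_python(fixed_source))
```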

Answer 1 (score: 0)

Try this syntax:

import ast

mystring = "https://111.com<xx>{'Server': 'openresty', 'Date': 'Fri, 19 Apr 2019 07:31:18 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding, Accept-Encoding', 'X-Rid': '5cbcdcf186159173e59ed3463f0b6ff3', 'P3p': 'CP=\"Tumblr\\'s privacy policy is available here: https://www.tumblr.com/policy/en/privacy\"', 'X-Xss-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'X-Tumblr-User': 'the-absolute-best-posts', 'X-Tumblr-Pixel-0': 'https://px.srvcs.tumblr.com/impixu?T=1555659077&J=eyJ0eXBlIjoidXJsIiwidXJsIjoiaHR0cDovLzEwMDBub3Rlcy5jb20vIiwicmVxdHlwZSI6MCwicm91dGUiOiIvIn0=&U=EJEPFDDMDN&K=36533553ca6c98c3ffa40d15855478b3c1f427a30be7a5eb4cd09256b4cd31a7--https://px.srvcs.tumblr.com/impixu?T=1555659077&J=eyJ0eXBlIjoicG9zdCIsInVybCI6Imh0dHA6Ly8xMDAwbm90ZXMuY29tLyIsInJlcXR5cGUiOjAsInJvdXRlIjoiLyIsInBvc3RzIjpbeyJyb290X2Jsb2dpZCI6IjE4NjcxODA4Iiwicm9vdF9wb3N0aWQiOiIxMjMzNTczMjcxMSIsInBvc3RpZCI6IjE4NDI5MDIxOTE4OCIsImJsb2dp', 'X-Tumblr-Pixel-1': 'ZCI6IjE5MzQzMzciLCJzb3VyY2UiOjMzfSx7InJvb3RfYmxvZ2lkIjoiMjE0ODQzNTciLCJyb290X3Bvc3RpZCI6OTIyNTc2MTQxMTUsInBvc3RpZCI6IjE4NDI5MDAxNDkyOCIsImJsb2dpZCI6IjE5MzQzMzciLCJzb3VyY2UiOjMzfSx7InJvb3RfYmxvZ2lkIjoiNzMyMzk2NSIsInJvb3RfcG9zdGlkIjoiMTEwMjk5NjAyNDU2IiwicG9zdGlkIjoiMTg0Mjg5Nzk5OTY4IiwiYmxvZ2lkIjoiMTkzNDMzNyIsInNvdXJjZSI6MzN9LHsicm9vdF9ibG9naWQiOiI5NDg5NzU4Iiwicm9vdF9wb3N0aWQiOjQ1MzczNTgwMDc0LCJwb3N0aWQiOiIxODQyODk1NjkzNDgiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6Ij', 'X-Tumblr-Pixel-2': 'ExMTY2NTU5Iiwicm9vdF9wb3N0aWQiOiIxMjgyMDQ1MzczMTUiLCJwb3N0aWQiOiIxODQyODkzMjcyNTMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6IjU2NjI0Mzg2Iiwicm9vdF9wb3N0aWQiOjQyMzY1MDI4NDM1LCJwb3N0aWQiOiIxODQyODkwNjgzNjMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6IjcyMjc2MzI5Iiwicm9vdF9wb3N0aWQiOiIxMzc0ODA5NjA2NjciLCJwb3N0aWQiOiIxODQyODg4MDczNDMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM31dfQ==&U=LFCMCFJHJM&K=600602e5cc72d6cd698d22736e67fbcabad4b929587949814298', 'X-Tumblr-Pixel-3': '57d2040fa3bb', 'X-Tumblr-Pixel': '4', 'Link': '\<https://66.media.tumblr.com/avatar_6d0dd0685eab_128.pnj>; rel=icon', 'X-UA-Compatible': 'IE=Edge,chrome=1', 'Content-Encoding': 'gzip'}"

h = mystring.split("<xx>",1)[1]
h = ast.literal_eval(h.strip())

You have a problem with the double quotes at 'P3p': 'CP="Tumblr's ....
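
Note that the text is parsed twice: once by Python when it reads the source file, and once more by ast.literal_eval. A minimal sketch of what that means for the escapes, assuming a shortened, hypothetical header string that keeps only the P3p entry:

```python
import ast

# Hypothetical, shortened input: only the P3p header from the question is kept.
# Source level: \" keeps the double quote inside the literal, and \\ becomes
# one real backslash in the resulting string.
mystring = "https://111.com<xx>{'P3p': 'CP=\"Tumblr\\'s privacy policy\"'}"

text = mystring.split("<xx>", 1)[1].strip()
# literal_eval level: text now contains \' so the apostrophe is escaped.
headers = ast.literal_eval(text)
print(headers["P3p"])  # prints: CP="Tumblr's privacy policy"

# With a single backslash, \' collapses to a bare ' before literal_eval runs,
# so the inner string ends early and parsing fails.
bare = "{'P3p': 'CP=\"Tumblr\'s privacy policy\"'}"
try:
    ast.literal_eval(bare)
    single_backslash_ok = True
except SyntaxError:
    single_backslash_ok = False
```

The doubled backslash is what lets the apostrophe survive both passes. A single backslash is consumed by the first pass and never reaches literal_eval.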

Answer 2 (score: 0)

Try this:

import ast

mystring = "https://111.com<xx>{'Server': 'openresty', 'Date': 'Fri, 19 Apr 2019 07:31:18 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding, Accept-Encoding', 'X-Rid': '5cbcdcf186159173e59ed3463f0b6ff3', 'P3p': 'CP=\"Tumblr\\'s privacy policy is available here: https://www.tumblr.com/policy/en/privacy\"', 'X-Xss-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'X-Tumblr-User': 'the-absolute-best-posts', 'X-Tumblr-Pixel-0': 'https://px.srvcs.tumblr.com/impixu?T=1555659077&J=eyJ0eXBlIjoidXJsIiwidXJsIjoiaHR0cDovLzEwMDBub3Rlcy5jb20vIiwicmVxdHlwZSI6MCwicm91dGUiOiIvIn0=&U=EJEPFDDMDN&K=36533553ca6c98c3ffa40d15855478b3c1f427a30be7a5eb4cd09256b4cd31a7--https://px.srvcs.tumblr.com/impixu?T=1555659077&J=eyJ0eXBlIjoicG9zdCIsInVybCI6Imh0dHA6Ly8xMDAwbm90ZXMuY29tLyIsInJlcXR5cGUiOjAsInJvdXRlIjoiLyIsInBvc3RzIjpbeyJyb290X2Jsb2dpZCI6IjE4NjcxODA4Iiwicm9vdF9wb3N0aWQiOiIxMjMzNTczMjcxMSIsInBvc3RpZCI6IjE4NDI5MDIxOTE4OCIsImJsb2dp', 'X-Tumblr-Pixel-1': 'ZCI6IjE5MzQzMzciLCJzb3VyY2UiOjMzfSx7InJvb3RfYmxvZ2lkIjoiMjE0ODQzNTciLCJyb290X3Bvc3RpZCI6OTIyNTc2MTQxMTUsInBvc3RpZCI6IjE4NDI5MDAxNDkyOCIsImJsb2dpZCI6IjE5MzQzMzciLCJzb3VyY2UiOjMzfSx7InJvb3RfYmxvZ2lkIjoiNzMyMzk2NSIsInJvb3RfcG9zdGlkIjoiMTEwMjk5NjAyNDU2IiwicG9zdGlkIjoiMTg0Mjg5Nzk5OTY4IiwiYmxvZ2lkIjoiMTkzNDMzNyIsInNvdXJjZSI6MzN9LHsicm9vdF9ibG9naWQiOiI5NDg5NzU4Iiwicm9vdF9wb3N0aWQiOjQ1MzczNTgwMDc0LCJwb3N0aWQiOiIxODQyODk1NjkzNDgiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6Ij', 'X-Tumblr-Pixel-2': 'ExMTY2NTU5Iiwicm9vdF9wb3N0aWQiOiIxMjgyMDQ1MzczMTUiLCJwb3N0aWQiOiIxODQyODkzMjcyNTMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6IjU2NjI0Mzg2Iiwicm9vdF9wb3N0aWQiOjQyMzY1MDI4NDM1LCJwb3N0aWQiOiIxODQyODkwNjgzNjMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM30seyJyb290X2Jsb2dpZCI6IjcyMjc2MzI5Iiwicm9vdF9wb3N0aWQiOiIxMzc0ODA5NjA2NjciLCJwb3N0aWQiOiIxODQyODg4MDczNDMiLCJibG9naWQiOiIxOTM0MzM3Iiwic291cmNlIjozM31dfQ==&U=LFCMCFJHJM&K=600602e5cc72d6cd698d22736e67fbcabad4b929587949814298', 'X-Tumblr-Pixel-3': '57d2040fa3bb', 'X-Tumblr-Pixel': '4', 'Link': '\<https://66.media.tumblr.com/avatar_6d0dd0685eab_128.pnj>; rel=icon', 'X-UA-Compatible': 'IE=Edge,chrome=1', 'Content-Encoding': 'gzip'}"

h = mystring.split("<xx>",1)[1]
h = ast.literal_eval(h.strip())

You have ' inside your quotes (and " inside your quotes), which confuses the parser. You need to escape them:

'P3p': 'CP=\"Tumblr\\'s privacy policy is available here: https://www.tumblr.com/policy/en/privacy\"',

Why are you embedding this string in your code and parsing it with ast in the first place? There must be a better way to get what you want. Perhaps store it as a JSON file and then json.load() it?
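
The JSON suggestion could look roughly like the sketch below. It assumes you control how the headers are written out, which the question says is not the case for the original string, so treat it as an option for new data only. The headers dict and the record layout are hypothetical, and json.dumps/json.loads are used on in-memory strings instead of json.load() on a file to keep the example self-contained.

```python
import json

# Hypothetical headers dict standing in for the response headers in the question.
headers = {
    "Server": "openresty",
    "P3p": 'CP="Tumblr\'s privacy policy is available here: '
           'https://www.tumblr.com/policy/en/privacy"',
}

# Assumed record layout: URL, the "<xx>" separator from the question, then the
# headers serialized as JSON. json.dumps escapes the inner double quotes itself.
record = "https://111.com" + "<xx>" + json.dumps(headers)

# Reading it back: split once on the separator, then let json.loads undo the
# escaping. No hand-written backslashes are involved at any point.
url, payload = record.split("<xx>", 1)
restored = json.loads(payload)
print(restored["P3p"])
```

Because the serializer produces the escapes and the parser removes them, values containing both kinds of quotes round-trip unchanged.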