Trying to parse a specific value in a JS script

Time: 2018-10-09 07:49:00

Tags: python html beautifulsoup python-requests

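I'm trying to scrape a value that is needed in a POST request. The value shows up multiple times when using Inspect Element in Chrome, but since BS4 only looks at the page source, I have to scrape the value from a JS script on the site.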

I managed to get the whole script:

<script type ="text/javascript">
var isSRFlow = true;
var isPpaOnSignIn =true;
var simplifyRegFlowSuccess = false;
var retUrl = "https&#x3a;&#x2f;&#x2f;www.ebay.com&#x2f;";
var isFB = false;
var isMobile = false;
var langCode = "en-US";
var emailAutoCompleteEnabled = true;
var dfpContext = '{"enableTMXTagging":"true","slURL":"ebay","flashTagUpgrade":"0","enableFlashTagging":"false","tmxDfpUrl":"https://signin.ebay.com/t_n.html?suppressFlash\u003dtrue\u0026org_id\u003dusllpic0\u0026session_id\u003d57be07a71660ad4e16f42acffffc95e8","swfURL":"ebay","enableSLTagging":"false","swfObjectJSLibURL":"ebay","mid":"AQAAAWZGrHELAAUxNjY1N2JlMDdhNy5hZDRlMTZmLjQyYWNmLmZmZmM5NWU5Jp0dBAKw4k3h8WAm/g97vwVzjcA*","tmxSessionId":"57be07a71660ad4e16f42acffffc95e8","enableHTML5Tagging":"true","flashTagVersion":"1","dfpjsURL":"https://secureir.ebaystatic.com/f/0vk0rkyoky1ltm32dhy0hthnxyx.js"}';

However, the only thing I need is "57be07a71660ad4e16f42acffffc95e8", the value that follows "tmxSessionId". How can I do this?

I have also tried these:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://reg.ebay.com/reg/PartialReg')
soup = BeautifulSoup(r.text, 'lxml')
scripts = soup.find_all('script')
your_script = [script for script in scripts if 'tmxSessionId' in str(script)][0]

as well as using "find_all" instead of "find". A friend also suggested splitting the script, but I tried that and it didn't work well. Any ideas?

P.S.: I'd rather not use browser-based solutions such as Selenium and PhantomJS, because I find them slow and inefficient.

Edit: I used my old code to get the script from the page source, then used what Selcuk suggested:

scripts = soup.find_all('script')
your_script = [script for script in scripts if 'tmxSessionId' in str(script)][0]
new = your_script.find("tmxSessionId")
print(new)
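
A note on the snippet above: if `your_script` is a BeautifulSoup `Tag` (as the list comprehension suggests), `your_script.find("tmxSessionId")` searches for a child tag named `tmxSessionId` rather than for a substring, so it returns `None`. Searching the script's text as a plain string is one way around that. Below is a minimal sketch using only `str.find`. The helper name `value_after_key` is hypothetical, `script_text` is a shortened stand-in for `str(your_script)`, and `key` assumes the page keeps the exact `"tmxSessionId":"` spelling with no extra whitespace.

```python
from typing import Optional


def value_after_key(text: str, key: str) -> Optional[str]:
    """Return the characters between `key` and the next double quote."""
    start = text.find(key)          # str.find returns -1 when key is absent
    if start == -1:
        return None
    start += len(key)               # move past the key to the value itself
    end = text.find('"', start)     # the value ends at the closing quote
    if end == -1:
        return None
    return text[start:end]


# Shortened, hypothetical stand-in for str(your_script).
script_text = (
    'var dfpContext = \'{"mid":"abc",'
    '"tmxSessionId":"57be07a71660ad4e16f42acffffc95e8",'
    '"enableHTML5Tagging":"true"}\';'
)

session_id = value_after_key(script_text, '"tmxSessionId":"')
print(session_id)
```

This is brittle if the site changes its formatting, so it is probably best treated as a quick check rather than a long-term solution.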

1 answer:

Answer 0 (score: 1)

I don't know the rest of your script's content, so I had to close the tag myself. But this will work.

import requests
from bs4 import BeautifulSoup
import re
import json

script_tag = """
<script type ="text/javascript">        
    var isSRFlow = true;
    var isPpaOnSignIn =true;
    var simplifyRegFlowSuccess = false;
    var retUrl = "https&#x3a;&#x2f;&#x2f;www.ebay.com&#x2f;";
    var isFB = false;
    var isMobile = false;
    var langCode = "en-US";


    var emailAutoCompleteEnabled = true;

    var dfpContext = '{"enableTMXTagging":"true","slURL":"ebay","flashTagUpgrade":"0","enableFlashTagging":"false","tmxDfpUrl":"https://signin.ebay.com/t_n.html?suppressFlash\u003dtrue\u0026org_id\u003dusllpic0\u0026session_id\u003d57be07a71660ad4e16f42acffffc95e8","swfURL":"ebay","enableSLTagging":"false","swfObjectJSLibURL":"ebay","mid":"AQAAAWZGrHELAAUxNjY1N2JlMDdhNy5hZDRlMTZmLjQyYWNmLmZmZmM5NWU5Jp0dBAKw4k3h8WAm/g97vwVzjcA*","tmxSessionId":"57be07a71660ad4e16f42acffffc95e8","enableHTML5Tagging":"true","flashTagVersion":"1","dfpjsURL":"https://secureir.ebaystatic.com/f/0vk0rkyoky1ltm32dhy0hthnxyx.js"}';
</script>
"""

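# Parse the markup and take the first <script> tag.
soup = BeautifulSoup(script_tag, 'lxml')
script = soup.find_all('script')[0]
# Grab the first {...} span, i.e. the JSON string assigned to dfpContext.
data = re.findall(r"\{.*?\}", script.text)[0]

# Decode the JSON and read the session id.
print(json.loads(data)['tmxSessionId'])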

The output will be

57be07a71660ad4e16f42acffffc95e8
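
The same idea can be applied to the full page source (for example `r.text` from the question) without picking out the script tag first. The following is only a sketch under some assumptions: the page still assigns `dfpContext` a single-quoted JSON string ending in `}';`, and that string contains no JavaScript-only escapes that `json.loads` would reject. The names `DFP_PATTERN` and `extract_dfp_context` are my own, and the sample markup is a shortened version of the script above. It uses only the standard library, so it is an alternative to the BeautifulSoup lookup rather than a replacement for it.

```python
import json
import re
from typing import Optional

# Capture the JSON object assigned to dfpContext. Anchoring on the closing
# "}';" (instead of the first "}") keeps nested braces inside the JSON intact.
DFP_PATTERN = re.compile(r"var\s+dfpContext\s*=\s*'(\{.*?\})'\s*;", re.DOTALL)


def extract_dfp_context(page_source: str) -> Optional[dict]:
    """Return the decoded dfpContext object, or None if it is not found."""
    match = DFP_PATTERN.search(page_source)
    if match is None:
        return None
    return json.loads(match.group(1))


# Shortened sample; a raw string keeps the \u003d escape exactly as the
# page source would contain it, and json.loads decodes it to "=".
sample = r'''<script type="text/javascript">
    var isMobile = false;
    var dfpContext = '{"tmxDfpUrl":"https://signin.ebay.com/t_n.html?suppressFlash\u003dtrue","tmxSessionId":"57be07a71660ad4e16f42acffffc95e8","flashTagVersion":"1"}';
</script>'''

context = extract_dfp_context(sample)
if context is not None:
    print(context["tmxSessionId"])
```

Returning `None` instead of indexing `[0]` avoids an `IndexError` when the page layout changes, so the caller can decide how to handle a missing value.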