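I am trying to scrape a value that I need for a POST request. With Inspect Element in Chrome the value shows up several times, but because BS4 only sees the page source, I have to scrape it from one of the site's JS scripts.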
I managed to get the whole script, which looks like this:
<script type ="text/javascript">
var isSRFlow = true;
var isPpaOnSignIn =true;
var simplifyRegFlowSuccess = false;
var retUrl = "https://www.ebay.com/";
var isFB = false;
var isMobile = false;
var langCode = "en-US";
var emailAutoCompleteEnabled = true;
var dfpContext = '{"enableTMXTagging":"true","slURL":"ebay","flashTagUpgrade":"0","enableFlashTagging":"false","tmxDfpUrl":"https://signin.ebay.com/t_n.html?suppressFlash\u003dtrue\u0026org_id\u003dusllpic0\u0026session_id\u003d57be07a71660ad4e16f42acffffc95e8","swfURL":"ebay","enableSLTagging":"false","swfObjectJSLibURL":"ebay","mid":"AQAAAWZGrHELAAUxNjY1N2JlMDdhNy5hZDRlMTZmLjQyYWNmLmZmZmM5NWU5Jp0dBAKw4k3h8WAm/g97vwVzjcA*","tmxSessionId":"57be07a71660ad4e16f42acffffc95e8","enableHTML5Tagging":"true","flashTagVersion":"1","dfpjsURL":"https://secureir.ebaystatic.com/f/0vk0rkyoky1ltm32dhy0hthnxyx.js"}';
However, the only thing I need is "57be07a71660ad4e16f42acffffc95e8", the value that follows "tmxSessionId". How can I do this?
I have also tried these:
import requests
from bs4 import BeautifulSoup

r = requests.get('https://reg.ebay.com/reg/PartialReg')
soup = BeautifulSoup(r.text, 'lxml')
scripts = soup.find_all('script')
your_script = [script for script in scripts if 'tmxSessionId' in str(script)][0]
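I also tried using "find_all" instead of "find". A friend suggested splitting the script as well, but when I tried that it did not work well. Any ideas?
P.S.: I would rather not use browser-based solutions such as Selenium or PhantomJS, because I have found them slow and inefficient.
Edit: I used my old code to get the script from the page source, and then used what Selcuk suggested: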
scripts = soup.find_all('script')
your_script = [script for script in scripts if 'tmxSessionId' in str(script)][0]
new = your_script.find("tmxSessionId")
print(new)
Answer 0 (score: 1)
I don't know the rest of your script's contents, so I had to close the tag myself. But this will work.
import requests
from bs4 import BeautifulSoup
import re
import json
script_tag = """
<script type ="text/javascript">
var isSRFlow = true;
var isPpaOnSignIn =true;
var simplifyRegFlowSuccess = false;
var retUrl = "https://www.ebay.com/";
var isFB = false;
var isMobile = false;
var langCode = "en-US";
var emailAutoCompleteEnabled = true;
var dfpContext = '{"enableTMXTagging":"true","slURL":"ebay","flashTagUpgrade":"0","enableFlashTagging":"false","tmxDfpUrl":"https://signin.ebay.com/t_n.html?suppressFlash\u003dtrue\u0026org_id\u003dusllpic0\u0026session_id\u003d57be07a71660ad4e16f42acffffc95e8","swfURL":"ebay","enableSLTagging":"false","swfObjectJSLibURL":"ebay","mid":"AQAAAWZGrHELAAUxNjY1N2JlMDdhNy5hZDRlMTZmLjQyYWNmLmZmZmM5NWU5Jp0dBAKw4k3h8WAm/g97vwVzjcA*","tmxSessionId":"57be07a71660ad4e16f42acffffc95e8","enableHTML5Tagging":"true","flashTagVersion":"1","dfpjsURL":"https://secureir.ebaystatic.com/f/0vk0rkyoky1ltm32dhy0hthnxyx.js"}';
</script>
"""
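# Parse the snippet and take the first <script> element.
soup = BeautifulSoup(script_tag, 'lxml')
script = soup.find_all('script')[0]
# The non-greedy match grabs the first {...} object, i.e. the dfpContext JSON.
data = re.findall(r"\{.*?\}", script.text)[0]
print(json.loads(data)['tmxSessionId'])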
The output will be:
57be07a71660ad4e16f42acffffc95e8
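If you only need that one value, a regular expression aimed directly at the "tmxSessionId" key may be enough, and it avoids depending on the first {...} match being the right object. The following is a minimal sketch, not a tested solution against the live page: `script_text` is a shortened, hypothetical stand-in for the script contents, and `extract_session_id` is a helper name made up for illustration.

```python
import re

# Hypothetical, shortened stand-in for the text of the <script> tag.
script_text = '''var isMobile = false;
var dfpContext = '{"mid":"abc","tmxSessionId":"57be07a71660ad4e16f42acffffc95e8","flashTagVersion":"1"}';'''


def extract_session_id(text):
    """Return the quoted value that follows "tmxSessionId", or None if absent."""
    match = re.search(r'"tmxSessionId"\s*:\s*"([^"]+)"', text)
    return match.group(1) if match else None


print(extract_session_id(script_text))
```

With the real page you would pass in the text of the matching script element (or even `r.text`) instead of the sample string. Returning None when the key is missing lets the caller decide how to handle a changed page layout.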