Question

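I am new to Python programming. I am learning BeautifulSoup to scrape websites.

I want to extract the value of "stream" and store it in a variable.

My Python code is as follows: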

import bs4 as bs #Importing BeautifulSoup4 Python Library.
import urllib.request
import requests
import json
import re

headers = {'User-Agent':'Mozilla/5.0'}
url = "http://thoptv.com/partners/mhdTVlive/Core.php?level=1200&channel=Dsports_HD"

page = requests.get(url)
soup = bs.BeautifulSoup(page.text,"html.parser")
pattern = re.compile('var stream = (.*?);')
scripts = soup.find_all('script')

for script in scripts:
   if(pattern.match(str(script.string))):
       data = pattern.match(script.string)
       links = json.loads(data.groups()[0])
       print(links)

This is the source JavaScript code that the stream URL value comes from.

https://content.jwplatform.com/libraries/oncyToRO.js'>if(navigator.userAgent.match(/android/i) || navigator.userAgent.match(/webOS/i) || navigator.userAgent.match(/iPhone/i) || navigator.userAgent.match(/iPad/i) || navigator.userAgent.match(/iPod/i) || navigator.userAgent.match(/BlackBerry/i) || navigator.userAgent.match(/Windows Phone/i)) {var stream = "http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw.";}else{var stream = "http://hd.simiptv.com:8080//index.m3u8?key=VIoVSsGRLRouHWGNo1epzX&exp=932213423&domain=thoptv.stream&id=461";} jwplayer("THOPTVPlayer").setup({"title": 'thoptv.stream', "stretching": "exactfit", "width": "100%", "file": none, "height": "100%", "skin": "seven", "autostart": "true", "logo": {"file": "https://i.imgur.com/EprI2uu.png", "margin": "-0", "position": "top-left", "hide": "false", "link": "http://mhdtvlive.co.in"}, "androidhls": true,}); jwplayer("THOPTVPlayer").onError(function(){jwplayer().load({file: "http://content.jwplatform.com/videos/7RtXk3vl-52qL9xLP.mp4", image: "http://content.jwplatform.com/thumbs/7RtXk3vl-480.jpg"}); jwplayer().play();}); jwplayer("THOPTVPlayer").onComplete(function(){window.location = window.location.href;}); jwplayer("THOPTVPlayer").onPlay(function(){clearTimeout(theTimeout);});

I need to extract the URL from stream.

var stream = "http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw.";}

Answer 1

This code works for me:

import bs4 as bs #Importing BeautifulSoup4 Python Library.
import urllib.request
import requests
import json


headers = {'User-Agent':'Mozilla/5.0'}
url = "http://thoptv.com/partners/mhdTVlive/Core.php?level=1200&channel=Dsports_HD"

page = requests.get(url)
soup = bs.BeautifulSoup(page.text,"html.parser")

scripts = soup.find_all('script')



out = list()
tmp = ""  # stays empty if no matching script is found
for c, i in enumerate(scripts):  # go over the list of <script> tags
    text = i.text
    if text[:2] == "if":  # the script we want starts with "if"
        for count, t in enumerate(text):  # scan it character by character
            # "{var s..." marks the spot where stream is set
            if text[count] == "{" and text[count + 1] == "v" and text[count + 5] == "s":
                tmp = text[count:]  # keep everything from here on in tmp
                break  # and stop scanning
co = 0
for m in tmp:  # loop over the text kept by the previous loop
    if m == "\"" and co == 0:  # opening quote: the string is starting
        co = 1  # set the flag to "true" (1)
    elif m == "\"" and co == 1:  # closing quote: the string has ended
        print(''.join(out))  # results
        break
    elif co == 1:
        # as long as we are inside the right string
        out.append(m)  # add the character to the out list

result = ''.join(out)  # set result

It basically filters out the string.
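For comparison, the same filtering can probably be done with a regular expression. This is only a sketch: SAMPLE_SCRIPT is a made-up, shortened stand-in for the real script text (the example.com URLs are placeholders), and extract_streams is a hypothetical helper name, not something from the original page or from any library. Note that it scans the whole text, whereas re.match only tests the very start of the string.

```python
import re

# Hypothetical, shortened stand-in for the inline script text;
# the real page content may differ.
SAMPLE_SCRIPT = (
    'if(navigator.userAgent.match(/android/i) || '
    'navigator.userAgent.match(/Windows Phone/i)) '
    '{var stream = "http://example.com/mobile.m3u8?a=1&b=2";}'
    'else{var stream = "http://example.com/desktop.m3u8";}'
)

# Capture whatever sits between the double quotes after "var stream =".
STREAM_PATTERN = re.compile(r'var\s+stream\s*=\s*"([^"]*)"')


def extract_streams(script_text):
    """Return every URL assigned to `stream` in the given script text."""
    # findall scans the whole string; re.match would only test position 0.
    return STREAM_PATTERN.findall(script_text)


streams = extract_streams(SAMPLE_SCRIPT)
print(streams[0])  # first assignment (the mobile branch in this sample)
```

With the real page, you would pass the text of the matching script tag instead of SAMPLE_SCRIPT. An empty list means the pattern did not match, so check for that before indexing.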

But, by the way, if we use user1767754's method, we get something like this:

import bs4 as bs #Importing BeautifulSoup4 Python Library.
import urllib.request
import requests
import json

headers = {'User-Agent':'Mozilla/5.0'}
url = "http://thoptv.com/partners/mhdTVlive/Core.php?level=1200&channel=Dsports_HD"

page = requests.get(url)
soup = bs.BeautifulSoup(page.text,"html.parser")

scripts = soup.find_all('script')

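x = scripts[3].text  # assumes the fourth <script> tag is the one that sets stream

# keep what follows the first "var stream =" assignment...
left1, right1 = x.split("Phone/i)) {var stream =")
# ...and cut it off where the else branch begins
left2, right2 = right1.split(";}else")

print(left2)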
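One caveat with the split approach is that the tuple unpacking raises ValueError when a marker is missing or appears more than once, and the result still carries the JavaScript quotes. A slightly more forgiving sketch could use str.partition instead. The helper name between and the sample text are made up for illustration, and the markers are the ones used above.

```python
def between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    _, found_start, rest = text.partition(start_marker)
    if not found_start:
        return None
    value, found_end, _ = rest.partition(end_marker)
    if not found_end:
        return None
    # Drop surrounding whitespace and the JavaScript string quotes.
    return value.strip().strip('"')


# Hypothetical, shortened stand-in for the script text.
sample = (
    'x.match(/Windows Phone/i)) {var stream = "http://example.com/a.m3u8";}'
    'else{var stream = "http://example.com/b.m3u8";}'
)
print(between(sample, "Phone/i)) {var stream =", ";}else"))
```

If the page layout changes and a marker disappears, this returns None instead of raising, which is easier to handle in a loop over all script tags.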
