I want to write a simple Python script to automate pulling .mov files from an IP camera's SD card. This model of IP camera supports HTTP requests that return HTML containing the .mov file info. My Python script so far:
from bs4 import BeautifulSoup
import requests
page = requests.get("http://192.168.1.99/form/getStorageFileList?type=3")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
OUTPUT:
NAME2041=Record_continiously/2018-06-02/8/MP_2018-06-03_00-33-15_60.mov
I want to return only the .mov file name, so I need to remove:
"NAME2041=Record_continiously/2018-06-02/8/"
I'm new to HTML parsing with Python, so I'm a bit confused about how it works.
Is the returned HTML considered a string? If so, I understand that it is immutable, so I will have to create a new string instead of "stripping away" part of the existing one.
I have tried:
page.replace("NAME2041=Record_continiously/2018-06-02/8/","")
but I get an AttributeError. Is anyone aware of a method that could accomplish this?
Here is a sample of the HTML I am working with...
<html>
<head></head>
<body>
000 Success NUM=2039 NAME0=Record_Continuously/2018-06-04/10/MP_2018-06-04_12-17-38_60.mov SIZE0=15736218
NAME1=Record_Continuously/2018-06-04/10/MP_2018-06-04_12-16-37_60.mov SIZE1=15683077
NAME2=Record_Continuously/2018-06-04/10/MP_2018-06-04_12-15-36_60.mov SIZE2=15676882
NAME3=Record_Continuously/2018-06-04/10/MP_2018-06-04_12-14-35_60.mov SIZE3=15731539
</body>
</html>
Answer 0 (score: 0)
Use str.split with negative indexing.
Example:
page = "NAME2041=Record_continiously/2018-06-02/8/MP_2018-06-03_00-33-15_60.mov"
print( page.split("/")[-1])
Output:
MP_2018-06-03_00-33-15_60.mov
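The same idea can be extended to the whole response rather than a single line. The following is only a sketch: it assumes the body text looks like the sample in the question (entries of the form NAME<n>=<path> separated by whitespace), and the helper name extract_mov_names and the SAMPLE constant are made up for illustration.

```python
import re

# Body text copied from the first two entries of the sample in the question.
SAMPLE = (
    "000 Success NUM=2039 "
    "NAME0=Record_Continuously/2018-06-04/10/MP_2018-06-04_12-17-38_60.mov SIZE0=15736218\n"
    "NAME1=Record_Continuously/2018-06-04/10/MP_2018-06-04_12-16-37_60.mov SIZE1=15683077\n"
)


def extract_mov_names(text):
    """Return the bare file name for every NAME<n>=<path> entry in text.

    Hypothetical helper: assumes paths contain no whitespace.
    """
    # Capture everything after "NAME<digits>=" up to the next whitespace.
    paths = re.findall(r"NAME\d+=(\S+)", text)
    # Keep only the last path component, as in the answer above.
    return [path.split("/")[-1] for path in paths]


print(extract_mov_names(SAMPLE))
```

With a live camera, the text passed in would presumably be page.text from the question's script instead of SAMPLE.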
Answer 1 (score: 0)
Since you asked for an explanation of your code, here it is:
# import statements
from bs4 import BeautifulSoup
import requests
page = requests.get("http://192.168.1.99/form/getStorageFileList?type=3")  # returns a Response object
soup = BeautifulSoup(page.content, 'html.parser')  # parses the response body
page.content returns the body of the response as bytes (page.text gives the same body decoded to a str). You pass page.content to the BeautifulSoup class, which is initialized with two arguments: the content to parse and the name of the parser, here html.parser.
soup is the resulting BeautifulSoup object.
.prettify() is a method that returns the parsed content as a neatly indented string for printing.
Slicing the string at fixed positions can give wrong results when the length of the content varies, so it is better to split it as suggested by @Rakesh; that is the best approach in your case.
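On the original questions about strings and the AttributeError: page is a Response object, which has no replace method, so the call has to be made on its text instead. The sketch below avoids the network by using a bytes literal in place of page.content; the variable names and the UTF-8 encoding are assumptions for illustration only.

```python
# Stand-in for page.content (bytes); the real value would come from requests.
raw = b"NAME2041=Record_continiously/2018-06-02/8/MP_2018-06-03_00-33-15_60.mov"

# Decoding gives a str, roughly what page.text provides.
# UTF-8 is an assumption here; the camera may use another encoding.
text = raw.decode("utf-8")

prefix = "NAME2041=Record_continiously/2018-06-02/8/"

# str.replace does not modify text; it returns a new string.
file_name = text.replace(prefix, "")

print(file_name)
print(text.startswith(prefix))  # the original string is unchanged
```

This works for one known prefix, but the prefix differs for every entry, which is why splitting on "/" is the more general choice.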