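I am trying to use strip() to remove the end of some HTML. The idea is to eventually build this into a loop, but for now I just want to figure out how to make it work: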
httpKey=("<a href=\"http://www.uaf.edu/academics/degreeprograms/index.html>Degree Programs</a>")
httpKeyEnd=">"
#the number in the httpKey that the httpKey end is at
stripNumber=(httpKey.find(httpKeyEnd))
#This is where I am trying to strip the rest of the information that I do not need.
httpKey.strip(httpKey.find(httpKeyEnd))
print (httpKey)
The desired end result is for httpKey to be printed to the screen as:
a href="http://www.uaf.edu/academics/degreeprograms/index.html
Answer 0 (score: 0)
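find returns the index (a number) at which the substring was found. strip removes characters from the ends of a string; it does not remove "everything from that point on". It also expects a string of characters rather than a number, and it returns a new string instead of modifying httpKey.
You want to use string slicing instead: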
>>> s = 'hello there: world!'
>>> s.index(':')
11
>>> s[s.index(':')+1:]
' world!'
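Applied to the string from the question, the same idea might look like the sketch below. The variable names (`http_key`, `trimmed`, `end`, `result`) are illustrative only and not from the original post; the `strip('<>')` call is included just to show what strip actually does.

```python
http_key = '<a href="http://www.uaf.edu/academics/degreeprograms/index.html>Degree Programs</a>'

# strip() takes a string of characters to trim from both ends, not an index,
# and it returns a new string rather than changing http_key.
trimmed = http_key.strip('<>')

# find() returns the index of the first '>' (or -1 if there is none).
end = http_key.find('>')

# Slicing keeps everything before that index; fall back to the whole
# string when no '>' was found.
result = http_key[:end] if end != -1 else http_key
print(result)
```

Here `trimmed` only loses the leading `<` and the trailing `>`, while `result` is cut at the first `>`, which is what the question is after.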
If you just want to know what the link is, use a library such as BeautifulSoup:
>>> from bs4 import BeautifulSoup as bs
>>> doc = bs('<a href="http://www.uaf.edu/academics/degreeprograms/index.html">Degree Programs</a>', 'html.parser')
>>> for link in doc.find_all('a'):
...     print(link.get('href'))
...
http://www.uaf.edu/academics/degreeprograms/index.html
Answer 1 (score: 0)
For your case, this will work:
>>> httpKey=("<a href=\"http://www.uaf.edu/academics/degreeprograms/index.html>Degree Programs</a>")
>>> httpKey[1:httpKey.index('>')]
'a href="http://www.uaf.edu/academics/degreeprograms/index.html'
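Since the question mentions eventually running this in a loop, here is one rough way that could be sketched. The helper name `tag_prefix` and the second entry in `links` are made up for illustration; only the first string comes from the question.

```python
def tag_prefix(html, end_marker=">"):
    """Return the text before the first end_marker, without leading '<'."""
    end = html.find(end_marker)
    if end == -1:
        # No marker found: keep the whole string rather than slicing with -1.
        return html.lstrip("<")
    return html[:end].lstrip("<")


links = [
    '<a href="http://www.uaf.edu/academics/degreeprograms/index.html>Degree Programs</a>',
    '<a href="http://www.uaf.edu/">Home</a>',  # hypothetical second entry
]

prefixes = [tag_prefix(link) for link in links]
for prefix in prefixes:
    print(prefix)
```

Checking for -1 matters because find does not raise when the marker is missing, and slicing with -1 would silently drop the last character instead.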