Question

Suppose I have a string like str1 = "IWantToMasterPython"

If I want to extract "Py" from the string above, I would write:

extractedString = foo("Master","thon")

I want to do all this because I am trying to extract lyrics from an HTML page. The lyrics are written like <div class = "lyricbox"> ....lyrics goes here....</div>.

Any suggestions on how to implement this?

Answer 1

One solution is to use a regular expression:

import re

# The non-greedy group captures the text between "Master" and "thon"
r = re.compile(r'Master(.*?)thon')
m = r.search(str1)
if m:
    lyrics = m.group(1)
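
If the two delimiters are not fixed in advance, the same regex idea can be wrapped in a small function. This is only a sketch: the name between_re is made up for illustration, and it assumes you want the first match and None when nothing is found. re.escape is used so that delimiters containing regex metacharacters are matched literally.

```python
import re


def between_re(s, leader, trailer):
    """Return the text between leader and trailer, or None if absent."""
    # re.escape makes the delimiters match literally, even if they
    # contain regex metacharacters such as "(" or ".".
    pattern = re.escape(leader) + r"(.*?)" + re.escape(trailer)
    # re.DOTALL lets "." match newlines, so multi-line text is captured.
    m = re.search(pattern, s, re.DOTALL)
    return m.group(1) if m else None


print(between_re("IWantToMasterPython", "Master", "thon"))  # Py
```

Returning None instead of raising is a design choice here; raise an exception instead if a missing delimiter should be treated as an error.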

Answer 2

BeautifulSoup is the easiest way to do what you want. It can be installed as follows:

pip install beautifulsoup4

Sample code that does what you want:

from bs4 import BeautifulSoup

doc = ['<div class="lyricbox">Hey You</div>']
soup = BeautifulSoup(''.join(doc), 'html.parser')
print(soup.find('div', {'class': 'lyricbox'}).string)

You can use Python's urllib to fetch the content directly from a URL. The Beautiful Soup documentation is also helpful if you want to do more parsing.
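
For a rough idea of how fetching and parsing fit together, here is a sketch that uses only the standard library (html.parser instead of Beautiful Soup, so it runs without extra installs). The names LyricBoxParser, extract_lyrics and fetch_lyrics are invented for this example, and the UTF-8 decoding is an assumption about the page, not something the question states.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LyricBoxParser(HTMLParser):
    """Collect the text found inside <div class="lyricbox"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while we are inside a lyricbox div
        self.chunks = []  # pieces of text collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Count nested divs so the matching </div> closes the lyricbox.
        if self.depth or "lyricbox" in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_lyrics(html):
    """Return the text inside the lyricbox div(s) of an HTML string."""
    parser = LyricBoxParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def fetch_lyrics(url):
    """Download a page and extract its lyrics (assumes UTF-8 content)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_lyrics(html)


print(extract_lyrics('<div class="lyricbox">Hey You</div>'))  # Hey You
```

For real pages, Beautiful Soup is more forgiving of malformed markup than this hand-rolled parser.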

Answer 3

def foo(s, leader, trailer):
    # Index just past the first occurrence of leader
    end_of_leader = s.index(leader) + len(leader)
    # First occurrence of trailer after the leader
    start_of_trailer = s.index(trailer, end_of_leader)
    return s[end_of_leader:start_of_trailer]

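This raises ValueError if the leader is not present in string s, or if the trailer does not appear after it. You did not specify what behavior you want in such anomalous conditions; raising an exception is a natural and Pythonic thing to do, letting the caller handle it with try/except if it knows what to do in such cases.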

RE-based approaches are also workable, but I think this pure-string approach is simpler.
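
To illustrate the try/except handling on the caller's side, here is a minimal sketch. foo is repeated so the snippet runs on its own; foo_or_default and its default parameter are hypothetical names, assuming the caller simply wants a fallback value when a delimiter is missing.

```python
def foo(s, leader, trailer):
    # Same logic as in the answer, repeated so this runs standalone.
    end_of_leader = s.index(leader) + len(leader)
    start_of_trailer = s.index(trailer, end_of_leader)
    return s[end_of_leader:start_of_trailer]


def foo_or_default(s, leader, trailer, default=None):
    """Call foo, but return default instead of raising ValueError."""
    try:
        return foo(s, leader, trailer)
    except ValueError:
        # Raised when leader is missing, or trailer does not follow it.
        return default


print(foo_or_default("IWantToMasterPython", "Master", "thon"))  # Py
print(foo_or_default("IWantToMasterPython", "Guru", "thon"))    # None
```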

Answer 4

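If you are extracting any data from an HTML page, I strongly recommend using the BeautifulSoup library. I also use it to extract data from HTML, and it works well.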

Answer 5

If you want all the matches returned in a list, you can also try this:

import re
str1 = "IWantToMasterPython"

# findall returns every captured group; DOTALL lets "." span newlines
out = re.compile(r'Master(.*?)thon', re.DOTALL | re.IGNORECASE).findall(str1)
if out:
    print(out)
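
With only one occurrence in str1 the list has a single item, so here is a small sketch with a made-up input containing several occurrences in mixed case. The sample text is an assumption for illustration only; finditer is shown as an option in case the match positions are needed too.

```python
import re

# Hypothetical input with three occurrences in different letter cases.
text = "MasterPython MASTERMarathon masterPyTHON"
pattern = re.compile(r"Master(.*?)thon", re.DOTALL | re.IGNORECASE)

# findall returns the captured group of every non-overlapping match.
matches = pattern.findall(text)
print(matches)  # ['Py', 'Mara', 'Py']

# finditer yields match objects, which also expose where each group starts.
spans = [(m.group(1), m.start(1)) for m in pattern.finditer(text)]
```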

How do I extract a string between two other strings in Python?
