Hello, could you please help me? I am trying to write a script that finds a pattern.
My string is:
<meta content="" property="news_keywords"/>
<meta content="Tough and Truthful - Bostonians read the Boston Herald for solid reporting, whether in print or online, on the issues affecting their daily lives. The Boston Herald gets people talking. Our reporters are second-to-none, our photographers are Pulitzer Prize-winning and we present news that Bostonians care about and respond to." property="description"/>
<meta content='{"link":"http:\/\/bostonherald.com\/","type":"frontpage"}' name="parsely-page"/><meta content="" property="keywords"/>
<meta content="Drupal 7 (http://drupal.org)" name="generator"/>
<link href="http://www.bostonherald.com/" rel="canonical"/>
<link href="http://www.bostonherald.com/" rel="shortlink"/>
<meta content="420" http-equiv="refresh"/>
<link href="http://www.bostonherald.com/sites/default/files/images/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"/>
<title>Boston Herald | Boston Herald</title>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/modules/system/system.base.css?nd76bo");
@import url("http://www.bostonherald.com/modules/system/system.menus.css?nd76bo");
@import url("http://www.bostonherald.com/modules/system/system.messages.css?nd76bo");
@import url("http://www.bostonherald.com/modules/system/system.theme.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/modules/aggregator/aggregator.css?nd76bo");
@import url("http://www.bostonherald.com/modules/comment/comment.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/date/date_api/date.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/date/date_popup/themes/datepicker.1.7.css?nd76bo");
@import url("http://www.bostonherald.com/modules/field/theme/field.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/mollom/mollom.css?nd76bo");
@import url("http://www.bostonherald.com/modules/node/node.css?nd76bo");
@import url("http://www.bostonherald.com/modules/poll/poll.css?nd76bo");
@import url("http://www.bostonherald.com/modules/user/user.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/views/css/views.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/modules/ctools/css/ctools.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/lightbox2/css/lightbox.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/panels/css/panels.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/rate/rate.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/libraries/superfish/css/superfish.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/libraries/superfish/css/superfish-vertical.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/libraries/superfish/css/superfish-navbar.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/views_slideshow/views_slideshow.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/jcarousel/skins/default/jcarousel-default.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/panels/plugins/layouts/twocol_stacked/twocol_stacked.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/basics.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/custom_blocks.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/navigation.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/view-story_slots.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/taxonomy/taxonomy-styles.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/bhr.css?nd76bo");</style>
<style media="print" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/print.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/omega/alpha/css/alpha-reset.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/alpha/css/alpha-alpha.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/omega/css/formalize.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/omega/css/omega-branding.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/omega/css/omega-forms.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/layout-front.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/global.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/ike-omega-alpha-default.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/ike-omega-alpha-default-normal.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/alpha/css/grid/alpha_default/normal/alpha-default-normal-24.css?nd76bo");</style>
The pattern is:
<meta content(+.?)refresh">
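The string is very large, so I tried different approaches, but none of them worked. I do not want to save the string to a txt file.
These are the scripts I tried, but they did not work.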
#Try 1
import re
re.findall("<meta content(+.?)refresh">",html)
#Try 2
matching = [s for s in html if "<meta content(+.?)refresh">" in s]
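Answer 0 (score: 0)
The question, as clarified in the comments, is: 'I want to grab the part of the string that starts with "<meta content" and ends with "refresh">".'
I split the string into lines, because then ^ matches the beginning of each line rather than only the beginning of the whole string. I use ^ to match the start and $ to match the end. In fact, they are probably unnecessary: re.match already anchors at the start of the string it is given, and the literal < and > mark the two ends of the tag. Also note that the double quote inside the pattern is escaped by the backslash in front of it.
Another key point: the pattern is not +.? but .*? The form (+.?) is not even a valid regular expression, because + has nothing to repeat, whereas .*? lazily matches all the characters in the middle of the string.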
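To make the difference concrete, here is a small sketch. The `sample` string and the variable names are made up for illustration and do not come from the question. It checks that `(+.?)` is rejected by the regex compiler and compares the lazy `.*?` with the greedy `.*`.

```python
import re

# Hypothetical sample: two meta tags on one line (not from the question).
sample = ('<meta content="1" http-equiv="refresh"/>'
          '<meta content="2" http-equiv="refresh"/>')

# "(+.?)" is not a valid pattern: "+" has nothing to repeat.
try:
    re.compile(r'<meta content(+.?)refresh">')
    pattern_is_valid = True
except re.error:
    pattern_is_valid = False

# Lazy ".*?" stops at the first possible 'refresh"/>'.
lazy = re.findall(r'<meta content.*?refresh"/>', sample)

# Greedy ".*" runs on to the last 'refresh"/>' on the line.
greedy = re.findall(r'<meta content.*refresh"/>', sample)

print(pattern_is_valid)        # False
print(len(lazy), len(greedy))  # 2 1
```

The lazy form returns each tag separately, while the greedy form swallows both tags in a single match.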
>>> import re
>>> for line in html.splitlines():
...     m = re.match("^<meta content(.*?)refresh\"/>$", line)
...     if m:
...         print(m.group(0))
...
<meta content="420" http-equiv="refresh"/>
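If splitting into lines is not wanted, one possible variation is to search the whole string in a single call. The sketch below is not the method used above: it assumes `html` holds the page source (a shortened stand-in is used here) and swaps `.*?` for `[^>]*`, so that a match cannot run past the end of one tag even when several tags share a line.

```python
import re

# Shortened stand-in for the page source held in `html`.
html = '''<meta content="Drupal 7 (http://drupal.org)" name="generator"/>
<link href="http://www.bostonherald.com/" rel="canonical"/>
<meta content="420" http-equiv="refresh"/>
<title>Boston Herald | Boston Herald</title>'''

# [^>]* matches any run of characters except ">", so each match
# stays inside a single tag.
refresh_tags = re.findall(r'<meta content[^>]*refresh"/>', html)

print(refresh_tags)
```

Nothing is written to a file here; the search runs directly on the in-memory string.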
The documentation for Python regular expressions can be found here: https://docs.python.org/2/library/re.html