How to get the note content using Python re

Time: 2012-10-14 06:44:12

Tags: python regex evernote

I shared a note through an HTML page generated by Evernote. I want to extract the title and the content of this note, so I wrote the following code:

import re
resource = '<!DOCTYPE html>\n<!--[if lt IE 7 ]> <html class="ie6">; <![endif]--><!--[if IE 7 ]>    <html class="ie7"> <![endif]--><!--[if IE 8 ]>    <html class="ie8"> <![endif]--><!--[if IE 9 ]>    <html class="ie9"> <![endif]--><!--[if gt IE 9]>  <html>             <![endif]--><!--[if !IE]><!--> <html>         <!--<![endif]--><head><meta name="en:locale" content="en" />\n    <meta charset="utf-8" />\n    <meta http-equiv="X-UA-Compatible" content="IE=9,chrome=1" />\n    <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,minimum-scale=1,user-scalable=0" />\n\n    <meta property="og:title" content="python re"/>\n    <meta property="og:type" content="article"/>\n        <meta property="og:description" content="a question about python re\n "/>\n      <meta property="og:url" content="https://www.evernote.com/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d"/>\n    <meta property="og:image"\n        content="https://www.evernote.com/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d/thm/note/396b4a1f-ae9c-40aa-b740-5aa19e301489"/>\n    <meta property="og:site_name" content="Evernote"/>\n    <meta property="og:created_time" content="1350193749000"/>\n    <meta property="og:updated_time" content="1350193786000"/>\n    <link rel="Shortcut Icon" href="/favicon.ico" type="image/x-icon" />\n\n    <link rel="stylesheet" href="/redesign/global/css/fonts.css" />\n    <link rel="stylesheet" href="/redesign/global/css/header.css" />\n\n    <link rel="stylesheet" href="/redesign/sharing/css/sharedNote.css" />\n    <title>python re</title>\n  <link rel="stylesheet" href="/redesign/modules/SharingMenu/SharingMenu.css"><link rel="stylesheet" href="/redesign/modules/LinkUrlDialog/LinkUrlDialog.css"></head><body class="wrapper"><div class="logo-bar">\n      <a href="http://evernote.com/" target="_blank" class="evernote-logo"></a>\n      <a class="save-button save-button-desktop" 
href="/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d">\n          Save to Evernote</a>\n\n        <div class="switch-account-div">\n          <div class="switch-account-icon"></div>\n          <span class="switch-account-name"></span>\n          <div class="switch-account-arrow"></div>\n          <div class="switch-account-dropdown">\n            <div class="switch-dropdown-arrow"></div>\n            <div class="switch-account-menuitem">\n              Switch Account</div>\n            <div class="switch-account-logout">\n              Sign Out</div>\n          </div>\n        </div>\n\n        </div>\n\n    <div id="message-container">\n      <div id="message">\n        <div id="message-checkmark"></div>\n        <span></span>\n      </div>\n    </div>\n\n    <div id="container-boundingbox" class="wrapper">\n      <div id="container" class="wrapper">\n        <div class="sharing-imagegallery">\n        <div class="SharingMenu"><div class="sharing-menu">\n    <div class="share-button-container">\n      <div class="label-container">\n        <span class="label">\n          Share</span>\n        <div class="label-icon facebook-icon">\n        </div>\n      </div>\n      <div class="icon-container"\n         title="Share">\n        <div class="icon">\n        </div>\n      </div>\n    </div>\n    <div class="menu-bar">\n      <div class="menu-bar-div">\n        <div class="menu-bar-icon facebook-icon"></div>\n        <span class="menu-bar-label">\n          Facebook</span>\n      </div>\n      <div class="menu-bar-div">\n        <div class="menu-bar-icon twitter-icon"></div>\n        <span class="menu-bar-label">\n          Twitter</span>\n      </div>\n      <div class="menu-bar-div">\n        <div class="menu-bar-icon linkedin-icon"></div>\n        <span class="menu-bar-label">\n          LinkedIn</span>\n      </div>\n      <div class="menu-bar-div">\n        <div class="menu-bar-icon link-icon"></div>\n        <span 
class="menu-bar-label">\n          Link</span>\n      </div>\n    </div>\n  </div>\n</div></div>\n      <div class="shared-by-mobile">\n        Shared by flowerszhong</div>\n      <div class="shared-by shared-by-desktop">\n        <div class="shared-by-left"></div>\n        Shared by flowerszhong<div class="shared-by-right"></div>\n      </div>\n      <h2 class="note-title">python re</h2>\n      <div class="vtop">\n        <div class="note-updated">\n          <span>\n            Updated Today</span>\n        </div>\n      </div>\n      <div class="divider"></div>\n      <div class="note-content">\n        <div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="ennote">\na question about python re\n<div><br/></div></div></div>\n      <a class="save-button save-button-mobile" href="/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d">\n        Save to Evernote</a>\n    <div class="clearfix" style="clear: both;"></div>\n</div>\n    </div>\n\n\n    <div class="footer">\n      <div>\n            Evernote makes it easy to remember things big and small from your everyday life using your computer, tablet, phone and the web.</div>\n          <div class="footer-logo"></div>\n        </div>\n\n    <div class="LinkUrlDialog"><script id="linkUrlDialog" type="text/html">\n    <div class="link-url-dialog">\n      <div class="dialog-head">\n        Link to Note</div>\n      <div class="dialog-body">\n        <p>Paste this link into an email or IM to share it.</p>\n        <p>Anyone with the link will be able to view the note.</p>\n      </div>\n      <div class="url-container">\n        <div class="url-title">\n          Note URL:</div>\n        <input type="text" class="url-input" value="{{url}}" readonly>\n        <div class="copy-container">\n          <button type="button" class="copy-button">\n            Copy to Clipboard</button>\n        </div>\n      </div>\n    </div>\n  
</script>\n</div><script src="/redesign/global/js/respond.min.js"></script>\n    <script src="/redesign/global/js/require.min.js"></script>\n    <script src="/redesign/global/js/config-require.js"></script>\n    <script type="text/javascript">\n      define("actionBean", [], function() {return {"shareNoteUri":"/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?shareNote&service=","foodNote":false,"skitchNote":false,"userName":"","switchAccountUri":"/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?switch","logoutUri":"/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?logout","userStatus":"","images":false,"userLoggedIn":false};});\n    </script>\n    <!-- Google Analytics -->\n<script type="text/javascript">\nvar _gaq = _gaq || [];\n_gaq.push([\'_setAccount\', \'UA-285778-5\']);\n\n\n    _gaq.push([\'_trackPageview\', \'/sh/{noteGuid}/{noteKey}/{suffix}\']);\n  \n\n(function() {\n  var ga = document.createElement(\'script\'); ga.type = \'text/javascript\'; ga.async = true;\n  ga.src = (\'https:\' == document.location.protocol ? 
\'https://ssl\' : \'http://www\') + \'.google-analytics.com/ga.js\';\n  var s = document.getElementsByTagName(\'script\')[0]; s.parentNode.insertBefore(ga, s);\n})();\n</script>\n<!-- End of Google Analytics -->\n<script type="text/javascript">\n      var _gaq = _gaq || [];\n      _gaq.push([\'_setCustomVar\',\n                 4,                                 // Slot 4 - required\n                 \'contentClass\',                    // Category - required\n                 \'\', // Value - required\n                 3                                  // Page-level scope\n                ]);\n\n      _gaq.push([\'_setCustomVar\',\n                 5,                                      // Slot 5 - required\n                 \'sourceApplication\',                    // Category - required\n                 \'\', // Value - required\n                 3                                       // Page-level scope\n                ]);\n      _gaq.push([\'_trackPageview\', \'/singleNote\']);\n    </script>\n  <script type="text/javascript" src="/redesign/modules/SharingMenu/SharingMenu.js"></script><script type="text/javascript" src="/redesign/modules/LinkUrlDialog/LinkUrlDialog.js"></script><script type="text/javascript" src="/redesign/sharing/SharedNoteViewAction/SharedNoteViewAction.js"></script></body></html>'
title_pattern = re.compile('(?<=<title>).+(?=</title>)')
content_pattern = re.compile('(?<=class=\"divider\"></div>).+(?=<a class=\"save-button)')
title = re.search(title_pattern, resource)
content = re.search(content_pattern, resource)

# Print each match only if the search succeeded.
if title:
    print(title.group())

if content:
    print(content.group())

Output:

  

python re

Why do I only get the title? And how can I get the content of this note?

2 answers:

Answer 0: (score: 2)

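Your problem is that the content contains newlines. By default, . does not match a newline, so .+ cannot get past the line break that immediately follows the divider, and the search finds nothing.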

So you should use re.DOTALL:

content_pattern = re.compile('(?<=class=\"divider\"></div>).+(?=<a class=\"save-button)', re.DOTALL)

This makes . match newlines as well, and then the search succeeds.
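To see the difference in isolation, here is a minimal sketch that runs the same pattern with and without re.DOTALL. The `sample` string is a hypothetical, shortened stand-in for the Evernote page in the question, not the real markup.

```python
import re

# Hypothetical, shortened stand-in for the Evernote page in the question.
sample = (
    '<div class="divider"></div>\n'
    '<div class="note-content">\n'
    'a question about python re\n'
    '</div>\n'
    '<a class="save-button save-button-mobile" href="#">Save</a>'
)

pattern_text = r'(?<=class="divider"></div>).+(?=<a class="save-button)'

# Without re.DOTALL, "." refuses the newline right after the divider,
# so ".+" cannot match even one character and the search fails.
no_flag = re.search(pattern_text, sample)

# With re.DOTALL, "." also matches "\n", so the match can span several lines.
with_flag = re.search(pattern_text, sample, re.DOTALL)

print(no_flag)
print(with_flag.group().strip())
```

Note that the match still contains the surrounding div markup; the pattern only delimits the region between the divider and the save button.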

Answer 1: (score: 0)

I don't fully understand what you are trying to do, but it looks like BeautifulSoup could help you.
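As a rough illustration of the parser-based idea, the sketch below uses the standard library's html.parser as a stand-in, so it does not show BeautifulSoup's own API. The `NoteParser` class and the `sample` string are hypothetical, and the code assumes the note text lives inside the div with class "ennote", as in the page from the question.

```python
from html.parser import HTMLParser


class NoteParser(HTMLParser):
    """Collects the <title> text and the text inside a class="ennote" div."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.depth = 0       # > 0 while inside the ennote div (counts nested divs)
        self.title = ''
        self.content = []

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True
        elif tag == 'div':
            classes = (dict(attrs).get('class') or '').split()
            if self.depth or 'ennote' in classes:
                self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False
        elif tag == 'div' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif self.depth:
            self.content.append(data)


# Hypothetical, shortened stand-in for the page in the question.
sample = (
    '<html><head><title>python re</title></head><body>'
    '<div class="note-content">\n'
    '<div style="word-wrap: break-word;" class="ennote">\n'
    'a question about python re\n'
    '<div><br/></div></div></div>'
    '</body></html>'
)

parser = NoteParser()
parser.feed(sample)
parser.close()

title = parser.title.strip()
content = ''.join(parser.content).strip()
print(title)
print(content)
```

Unlike the regex, this returns only the text of the note without the surrounding tags, and it does not depend on where the newlines fall.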