I shared a note through Evernote, which generated an HTML page. I want to get the title and the content of this note, so my code is as follows:
import re
resource = '<!DOCTYPE html>\n<!--[if lt IE 7 ]> <html class="ie6">; <![endif]--><!--[if IE 7 ]> <html class="ie7"> <![endif]--><!--[if IE 8 ]> <html class="ie8"> <![endif]--><!--[if IE 9 ]> <html class="ie9"> <![endif]--><!--[if gt IE 9]> <html> <![endif]--><!--[if !IE]><!--> <html> <!--<![endif]--><head><meta name="en:locale" content="en" />\n <meta charset="utf-8" />\n <meta http-equiv="X-UA-Compatible" content="IE=9,chrome=1" />\n <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,minimum-scale=1,user-scalable=0" />\n\n <meta property="og:title" content="python re"/>\n <meta property="og:type" content="article"/>\n <meta property="og:description" content="a question about python re\n "/>\n <meta property="og:url" content="https://www.evernote.com/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d"/>\n <meta property="og:image"\n content="https://www.evernote.com/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d/thm/note/396b4a1f-ae9c-40aa-b740-5aa19e301489"/>\n <meta property="og:site_name" content="Evernote"/>\n <meta property="og:created_time" content="1350193749000"/>\n <meta property="og:updated_time" content="1350193786000"/>\n <link rel="Shortcut Icon" href="/favicon.ico" type="image/x-icon" />\n\n <link rel="stylesheet" href="/redesign/global/css/fonts.css" />\n <link rel="stylesheet" href="/redesign/global/css/header.css" />\n\n <link rel="stylesheet" href="/redesign/sharing/css/sharedNote.css" />\n <title>python re</title>\n <link rel="stylesheet" href="/redesign/modules/SharingMenu/SharingMenu.css"><link rel="stylesheet" href="/redesign/modules/LinkUrlDialog/LinkUrlDialog.css"></head><body class="wrapper"><div class="logo-bar">\n <a href="http://evernote.com/" target="_blank" class="evernote-logo"></a>\n <a class="save-button save-button-desktop" href="/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d">\n Save to 
Evernote</a>\n\n <div class="switch-account-div">\n <div class="switch-account-icon"></div>\n <span class="switch-account-name"></span>\n <div class="switch-account-arrow"></div>\n <div class="switch-account-dropdown">\n <div class="switch-dropdown-arrow"></div>\n <div class="switch-account-menuitem">\n Switch Account</div>\n <div class="switch-account-logout">\n Sign Out</div>\n </div>\n </div>\n\n </div>\n\n <div id="message-container">\n <div id="message">\n <div id="message-checkmark"></div>\n <span></span>\n </div>\n </div>\n\n <div id="container-boundingbox" class="wrapper">\n <div id="container" class="wrapper">\n <div class="sharing-imagegallery">\n <div class="SharingMenu"><div class="sharing-menu">\n <div class="share-button-container">\n <div class="label-container">\n <span class="label">\n Share</span>\n <div class="label-icon facebook-icon">\n </div>\n </div>\n <div class="icon-container"\n title="Share">\n <div class="icon">\n </div>\n </div>\n </div>\n <div class="menu-bar">\n <div class="menu-bar-div">\n <div class="menu-bar-icon facebook-icon"></div>\n <span class="menu-bar-label">\n Facebook</span>\n </div>\n <div class="menu-bar-div">\n <div class="menu-bar-icon twitter-icon"></div>\n <span class="menu-bar-label">\n Twitter</span>\n </div>\n <div class="menu-bar-div">\n <div class="menu-bar-icon linkedin-icon"></div>\n <span class="menu-bar-label">\n LinkedIn</span>\n </div>\n <div class="menu-bar-div">\n <div class="menu-bar-icon link-icon"></div>\n <span class="menu-bar-label">\n Link</span>\n </div>\n </div>\n </div>\n</div></div>\n <div class="shared-by-mobile">\n Shared by flowerszhong</div>\n <div class="shared-by shared-by-desktop">\n <div class="shared-by-left"></div>\n Shared by flowerszhong<div class="shared-by-right"></div>\n </div>\n <h2 class="note-title">python re</h2>\n <div class="vtop">\n <div class="note-updated">\n <span>\n Updated Today</span>\n </div>\n </div>\n <div class="divider"></div>\n <div class="note-content">\n <div 
style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="ennote">\na question about python re\n<div><br/></div></div></div>\n <a class="save-button save-button-mobile" href="/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d">\n Save to Evernote</a>\n <div class="clearfix" style="clear: both;"></div>\n</div>\n </div>\n\n\n <div class="footer">\n <div>\n Evernote makes it easy to remember things big and small from your everyday life using your computer, tablet, phone and the web.</div>\n <div class="footer-logo"></div>\n </div>\n\n <div class="LinkUrlDialog"><script id="linkUrlDialog" type="text/html">\n <div class="link-url-dialog">\n <div class="dialog-head">\n Link to Note</div>\n <div class="dialog-body">\n <p>Paste this link into an email or IM to share it.</p>\n <p>Anyone with the link will be able to view the note.</p>\n </div>\n <div class="url-container">\n <div class="url-title">\n Note URL:</div>\n <input type="text" class="url-input" value="{{url}}" readonly>\n <div class="copy-container">\n <button type="button" class="copy-button">\n Copy to Clipboard</button>\n </div>\n </div>\n </div>\n </script>\n</div><script src="/redesign/global/js/respond.min.js"></script>\n <script src="/redesign/global/js/require.min.js"></script>\n <script src="/redesign/global/js/config-require.js"></script>\n <script type="text/javascript">\n define("actionBean", [], function() {return {"shareNoteUri":"/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?shareNote&service=","foodNote":false,"skitchNote":false,"userName":"","switchAccountUri":"/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?switch","logoutUri":"/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?logout","userStatus":"","images":false,"userLoggedIn":false};});\n </script>\n <!-- Google Analytics -->\n<script 
type="text/javascript">\nvar _gaq = _gaq || [];\n_gaq.push([\'_setAccount\', \'UA-285778-5\']);\n\n\n _gaq.push([\'_trackPageview\', \'/sh/{noteGuid}/{noteKey}/{suffix}\']);\n \n\n(function() {\n var ga = document.createElement(\'script\'); ga.type = \'text/javascript\'; ga.async = true;\n ga.src = (\'https:\' == document.location.protocol ? \'https://ssl\' : \'http://www\') + \'.google-analytics.com/ga.js\';\n var s = document.getElementsByTagName(\'script\')[0]; s.parentNode.insertBefore(ga, s);\n})();\n</script>\n<!-- End of Google Analytics -->\n<script type="text/javascript">\n var _gaq = _gaq || [];\n _gaq.push([\'_setCustomVar\',\n 4, // Slot 4 - required\n \'contentClass\', // Category - required\n \'\', // Value - required\n 3 // Page-level scope\n ]);\n\n _gaq.push([\'_setCustomVar\',\n 5, // Slot 5 - required\n \'sourceApplication\', // Category - required\n \'\', // Value - required\n 3 // Page-level scope\n ]);\n _gaq.push([\'_trackPageview\', \'/singleNote\']);\n </script>\n <script type="text/javascript" src="/redesign/modules/SharingMenu/SharingMenu.js"></script><script type="text/javascript" src="/redesign/modules/LinkUrlDialog/LinkUrlDialog.js"></script><script type="text/javascript" src="/redesign/sharing/SharedNoteViewAction/SharedNoteViewAction.js"></script></body></html>'
title_pattern = re.compile('(?<=<title>).+(?=</title>)')
content_pattern = re.compile('(?<=class=\"divider\"></div>).+(?=<a class=\"save-button)')
title = re.search(title_pattern, resource)
content = re.search(content_pattern, resource)
if title:
    print(title.group())
if content:
    print(content.group())
# if __name__=='__main__':main()
Output:
python re
Why do I only get the title? And how can I get the content of this note?
Answer 0 (score: 2)
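Your problem is that the content contains newline characters. By default, the "." metacharacter does not match a newline, so the content pattern never matches. You should therefore use re.DOTALL: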
content_pattern = re.compile('(?<=class=\"divider\"></div>).+(?=<a class=\"save-button)', re.DOTALL)
This flag makes "." match newline characters as well. With it, the content pattern matches.
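To see the difference in isolation, here is a minimal sketch that runs the same pattern with and without re.DOTALL. The html string is a shortened, hypothetical stand-in for the page in the question, not the real markup.

```python
import re

# Hypothetical, shortened stand-in for the shared-note page in the question.
html = (
    '<div class="divider"></div>\n'
    '<div class="note-content">\n'
    'a question about python re\n'
    '</div>\n'
    '<a class="save-button save-button-mobile" href="#">Save to Evernote</a>'
)

pattern = r'(?<=class="divider"></div>).+(?=<a class="save-button)'

# Without the flag, "." cannot cross the newline that follows the divider.
no_flag = re.search(pattern, html)

# With re.DOTALL, "." also matches newlines, so the span can cover several lines.
with_flag = re.search(pattern, html, re.DOTALL)

print(no_flag is None)
print(with_flag.group().strip())
```

Note that ".+" is greedy: with re.DOTALL it extends to the last place where the lookahead still succeeds. If a page could contain several such anchors after the divider, the non-greedy ".+?" may be the safer choice.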
Answer 1 (score: 0)
I don't fully understand what you are trying to do, but it seems that BeautifulSoup could help you.
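BeautifulSoup is a third-party package, so as a sketch of the same "parse the HTML instead of matching it with a regex" idea, the example below swaps in the standard library's html.parser. The NoteParser class and the sample string are hypothetical, written only for illustration. It assumes the note body lives in a div with class "note-content", as in the page from the question.

```python
from html.parser import HTMLParser


class NoteParser(HTMLParser):
    """Collects the <title> text and the text inside div.note-content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.content = ""
        self._in_title = False
        self._div_depth = 0  # greater than 0 while inside div.note-content

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Track nested divs so we know when the note container closes.
            if self._div_depth or "note-content" in classes:
                self._div_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "div" and self._div_depth:
            self._div_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._div_depth:
            self.content += data


# Hypothetical, shortened stand-in for the page in the question.
sample = (
    '<html><head><title>python re</title></head><body>'
    '<div class="divider"></div>\n'
    '<div class="note-content">\n<div class="ennote">\n'
    'a question about python re\n<div><br/></div></div></div>\n'
    '<a class="save-button" href="#">Save to Evernote</a>'
    '</body></html>'
)

parser = NoteParser()
parser.feed(sample)
parser.close()

print(parser.title)
print(parser.content.strip())
```

With BeautifulSoup installed, the equivalent lookup would likely be much shorter, roughly soup.title.string and soup.find("div", class_="note-content").get_text(), but that variant is not shown or tested here.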