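I did some web scraping and stored the results in text files. I forgot to extract only the text, so the HTML was stored as text in the files.
I have loaded the files into Python, and I would like to know whether there is still a way to extract only the text.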
'home | thomson reuters\n\n\n\nvar digitaldata={"page":{"attributes":{"businessunit":"thomson reuters corporate","country":"global","language":"en"},"category":{"primarycategory":"thomson reuters corporate"},"pageinfo":{"pagetitle":"home | thomson reuters","pageid":"ec5c71ae 8958 4637 956a b3c3363a1990","pageurl":"https://www.thomsonreuters.com/en.html","pagename":"en:home page:thomson reuters corporate:global:en","pagetemplate":"tr_home page_template","documentage":"355","createdate":"2017 11 08","publishdate":"2018 10 24","pagetype":"home page"}},"product":{}};\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nwindow.cq = window.cq || {}\n\n\n(function(c,h){var f={};c.pubsub=f;var k=c.define;h(f);"function"===typeof k&&k.amd?k(function(){return f}):"object"===typeof exports&&(void 0!==module&&module.exports&&(exports=module.exports=f),exports.pubsub=f,module.exports=exports=f)})("object"===typeof window&&window||this,function(c){function h(a){for(var b in a)if(a.hasownproperty(b))return!0;return!1}function f(a){return function(){throw a;}}'
Above is a sample of the data stored in one of the text files, but I want the data to contain only the text, like this:
'home | thomson reuters'
I could go back and edit my web scraping code, but I would like to know whether this data can still be salvaged.
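To make the goal concrete, here is a minimal sketch of the kind of clean-up I have in mind. It assumes the tags are already gone from the stored text and that the leftover JavaScript can be recognised line by line from characters such as `{`, `}`, `;` and keywords such as `function` or `var`. The names `JS_HINTS` and `strip_script_lines` are hypothetical, the marker list is a guess rather than a complete rule, and the sample string is a shortened version of the data above. It would also drop any real prose line that happens to contain those characters.

```python
import re

# Heuristic markers suggesting a line is JavaScript rather than prose.
# This list is an assumption and is not exhaustive.
JS_HINTS = re.compile(r"[{};]|\bfunction\b|\bvar\b|\bwindow\.|===|\|\|")


def strip_script_lines(raw: str) -> str:
    """Keep only non-empty lines that do not look like JavaScript."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        if line and not JS_HINTS.search(line):
            kept.append(line)
    return "\n".join(kept)


# Shortened stand-in for the stored text shown above.
sample = (
    'home | thomson reuters\n\n\n\n'
    'var digitaldata={"page":{}};\n\n'
    'window.cq = window.cq || {}\n\n'
    '(function(c,h){var f={};})("object"===typeof window&&window||this);'
)

print(strip_script_lines(sample))
```

Is something along these lines reasonable, or is there a more reliable way to do it?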