如何从PHP中删除字符串中的JavaScript残余

时间:2013-12-15 03:11:50

标签: php string preg-replace

我正在运行以下步骤以尝试清理使用$ query_text_lower = file_get_contents(websiteURL)获取的字符串。

我想要的是回报单词。没有javascript,没有随机数,没有CSS或任何其他类型的脚本。

//remove javascript
$query_text_lower = preg_replace("/<script[^>]*>.*?< *script[^>]*>/i", "", $new_text); 

//remove html tags
$query_text_lower2 = strip_tags($query_text_lower);

//removes any text containing links (may not be best, as some sites link useful words within the text. Does tend to remove a lot of ads though
$query_text_lower3 = preg_replace('/<a\s.*?>.*?<\/a>/s', '', $query_text_lower2);

//removes linebreaks
$query_text_lower4 = trim(preg_replace('/\s+/', ' ', $query_text_lower3));

echo $query_text_lower4;
die();

以下是我目前输出内容的示例:

developing a cafe: 13 steps - wikihow /**/ /**/ messages log in log in via log in remember me forgot? create an account explore community dashboardrandom articleabout uscategoriesrecent changes help us communication an articlerequest a new articleanswer a requestmore ideas... edit edit this article home » categories » investment and business » business » buying & forming a business » hospitality businesses articleeditdiscuss wh.mergelang({ 'navlist_collapse': '- collapse','navlist_expand': '+ expand','usernameoremail': 'username or email','password': 'password' }); edit articlehow to start a cafe edited by harri, maluniu, annie, afc8871 and 1 other if you have always dreamt of management a business, then learning how to start a cafe may be the answer. with the right planning beforehand, opening a cafe can become highly profitable. your cafe can easily become a place where staff come to relax, enjoy schedule with friends or family, grab a quick bite to eat, or to work on their latest project. start a cafe business by following the steps below. ad google_ad_customer = "ca-pub-9543332082073187"; /* iframe unit - intro */ if(abtype == 2) google_ad_slot = '6354743772'; else //a or normal google_ad_slot = '8579663774'; if(abtype == 2 || abtype == 3 || abtype == 4 || abtype == 5 || abtype == 6) { google_ad_width = 671; google_ad_height = 120; google_max_num_ads = 2; } else if(abtype >= 7) { google_ad_width = 645; google_ad_height = 60; google_max_num_ads = 1; } else { google_ad_width = 671; google_ad_height = 60; google_max_num_ads = 1; } google_ad_results = 'html'; google_override_format = true; google_ad_channel = "0206790666+7733764704+1640266093+6709519645+8052511407+6822404019+7122150828" + gchans + xchannels; if( fromsearch ) { document.communication(''); } //--> edit steps 1communication your business and marketing plans. these are very important aspects of any business, as they will show your course of action for both management and marketing the business. refer to these documents often to make sure you stay on track. without these documents, you may not be able to secure funding. ad google_ad_customer = "ca-pub-9543332082073187"; /* iframe unit - first step */ if(abtype == 2) google_ad_slot = '4878010577'; else //a or normal google_ad_slot = '5205564977'; if(abtype == 2 || abtype == 3 || abtype == 4 || abtype == 5 || abtype == 6) { google_ad_width = 629; google_ad_height = 120; google_max_num_ads = 2; } else if(abtype >= 7) { google_ad_width = 600; google_ad_height = 60; google_max_num_ads = 1; } else { google_ad_width = 629; google_ad_height = 60; google_max_num_ads = 1; } google_ad_results = 'html'; google_override_format = true; google_ad_channel = "2748203808+7733764704+1640266093+6709519645+8052511407+2490795108+6822404019+7122150828" + gchans + xchannels; document.communication(''); //--> 2follow all legalities for starting a cafe business of this nature in your area. make sure you get all the necessary licenses, permits, and insurance required on federal, state, and local levels. 3secure funding for your business. in your business plan, you determined how much funding you need to start a cafe business. contact investors, apply for loans, and use whatever capital you have on hand to start the business. 

2 个答案:

答案 0 :(得分:1)

您的javascript正则表达式已关闭

你有:

$query_text_lower = preg_replace("/<script[^>]*>.*?< *script[^>]*>/i", "", $new_text); 

你没有检测到&lt; / script&gt;在返回的文档中,因此它不会从页面中删除javascript代码本身,但是当您调用striptags时,您将删除标记,因此它们不会出现在您的最终输出中。但是,我看不到你从中提取这个网站所以我不能百分之百地使用它。

如果这有意义,请告诉我。基本上,它看起来像我的第一个正则表达式实际上并不匹配任何东西。

答案 1 :(得分:1)

你无法使用正则表达式解析那些东西。使用现有DOM工具解析HTML的建议是正确的方法。