Improving scalpel and tagsoup performance

Date: 2018-06-15 20:38:49

Tags: haskell web-scraping

I have a scraper that selectively extracts links from HTML documents, but it is slow (about ten links per page on average).

Cost centres show that most of the time and allocations are spent in various functions of the Text.HTML.TagSoup.Implementation module.

From the user's side (my code), this function uses scalpel to apply a predicate so that only specific links are selected:

scrapeHrefs :: (Ord content, Show content, StringLike content) => content -> Selector -> [content]
scrapeHrefs str s = (concat . maybeToList) $ scrapeStringLike str (chroots s (attr "href" anySelector))

My goal is to reach >95% RTS productivity. This part of the program sits at 30-50%, while the other parts are around 97%, which (for me) is a starting point for comparing against other languages.

Relevant settings:

ghc-options: -rtsopts -fprof-auto -fprof-cafs
command line: +RTS -p -RTS

Basic statistics:
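
For reference, the productivity figure for a single section of a program can also be sampled from inside the process with GHC.Stats from base, rather than read off the whole-run summary. The following is only a minimal sketch under assumptions: the `productivity` helper and the `workload` placeholder are hypothetical names (the real scraping call would take the placeholder's place), and the binary has to be started with `+RTS -T -RTS` so that the RTS collects statistics. The ratio computed here, mutator CPU time over total CPU time, should roughly correspond to the productivity line printed by `+RTS -s`.

```haskell
import Control.Exception (evaluate)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Fraction of CPU time spent in the mutator (the program itself, not the
-- garbage collector) between two RTS snapshots. Hypothetical helper.
productivity :: RTSStats -> RTSStats -> Double
productivity before after
  | total <= 0 = 1.0
  | otherwise  = fromIntegral mut / fromIntegral total
  where
    mut   = mutator_cpu_ns after - mutator_cpu_ns before
    total = cpu_ns after - cpu_ns before

-- | Placeholder workload (an assumption for illustration only); the
-- scraping call under investigation would go here instead.
workload :: Int -> Int
workload n = length (show (product [1 .. toInteger n]))

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats are disabled; run with +RTS -T -RTS"
    else do
      before <- getRTSStats
      _ <- evaluate (workload 2000)   -- force the work between the snapshots
      after <- getRTSStats
      putStrLn ("productivity: " ++ show (productivity before after))
      putStrLn ("allocated bytes: "
                ++ show (allocated_bytes after - allocated_bytes before))
```

Taking two snapshots around the suspect section keeps the well-behaved parts of the program out of the measurement, which matters when only one part falls below the target.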

0 answers:

No answers yet.