I can't move to the next page while crawling

Time: 2017-02-21 03:19:40

Tags: javascript python python-3.x beautifulsoup web-crawler

http://www.kif.re.kr/kif2/publication/pub_list.aspx?menuid=17

I am writing a crawler for the page above, but I can't get it to move to the next page of results.

<a class="pagebutton" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl12','')">2</a>

This is the HTML of the link to page 2.
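The `href` above calls `__doPostBack` with two arguments, which the browser copies into the `__EVENTTARGET` and `__EVENTARGUMENT` form fields before submitting the form. A minimal sketch of pulling those two values out of such a link with a regular expression follows; the helper name `parse_postback` is hypothetical, and it assumes both arguments are single-quoted as in the link shown.

```python
import re

# Matches __doPostBack('target','argument') with single-quoted arguments.
_POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)',\s*'([^']*)'\)")


def parse_postback(href):
    """Return (event_target, event_argument) from a __doPostBack href, or None."""
    match = _POSTBACK_RE.search(href)
    if match is None:
        return None
    return match.group(1), match.group(2)


href = ("javascript:__doPostBack("
        "'ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl12','')")
target, argument = parse_postback(href)
print(target)    # the value to send as __EVENTTARGET
print(argument)  # the value to send as __EVENTARGUMENT (empty for this link)
```

Reading the target from each link this way avoids hardcoding a control name such as `...$ctl12` for every page.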

Looking at the request in the browser's developer tools, I found that it uses the POST method.

Request URL:http://www.kif.re.kr/kif2/publication/pub_list.aspx?menuid=17

Below is the form data shown in the developer tools.

__EVENTTARGET:ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl12
__EVENTARGUMENT:
__VIEWSTATE:/wEPDwUKMTg4Nzc2Nzc3OA9kFgJmD2QWAgIED2QWAgIDD2QWAmYPZBYEZg9kFgQCAQ8QZBAVBwbsoJzrqqkG7KCA7J6QDOuwnOqwhOyXsOyblAbqtoztmLgJ7J2Y66Kw7LKYBuuqqeywqAbsmpTslb0VBwgyNDAvMjQxLwgyNDAvMjQyLwgyNDAvMjU5LwgyNDAvMjYyLwgyNDAvMzM3LwgyNDAvMjYzLwgyNDAvMjY2LxQrAwdnZ2dnZ2dnZGQCAw8PZBYCHgpvbmtleXByZXNzBV5pZiAoZXZlbnQua2V5Q29kZSA9PSAxMykge19fZG9Qb3N0QmFjaygnY3RsMDAkQ29udGVudFBsYWNlSG9sZGVyMSRkYXRhX2xpc3QxJGlidFNlYXJjaCcsJycpfTsgZAIDDw9kFgIeBWFsaWduBQZjZW50ZXIWAgIDD2QWAmYPZBYCZg9kFhYCBA8PFggeBFRleHQFATEeCENzc0NsYXNzBQdjdXJyZW50HgRfIVNCAgIeB1Zpc2libGVnZGQCBg8PFggfAgUBMh8DBQpwYWdlYnV0dG9uHwQCAh8FZ2RkAggPDxYIHwIFATMfAwUKcGFnZWJ1dHRvbh8EAgIfBWdkZAIKDw8WCB8CBQE0HwMFCnBhZ2VidXR0b24fBAICHwVnZGQCDA8PFggfAgUBNR8DBQpwYWdlYnV0dG9uHwQCAh8FZ2RkAg4PDxYIHwIFATYfAwUKcGFnZWJ1dHRvbh8EAgIfBWdkZAIQDw8WCB8CBQE3HwMFCnBhZ2VidXR0b24fBAICHwVnZGQCEg8PFggfAgUBOB8DBQpwYWdlYnV0dG9uHwQCAh8FZ2RkAhQPDxYIHwIFATkfAwUKcGFnZWJ1dHRvbh8EAgIfBWdkZAIWDw8WCB8CBQIxMB8DBQpwYWdlYnV0dG9uHwQCAh8FZ2RkAhsPDxYCHwIFCyZuYnNwWzEvODNdZGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgcFGWN0bDAwJG1lbnVfbmF2MSRpYnRTZWFyY2gFLmN0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkZGF0YV9saXN0MSRpYnRTZWFyY2gFMWN0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkZGF0YV9saXN0MSRpYnRTZWFyY2hBbGwFPmN0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkZGF0YV9saXN0MSRXZWJQYWdlTmF2aWdhdG9yVjIxJGN0bDA2BT5jdGwwMCRDb250ZW50UGxhY2VIb2xkZXIxJGRhdGFfbGlzdDEkV2ViUGFnZU5hdmlnYXRvclYyMSRjdGwwOAU+Y3RsMDAkQ29udGVudFBsYWNlSG9sZGVyMSRkYXRhX2xpc3QxJFdlYlBhZ2VOYXZpZ2F0b3JWMjEkY3RsMzAFPmN0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkZGF0YV9saXN0MSRXZWJQYWdlTmF2aWdhdG9yVjIxJGN0bDMyuFjgj5nepdWXkOAwNYww+divJYtYSrYgHZpTcewu9Ds=
__VIEWSTATEGENERATOR:E95FE49A
__EVENTVALIDATION:/wEdACHcOKX2MiW8o3JKug67fnRBm/LuJNf32p7npb2HQkdSHj2jQIPNrpQqFhY2rmhcQzOr90YGqna/Dtr3eCnJKH/FRrctoJJXOcc5nzwqquFEKe/f6ybfmfBBwP5V9TZX05svUiuWBMoi40eiFXgXu/HvnPjbm91I+Oz3HACj/rejcfKu91e/rwNa3qahKk8QP//P3Ctl3lcnXTxti+MHToVFJ4X5e7akN9M5YNbryOCPFUzWTSqkhEUajNOJze2BA47TqM8vDP0IP5ki4KWYQixH1ITUrNZx490LfBrUZBBPZp6DDFbb0FBaxN5KpyeciB3wOyFRvNC7wvyrzR4zZIFKvsDwEoIoZw4QpAfkYvtGlm/erM6tYMUIO2Y+EofXRtI5fpcvmMZwp9oWz1DjjMQ7kMX3NKB1EbRuWhW/PUV26RCgECz38VETCqQlHmY2JJfazoydmTWb206Gy1R0dPzbnPz5BKeIBWlSOZDH/jTFFrzBKTtWpKGoPFsObJHPJ/aat3bwhGesAEcXWRHlLMcB7+Yj6K/9RPZv/XJ9M8z/IAbi3aAtkyVcWc7DpsPsia8+XWZOcmYS4tf4O30N13XKSyM1xB3zywxlTxuxx1lP5+GDugiF+Yf+KojuR7Az4t0LDho3RsEd/ZN7ejUxBtxfh6oqlZNMy4/Raz+OSUeRTRVfoUMGNPEUTwp88pek/ycTkyMA26w5UfW8JGdFRvrmOA59JlLF9OIGGWESn/RCnw==
ctl00$agentPlatform:1
ctl00$menu_nav1$tbxSearchWord:
ctl00$ContentPlaceHolder1$data_list1$ddlSearchItem:240/241/
ctl00$ContentPlaceHolder1$data_list1$tbxSearch:
ctl00$ContentPlaceHolder1$data_list1$hdnSearchText:
ctl00$ContentPlaceHolder1$data_list1$hdnSearchPath:240/241/
ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl00:0
ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl01:1
ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl02:821
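Values such as `__VIEWSTATE` and `__EVENTVALIDATION` are sent by the server inside hidden `<input>` elements, so a copy taken from the developer tools may no longer match what the server expects on a later request. One possible approach, sketched here with the standard library only, is to read the fields from the HTML that was just fetched. The names `FormFieldCollector` and `build_postback_payload` are hypothetical, the sample markup and its values are made up for illustration, and `<select>` elements (such as the search drop-down) are not handled.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collects name/value pairs of hidden and text <input> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        input_type = (attributes.get("type") or "text").lower()
        # Buttons and image inputs are only submitted when clicked, so skip them.
        if name and input_type in ("hidden", "text"):
            self.fields[name] = attributes.get("value") or ""


def build_postback_payload(html, event_target, event_argument=""):
    """Build POST data from the page's own fields plus the postback target."""
    parser = FormFieldCollector()
    parser.feed(html)
    payload = dict(parser.fields)
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return payload


# Made-up markup standing in for the fetched page.
sample_html = """
<form method="post" action="pub_list.aspx?menuid=17">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$menu_nav1$tbxSearchWord" />
  <input type="image" name="ctl00$menu_nav1$ibtSearch" src="search.gif" />
</form>
"""
payload = build_postback_payload(
    sample_html,
    "ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl12",
)
print(sorted(payload))
```

The resulting dict can be passed as `data=` to `requests.post`. Using a single `requests.Session` for both the GET and the POST would also carry over any cookies the server sets.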

Here is my code. It runs without errors, but the value of r.text is not what I want.

        import requests
        from bs4 import BeautifulSoup

        url = 'http://www.kif.re.kr/kif2/publication/pub_list.aspx?menuid=17'
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'lxml')

        # Take the links from the first centered <td> (assumed to be the pager).
        pageTag = soup.find_all('td', align='center')

        inputTag = pageTag[0].find_all('a')

        for link in inputTag:
            print(link['href'])
            payload = {'__EVENTTARGET' : 'ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl12',
                       '__EVENTARGUMENT' : '',
                       '__VIEWSTATE'
                       : '/wEPDwUKMTg4Nzc2Nzc3OA9kFgJmD2QWAgIED2QWAgIDD2QWAmYPZBYEZg9kFgQCAQ8QZBAVBwbsoJzrqqkG7KCA7J6QDOuwnOqwhOyXsOyblAbqtoztmLgJ7J2Y66Kw7LKYBuuqqeywqAbsmpTslb0VBwgyNDAvMjQxLwgyNDAvMjQyLwgyNDAvMjU5LwgyNDAvMjYyLwgyNDAvMzM3LwgyNDAvMjYzLwgyNDAvMjY2LxQrAwdnZ2dnZ2dnZGQCAw8PZBYCHgpvbmtleXByZXNzBV5pZiAoZXZlbnQua2V5Q29kZSA9PSAxMykge19fZG9Qb3N0QmFjaygnY3RsMDAkQ29udGVudFBsYWNlSG9sZGVyMSRkYXRhX2xpc3QxJGlidFNlYXJjaCcsJycpfTsgZAIDDw9kFgIeBWFsaWduBQZjZW50ZXIWAgIDD2QWAmYPZBYCZg9kFhYCBA8PFggeBFRleHQFATEeCENzc0NsYXNzBQdjdXJyZW50HgRfIVNCAgIeB1Zpc2libGVnZGQCBg8PFggfAgUBMh8DBQpwYWdlYnV0dG9uHwQCAh8FZ2RkAggPDxYIHwIFATMfAwUKcGFnZWJ1dHRvbh8EAgIfBWdkZAIKDw8WCB8CBQE0HwMFCnBhZ2VidXR0b24fBAICHwVnZGQCDA8PFggfAgUBNR8DBQpwYWdlYnV0dG9uHwQCAh8FZ2RkAg4PDxYIHwIFATYfAwUKcGFnZWJ1dHRvbh8EAgIfBWdkZAIQDw8WCB8CBQE3HwMFCnBhZ2VidXR0b24fBAICHwVnZGQCEg8PFggfAgUBOB8DBQpwYWdlYnV0dG9uHwQCAh8FZ2RkAhQPDxYIHwIFATkfAwUKcGFnZWJ1dHRvbh8EAgIfBWdkZAIWDw8WCB8CBQIxMB8DBQpwYWdlYnV0dG9uHwQCAh8FZ2RkAhsPDxYCHwIFCyZuYnNwWzEvODNdZGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgcFGWN0bDAwJG1lbnVfbmF2MSRpYnRTZWFyY2gFLmN0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkZGF0YV9saXN0MSRpYnRTZWFyY2gFMWN0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkZGF0YV9saXN0MSRpYnRTZWFyY2hBbGwFPmN0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkZGF0YV9saXN0MSRXZWJQYWdlTmF2aWdhdG9yVjIxJGN0bDA2BT5jdGwwMCRDb250ZW50UGxhY2VIb2xkZXIxJGRhdGFfbGlzdDEkV2ViUGFnZU5hdmlnYXRvclYyMSRjdGwwOAU+Y3RsMDAkQ29udGVudFBsYWNlSG9sZGVyMSRkYXRhX2xpc3QxJFdlYlBhZ2VOYXZpZ2F0b3JWMjEkY3RsMzAFPmN0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkZGF0YV9saXN0MSRXZWJQYWdlTmF2aWdhdG9yVjIxJGN0bDMyuFjgj5nepdWXkOAwNYww+divJYtYSrYgHZpTcewu9Ds=',
                       '__VIEWSTATEGENERATOR' : 'E95FE49A',
                       '__EVENTVALIDATION' : '/wEdACHcOKX2MiW8o3JKug67fnRBm/LuJNf32p7npb2HQkdSHj2jQIPNrpQqFhY2rmhcQzOr90YGqna/Dtr3eCnJKH/FRrctoJJXOcc5nzwqquFEKe/f6ybfmfBBwP5V9TZX05svUiuWBMoi40eiFXgXu/HvnPjbm91I+Oz3HACj/rejcfKu91e/rwNa3qahKk8QP//P3Ctl3lcnXTxti+MHToVFJ4X5e7akN9M5YNbryOCPFUzWTSqkhEUajNOJze2BA47TqM8vDP0IP5ki4KWYQixH1ITUrNZx490LfBrUZBBPZp6DDFbb0FBaxN5KpyeciB3wOyFRvNC7wvyrzR4zZIFKvsDwEoIoZw4QpAfkYvtGlm/erM6tYMUIO2Y+EofXRtI5fpcvmMZwp9oWz1DjjMQ7kMX3NKB1EbRuWhW/PUV26RCgECz38VETCqQlHmY2JJfazoydmTWb206Gy1R0dPzbnPz5BKeIBWlSOZDH/jTFFrzBKTtWpKGoPFsObJHPJ/aat3bwhGesAEcXWRHlLMcB7+Yj6K/9RPZv/XJ9M8z/IAbi3aAtkyVcWc7DpsPsia8+XWZOcmYS4tf4O30N13XKSyM1xB3zywxlTxuxx1lP5+GDugiF+Yf+KojuR7Az4t0LDho3RsEd/ZN7ejUxBtxfh6oqlZNMy4/Raz+OSUeRTRVfoUMGNPEUTwp88pek/ycTkyMA26w5UfW8JGdFRvrmOA59JlLF9OIGGWESn/RCnw==',
                       'ctl00$agentPlatform' : '1',
                       'ctl00$menu_nav1$tbxSearchWord' : '',
                       'ctl00$ContentPlaceHolder1$data_list1$ddlSearchItem' : '240/241/',
                       'ctl00$ContentPlaceHolder1$data_list1$tbxSearch' : '',
                       'ctl00$ContentPlaceHolder1$data_list1$hdnSearchText':'',
                       'ctl00$ContentPlaceHolder1$data_list1$hdnSearchPath' : '240/241/',
                       'ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl00' : '0',
                       'ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl01' : '1',
                       'ctl00$ContentPlaceHolder1$data_list1$WebPageNavigatorV21$ctl02' : '821'
                       }
            r = requests.post(url, data=payload)

            print(r.text)
            break  # stop after the first link

How do I get to the next page?

1 answer:

Answer 0: (score: 0)

<Response>
  <Dial>ag2_num</Dial>
  <Redirect>disconnectedcallurl-usingemptyqueue(todisconnectthefirstagent)</Redirect>
</Response> 

This controls the page number, and it is zero-based. If you want to go to page 2, change it to 1.

You should make it a variable:

function disconnect_call($callsid){
        $rr = array("status" => "completed");

        $call = $this->client->calls($callsid)->update($rr);
        echo $call->direction;
    }
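The advice to keep the zero-based page number in a variable could be sketched in Python roughly as follows. The helper `payload_for_page` and the field name `page_index` are placeholders for illustration; which form field the site actually reads for the page number is an assumption the caller has to supply.

```python
def payload_for_page(base_payload, page_field, page_number):
    """Return a copy of base_payload with page_field set to the zero-based index."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    payload = dict(base_payload)  # copy so the base payload is not mutated
    payload[page_field] = str(page_number - 1)
    return payload


# "page_index" is a placeholder field name, not one taken from the site.
base = {"__EVENTARGUMENT": "", "page_index": "0"}
for page_number in range(1, 4):
    page_payload = payload_for_page(base, "page_index", page_number)
    print(page_number, page_payload["page_index"])
```

Each payload built this way would then be sent with its own POST request, one per page.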