How can I search for a specific word in the result of socket.recv(1024)?

Asked: 2017-11-14 09:09:09

Tags: python sockets

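I have a simple banner grabber. When I grab a banner from Google, I get a response back with information in it.

I want to find a specific word in the socket data. How can I do that?

My banner grabber: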

import socket

host = "www.google.nl"
port = 80
addr = (host, port)
s = socket.socket()
s.connect(addr)
s.send(b'GET / HTTP/1.1\n\n\n')
print(s.recv(1024))

The banner I receive from Google.nl (s.recv(1024)):

  

b'HTTP/1.1 302 Found\r\nCache-Control: private\r\nContent-Type: text/html; charset=UTF-8\r\nReferrer-Policy: no-referrer\r\nLocation: http://www.google.nl/?gfe_rd=cr&dcr=0&ei=XrAKWpHeJ9GC3gOB_ZWoBg\r\nContent-Length: 268\r\nDate: Tue, 14 Nov 2017 08:59:10 GMT\r\n\r\n<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF="http://www.google.nl/?gfe_rd=cr&amp;dcr=0&amp;ei=XrAKWpHeJ9GC3gOB_ZWoBg">here</A>.\r\n</BODY></HTML>\r\n'

What I want:

# Pseudocode for the check I would like to perform
if "Document" in s.recv(1024):
    print("Document is found in the banner!")
else:
    print("No keyword found")

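2 Answers:

Answer 0: (score: 0)

If you write your pseudocode if/else construct in Python (a simple change), store the received bytes in a variable, and naively convert them to a string, you can carry on programming: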

import socket

host = "www.google.nl"
port = 80
addr = (host, port)
s = socket.socket()
s.connect(addr)
s.send(b'GET / HTTP/1.1\n\n\n')
# Keep the received bytes so they can be printed and searched
b_text = s.recv(1024)
print(b_text)
# str() on bytes yields its repr (b'...'), which still contains the text
if "Document" in str(b_text):
    print("Document is found in the banner!")
else:
    print("No keyword found")
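One caveat about the snippet above: calling str() on a bytes object returns its repr (the b'...' form with escape sequences), and the `in` test is case-sensitive. The banner in the question contains "The document has moved" with a lowercase "d", so a search for "Document" would not match it. A minimal sketch of a case-insensitive search directly on the bytes follows; the helper name banner_contains and the shortened sample banner are illustrative assumptions, not part of the original answer.

```python
def banner_contains(data: bytes, keyword: str, encoding: str = "utf-8") -> bool:
    """Case-insensitive search for `keyword` in raw received bytes.

    Hypothetical helper for illustration: both sides are lowercased
    (bytes.lower() only affects ASCII letters) and compared as bytes.
    """
    return keyword.lower().encode(encoding) in data.lower()


# Shortened sample taken from the banner shown in the question
banner = (
    b"HTTP/1.1 302 Found\r\n\r\n"
    b"<H1>302 Moved</H1>\nThe document has moved\n"
)

print(b"Document" in banner)                # False: case-sensitive bytes test
print(banner_contains(banner, "Document"))  # True: case-insensitive test
```

With a live socket, the same helper could be applied to the value returned by s.recv(1024).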

If you want to do without the variable (I don't know the use case), a snippet that works directly on the result of the s.recv() call is:

import socket
host = "www.google.nl"
port = 80
addr = (host, port)
s = socket.socket()
s.connect(addr)
s.send(b'GET / HTTP/1.1\n\n\n')
if "Document" in str(s.recv(1024)):
    print("Document is found in the banner!")
else:
    print("No keyword found")
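Both snippets above read at most 1024 bytes with a single recv() call, so a keyword further into the response would be missed, and recv() may also return fewer bytes than requested. The following is a rough sketch of one way to read until the server closes the connection. It assumes a request with CRLF line endings plus Host and Connection: close headers, which differs from the request used above; the function names build_request, recv_until_closed, and grab_banner are made up for this example.

```python
import socket


def build_request(host: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request (assumed form, CRLF line endings)."""
    return (
        "GET / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # ask the server to close after responding
        "\r\n"
    ).encode("ascii")


def recv_until_closed(sock: socket.socket, max_bytes: int = 65536) -> bytes:
    """Collect data until the peer closes the connection or max_bytes is read."""
    chunks = []
    total = 0
    while total < max_bytes:
        chunk = sock.recv(min(1024, max_bytes - total))
        if not chunk:  # empty bytes means the peer closed the connection
            break
        chunks.append(chunk)
        total += len(chunk)
    return b"".join(chunks)


def grab_banner(host: str, port: int = 80, timeout: float = 5.0) -> bytes:
    """Connect, send the request, and return everything received."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(build_request(host))
        return recv_until_closed(s)


# Usage (needs network access):
#   banner = grab_banner("www.google.nl")
#   print(b"document" in banner.lower())
```

The max_bytes cap is a design choice to keep a banner grabber from reading an arbitrarily large body; the timeout guards against servers that never close the connection.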

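Answer 1: (score: 0)

It is better to use the right tool for the job instead of low-level sockets. requests is a popular third-party library for making HTTP requests.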

>>> import requests
>>> r = requests.get('http://www.google.nl')
>>> r.headers
{'Date': 'Tue, 14 Nov 2017 17:13:33 GMT', 'Expires': '-1', 'Cache-Control': 'private, max-age=0', 'Content-Type': 'text/html; charset=ISO-8859-1', 'P3P': 'CP="This is not a P3P policy! See g.co/p3phelp for more info."', 'Server': 'gws', 'Content-Length': '5129', 'X-XSS-Protection': '1; mode=block', 'X-Frame-Options': 'SAMEORIGIN', 'Via': '1.1 jfdmzpr06', 'Connection': 'Keep-Alive', 'Content-Encoding': 'gzip', 'Set-Cookie': '1P_JAR=2017-11-14-17; expires=Thu, 14-Dec-2017 17:13:33 GMT; path=/; domain=.google.nl, NID=117=Sa3F7Bq4oXkQdcBu5OXCM3AdfyGZbxABbYqFMenMm-Ru4nITdC8tQujRxLJPl3aUG8ksQM-uDF56jlDrk0Hm9KMVkbOcb51K0oyys0PFU3ZEaSS5TnzBGk_dOYmK4XvS; expires=Wed, 16-May-2018 17:13:33 GMT; path=/; domain=.google.nl; HttpOnly'}
>>> r.status_code
200
>>> r.text[:1000]
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world\'s information, including webpages, images, videos and more. Google has many special features to help you find exactly what you\'re looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/logos/doodles/2017/131st-anniversary-of-the-hole-puncher-5763551741345792.3-law.gif" itemprop="image"><meta content="131st Anniversary of the Hole Puncher" property="twitter:title"><meta content="131st Anniversary of the Hole Puncher #GoogleDoodle" property="twitter:description"><meta content="131st Anniversary of the Hole Puncher #GoogleDoodle" property="og:description"><meta content="summary_large_image" property="twitter:card"><meta content="@GoogleDoodles" property="twitter:site"><meta content="https://www.google.com/logos/doodles/2017/131st-anniversary-of-the-hole-puncher-576355174134579'
>>> 'History' in r.text
True
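The membership test in the session above is case-sensitive and only looks at r.text. If the keyword might also appear in the headers, something along these lines could report where it was found. This is only a sketch: find_keyword is a hypothetical helper, and it relies only on the headers and text attributes that a requests response provides.

```python
def find_keyword(response, keyword):
    """Return a list naming where `keyword` occurs: 'headers' and/or 'body'.

    Hypothetical helper: `response` is expected to expose `.headers`
    (a mapping) and `.text` (a str), as a requests response does.
    The comparison is case-insensitive.
    """
    needle = keyword.lower()
    hits = []
    # Search each "Name: value" header line
    if any(needle in f"{name}: {value}".lower()
           for name, value in response.headers.items()):
        hits.append("headers")
    # Search the decoded body
    if needle in response.text.lower():
        hits.append("body")
    return hits


# Usage (needs network access and the third-party requests package):
#   import requests
#   r = requests.get("http://www.google.nl", timeout=5)
#   print(find_keyword(r, "History"))
```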