Question

Is there a way to find the email address from a Craigslist listing without using Selenium?

import requests,re
from bs4 import BeautifulSoup as bs
url='https://newyork.craigslist.org/wch/prk/d/hawthorne-10x15-drive-up-storage-unit/7122801839.html' #example url
headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
res=requests.get(url,headers=headers)

The email address changes with every request (I assume). I tried x=re.findall('(\w{32})',res.text), but it does not work.

Answer 1

Craigslist retrieves the email address by sending a POST request to this special URL:

https://newyork.craigslist.org/contactinfo/nyc/prk/U_ID

In this case, the value of U_ID is 7122801839 (taken from the URL you provided).
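If you have many listings, the ID can be pulled from each listing URL rather than typed in by hand. Here is a minimal sketch, assuming the ID is always the run of digits just before `.html` at the end of the path, as it is in the example URL. The helper name `listing_id` is made up for illustration.

```python
import re
from urllib.parse import urlsplit


def listing_id(listing_url: str) -> str:
    """Return the numeric ID that ends a listing URL's path (hypothetical helper)."""
    path = urlsplit(listing_url).path
    # Assumption: the path ends in "/<digits>.html", as in the example URL.
    match = re.search(r"/(\d+)\.html$", path)
    if match is None:
        raise ValueError(f"no listing ID found in {listing_url!r}")
    return match.group(1)


url = (
    "https://newyork.craigslist.org/wch/prk/d/"
    "hawthorne-10x15-drive-up-storage-unit/7122801839.html"
)
print(listing_id(url))  # prints 7122801839
```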

You can reproduce this request like this:

from bs4 import BeautifulSoup
import requests
import json

U_ID = "7122801839"

URL = f"https://newyork.craigslist.org/contactinfo/nyc/prk/{U_ID}"

COOKIE_VALUE = "cookie" # Replace this with a valid cookie
HEADERS = { 
'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
'Accept':'*/*',
'Accept-Language':'en-us',
'Accept-Encoding':'gzip, deflate, br',
'Host':'newyork.craigslist.org',
'Origin':'https',
'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Safari/605.1.15',
'Connection':'keep-alive',
'Referer':'https',
# Content-Length is left out on purpose: requests computes it from the request body
'Cookie':COOKIE_VALUE,
'X-Requested-With':'XMLHttpRequest',
 }


PAYLOAD = {
'MIME Type':'application/x-www-form-urlencoded; charset=UTF-8',
}


response = requests.request(
    method='POST',
    url=URL,
    headers=HEADERS,
    data=PAYLOAD
    )

html = json.loads(response.text)['replyContent']

soup = BeautifulSoup(html,'html.parser')

email = soup.find(class_='mailapp').get('href')
email = email.split('?subject')[0].replace('mailto:','')

print(email)
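The last two lines of the script pull the address out of the `mailto:` link with `split` and `replace`. As an alternative sketch, the standard library's URL parser can do the same job and also decodes any percent-escapes. The helper name `address_from_mailto` and the sample link below are invented for illustration and are not real Craigslist output.

```python
from urllib.parse import unquote, urlsplit


def address_from_mailto(href: str) -> str:
    """Return the address part of a mailto: link (hypothetical helper)."""
    parts = urlsplit(href)
    if parts.scheme != "mailto":
        raise ValueError(f"not a mailto link: {href!r}")
    # The path holds the address; "?subject=..." ends up in parts.query.
    return unquote(parts.path)


# Placeholder link, shaped like the href the script reads from the reply HTML.
sample_href = "mailto:abc123@example.org?subject=Storage%20unit"
print(address_from_mailto(sample_href))  # prints abc123@example.org
```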

Note that the POST request will not work without a cookie, so you need to copy the cookie from your browser.
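A cookie copied from the browser's developer tools usually arrives as one long `name=value; name=value` string. The sketch below shows one way to turn that string into a dict using only the standard library. The helper name `cookie_header_to_dict` and the cookie names and values are placeholders, not the cookies Craigslist actually sets.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header_value: str) -> dict:
    """Split a copied Cookie header string into a name -> value dict (hypothetical helper)."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


# Placeholder string; paste the real value copied from your browser here.
copied = "session_id=abc123; theme=dark"
cookies = cookie_header_to_dict(copied)
print(cookies)  # prints {'session_id': 'abc123', 'theme': 'dark'}
```

The resulting dict can be passed to requests through its `cookies=` argument in place of a hand-written 'Cookie' header.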
