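This is my first post.
I have some HTML links, and in each page I want to find certain labels together with the text that follows each label. I am using a regular expression, but I keep getting empty lists.
These are the links: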
https://www.99acres.com/mailers/mmm_html/eden-park-14mar2017-558.html
https://www.99acres.com/mailers/mmm_html/ats-golf-meadows-13april-2016.html
https://www.99acres.com/mailers/mmm_html/spaze-privy-the-address-10mar2017-553.html
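The text I am looking for:
Area Range: plus the text that follows it
Possession: plus the text that follows it (for example, Possession 2019)
Price: plus the text that follows it
Here is my code: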
import requests
from bs4 import BeautifulSoup
import csv
import json
import itertools
import re

file = {}
final_data = []
final = []
textdata = []


def readfile(alldata, filename):
    # Despite its name, this writes the collected rows out to a CSV file.
    with open("./" + filename, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        for row in alldata:
            writer.writerow(row)


def parsedata(url, values):
    # Fetch the page and return its HTML as text.
    r = requests.get(url, values)
    data = r.text
    return data


def getresults():
    global final_data, file
    # Mailers.csv has a header row, then one (id, link) pair per row.
    with open("Mailers.csv", "r") as f:
        reader = csv.reader(f)
        next(reader)
        for row in reader:
            ids = row[0]
            link = row[1]
            html = parsedata(link, {})
            soup = BeautifulSoup(html, "html.parser")
            titles = soup.title.text
            td = soup.find_all("td")
            for i in td:
                sublist = []
                data = i.text
                pattern = r'(Possession:)(.)(.+)'
                # One result list is stored per <td>, whether or not it matched.
                x1 = re.findall(pattern, data)
                sublist.append(x1)
                sublist.append(link)
                final_data.append(sublist)
    print(final_data)
    return final_data


def main():
    getresults()
    readfile(final_data, "Data.csv")


main()
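
For reference, here is a minimal sketch of one way the empty lists could be avoided. It is not tested against the pages above. It assumes that each label and its value end up in the text of the same cell, and that whitespace inside a cell can be collapsed. `extract_fields`, `LABELS`, `FIELD_RE`, and the sample cell strings are hypothetical names and data introduced only for illustration. The idea is to use `search` and skip any cell that has no label, because `re.findall` returns `[]` for every non-matching cell.

```python
import re

# Labels taken from the question; everything else here is illustrative.
LABELS = ("Area Range", "Possession", "Price")
FIELD_RE = re.compile(r"(%s)\s*:\s*(.+)" % "|".join(map(re.escape, LABELS)))


def extract_fields(cell_texts):
    """Return {label: value} for the cells that contain a known label."""
    fields = {}
    for text in cell_texts:
        # Collapse newlines and repeated spaces so the value stays on one line
        # ("." in the pattern does not match a newline).
        flat = " ".join(text.split())
        match = FIELD_RE.search(flat)
        if match is None:
            continue  # skip cells without a label instead of storing []
        label, value = match.groups()
        # Keep the first value seen for each label.
        fields.setdefault(label, value.strip())
    return fields


# Made-up cell texts, standing in for [td.text for td in soup.find_all("td")].
sample_cells = [
    "Welcome to the project",
    "Possession:\n 2019",
    "Area Range : 1200 - 2400 sq. ft.",
    "Price: On request",
]
print(extract_fields(sample_cells))
```

In the code above this would be called once per page, with the text of every `td`, and the resulting dictionary stored next to `link`. If the pages use nested tables, an outer cell contains the text of all its inner cells, so the captured value may run on past the intended text; in that case the pattern or the choice of cells would need tightening.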