I am trying to append data from several URLs to a Pandas DataFrame, but the loop does not seem to work correctly. Does anyone have an idea how to do this? Just as additional information: the site changes a number in the URL to get to the next page.
import requests
import pandas
import numpy
import re
import csv
from bs4 import BeautifulSoup
#### page info ###
for k in range(1, 3):
    k = str(k)
    page = requests.get("https://postcode.my/search/?keyword=&state=Kedah&page=" + k)
    #### check page status (will be 200 if the page is OK)
    page.status_code
    ### parse the page
    soup = BeautifulSoup(page.content, 'html.parser')
    ### find rows
    rows = soup.find_all(class_="col-lg-12 col-md-12 col-sm-12 col-xs-12")
    ## create list by appending
    L = []
    for row in rows:
        cols = row.find_all("td")
        cols = [x.text.strip() for x in cols]
        L.append(cols)
    ## convert to a NumPy array and reshape to 4 columns
    cols = ['LOCATION', 'AREA', 'STATE', 'POSTCODE']
    PDTABLE = pandas.DataFrame(numpy.array(L).reshape(-1, 4), columns=cols)
    print(PDTABLE)
    ##PDTABLE.to_csv('test.csv')
Thank you. Best regards, Laili Shahril
Answer 0 (score: 0)
Because you create the list inside the loop, it is replaced on each iteration, so only the last page's data survives. Create the list once, before the loop, and only append inside it.

Try:
import requests
import pandas
import numpy
import re
import csv
from bs4 import BeautifulSoup

L = []  # created once, before the loop, so appends accumulate across pages
#### page info ###
for k in range(1, 3):
    k = str(k)
    page = requests.get("https://postcode.my/search/?keyword=&state=Kedah&page=" + k)
    #### check page status (will be 200 if the page is OK)
    page.status_code
    ### parse the page
    soup = BeautifulSoup(page.content, 'html.parser')
    ### find rows
    rows = soup.find_all(class_="col-lg-12 col-md-12 col-sm-12 col-xs-12")
    ## append to the list created before the loop
    for row in rows:
        cols = row.find_all("td")
        cols = [x.text.strip() for x in cols]
        L.append(cols)

## convert to a NumPy array and reshape to 4 columns
cols = ['LOCATION', 'AREA', 'STATE', 'POSTCODE']
PDTABLE = pandas.DataFrame(numpy.array(L).reshape(-1, 4), columns=cols)
PDTABLE.to_csv('test.csv')
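The accumulate-then-build pattern can be checked without any network access. This is a minimal sketch with hard-coded dummy rows standing in for the stripped `<td>` text of each page; it shows that a list initialized once before the loop keeps the rows from every iteration:

```python
import numpy
import pandas

# Dummy data: one list of rows per "page" (stand-ins for scraped <td> text).
pages = [
    [["Alor Setar", "Kota Setar", "Kedah", "05000"]],
    [["Sungai Petani", "Kuala Muda", "Kedah", "08000"]],
]

L = []                      # created once, before the loop
for rows in pages:          # one iteration per page, like the requests loop
    for cols in rows:
        L.append(cols)

cols = ['LOCATION', 'AREA', 'STATE', 'POSTCODE']
PDTABLE = pandas.DataFrame(numpy.array(L).reshape(-1, 4), columns=cols)
print(len(PDTABLE))         # 2 -- rows from both pages are kept
```

If `L = []` is moved inside the outer loop, only the last page's single row would remain, which is exactly the bug in the original code.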