Appending data from other pages to a pandas DataFrame

Date: 2018-09-28 09:22:09

Tags: python pandas for-loop

I am trying to append data from several URLs to a pandas DataFrame, but the loop does not seem to work correctly. Does anyone have an idea how to do this? One additional piece of information: the site moves to the next page by incrementing a number in the URL.

import requests
import pandas
import numpy
import re
import csv

from bs4 import BeautifulSoup
#### page info ###
for k in range (1,3):
    k=str(k)
    page = requests.get("https://postcode.my/search/?keyword=&state=Kedah&page="+k)
#### check page status (will come 200 if the page is ok) 
    page.status_code
### call Library
    soup = BeautifulSoup(page.content, 'html.parser')
### Find rows 
    rows = soup.find_all(class_="col-lg-12 col-md-12 col-sm-12 col-xs-12")
## create list by append
    L=[]
    for row in rows:
        cols = row.find_all("td")
        cols = [x.text.strip() for x in cols]
        L.append(cols)
## convert to numpy array and reshape to 4 columns 
        cols = ['LOCATION','AREA','STATE','POSTCODE']
        PDTABLE = pandas.DataFrame(numpy.array(L).reshape(-1,4),columns = cols)
        print(PDTABLE)
        ##PDTABLE.to_csv('test.csv')

Thank you and best regards, 莱莉·沙里尔

1 Answer:

Answer 0 (score: 0)

When you assign the variable inside the loop, it is replaced on the second pass, so only the rows from the last page survive. Initialise the list once, outside the loop, and build the DataFrame after the loop finishes.
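A minimal sketch of the pitfall, using toy numbers in place of the scraped rows (the variable name L matches the question's code):

for page_num in [1, 2]:
    L = []               # re-bound to a fresh list on every iteration
    L.append(page_num)
print(L)                 # [2]  -- page 1's data is lost

L = []                   # initialised once, outside the loop
for page_num in [1, 2]:
    L.append(page_num)
print(L)                 # [1, 2]  -- rows from every page are kept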

尝试:

import requests
import pandas
import numpy

from bs4 import BeautifulSoup

# Accumulator list: initialised ONCE, outside the loop,
# so rows from every page are kept.
L = []

#### page info ###
for k in range(1, 3):
    page = requests.get("https://postcode.my/search/?keyword=&state=Kedah&page=" + str(k))
#### check page status (200 means the page is ok; the value is not acted on here)
    page.status_code
### parse the HTML
    soup = BeautifulSoup(page.content, 'html.parser')
### find the result rows
    rows = soup.find_all(class_="col-lg-12 col-md-12 col-sm-12 col-xs-12")
## append each row's cell texts to the shared list
    for row in rows:
        cols = row.find_all("td")
        cols = [x.text.strip() for x in cols]
        L.append(cols)

## convert to a numpy array, reshape to 4 columns, and build the DataFrame once
cols = ['LOCATION', 'AREA', 'STATE', 'POSTCODE']
PDTABLE = pandas.DataFrame(numpy.array(L).reshape(-1, 4), columns=cols)
PDTABLE.to_csv('test.csv')
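
For comparison, a sketch of an alternative structure, assuming (as the question's code does) that the results sit in elements with that CSS class and that each useful row yields exactly four td cells: build one small DataFrame per page and combine them with pandas.concat, and let raise_for_status() surface HTTP errors instead of ignoring them. The names frames and url, and the four-cell filter, are illustrative choices, not part of the original code.

import requests
import pandas
from bs4 import BeautifulSoup

frames = []
for k in range(1, 3):
    url = "https://postcode.my/search/?keyword=&state=Kedah&page=" + str(k)
    page = requests.get(url)
    page.raise_for_status()  # raise an exception on a failed request
    soup = BeautifulSoup(page.content, 'html.parser')
    rows = soup.find_all(class_="col-lg-12 col-md-12 col-sm-12 col-xs-12")
    data = [[td.text.strip() for td in row.find_all("td")] for row in rows]
    # keep only rows with the expected 4 cells (assumption about the markup)
    data = [r for r in data if len(r) == 4]
    frames.append(pandas.DataFrame(data, columns=['LOCATION', 'AREA', 'STATE', 'POSTCODE']))

PDTABLE = pandas.concat(frames, ignore_index=True)
PDTABLE.to_csv('test.csv', index=False)

This avoids the flatten-and-reshape step entirely, so a page with an unexpected number of cells skews one row instead of shifting every column after it.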