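Thanks in advance for your help.
I am trying to write a web scraper that collects data from a website, and I want to store that data in a DataFrame. However, every time the scraper moves to a new page it writes the DataFrame again, and the information collected earlier is lost.
Any ideas on how to fix this?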
j=1
i=0
try:
    url_next = str('https://www.viamichelin.es/web/Restaurantes/Restaurantes-Espana?page='+str(j))
    print(url_next)
    while(url_next is not None):
        url_next = str('https://www.viamichelin.es/web/Restaurantes/Restaurantes-Espana?page='+str(j))
        fetchURL(url_next)
        print (url_next)
        data = pd.DataFrame({'site':'MICHELIN','name': '', 'pdv_url': '', 'rating': '','address' : '','web' : '','phone' : '','map' : '','email' : '','id' : '', 'datetime' : ''}, index = [j,i])
        i=0
        url=url_next
        while(url is not None):
            html = fetchURL(url_next)
            soup = BeautifulSoup(html, "html.parser")
            card_list = soup.find_all("li", class_="poi-item-restaurant")
            # print(card_list)
            for card in card_list[0:len(card_list)]:
                # card =card_list[0]
                name = card.find('div','poi-item-name truncate').get_text('title')
                pdv_url = url_next+card.find('a').get('href')
                html = fetchURL(pdv_url)
                soup = BeautifulSoup(html, "html.parser")
                #dataOutput = {'site':'MICHELIN','pdv_data':{'name': name, 'pdv_url': pdv_url, 'rating': str(rating)}}
                #print(json.dumps(dataOutput))
                #pdv_data = pd.DataFrame({'name': name, 'pdv_url': pdv_url, 'rating': str(rating)},index = [0])
                data.iloc[i]['name'] = name
                data.iloc[i]['pdv_url'] = pdv_url
                i=i+1
                print(data)
                data = data.append(pd.DataFrame({'site':'MICHELIN','name': '', 'pdv_url': '', 'rating': '','address' : '','web' : '','phone' : '','map' : '','email' : '','id' : '', 'datetime' : ''}, index = [j,i]), ignore_index = True)
                # print(soup)
                # address = re.sub(' +','',soup.find(class_='address-t-record').get_text().replace('\n', '').replace('Ver mapa',''))
                # print(soup.find('div', class_='highlighted-box-right').find_all('p'))
                # ficha = soup.find('div', class_='tab-container bb-tab-container active').find_all('div', class_='data')
                # for campo in ficha:
                # print(campo.find('span',class_='first').get_text())
                # price = soup.find_all(class_='right').get_text()
                # price_menu = soup.find_all(class_='right').get_text()
            url = None
            # print(name)
        j=j+1
except AttributeError:
    url_next = None
Answer 0 (score: -1)
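You are confusing DataFrames with lists and dictionaries. A quick search will show how they differ.
It is best to just create a list.
Create an empty list at the top, before any loop: data = []
Then append each DataFrame to it: data.append(pd.DataFrame...)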
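To make the list approach concrete, here is a minimal sketch. It rests on assumptions: scrape_page is a hypothetical stand-in for the fetchURL and BeautifulSoup steps in the question (it returns made-up rows so the snippet runs offline), and the column set is cut down to three. Combining the collected frames with pd.concat is one possible final step; it is not taken from the question's code. As a side note, DataFrame.append was removed in pandas 2.0, so this pattern also avoids that call.

```python
import pandas as pd

COLUMNS = ["site", "name", "pdv_url"]


def scrape_page(page):
    """Hypothetical placeholder for the real scraping code; returns fake rows."""
    return [
        {
            "site": "MICHELIN",
            "name": f"restaurant-{page}-{k}",
            "pdv_url": f"/page{page}/item{k}",
        }
        for k in range(2)
    ]


frames = []  # created once, before the loop, so earlier pages are never overwritten
for page in range(1, 4):
    rows = scrape_page(page)
    frames.append(pd.DataFrame(rows, columns=COLUMNS))

# Combine every page into a single DataFrame only after the loop has finished.
data = pd.concat(frames, ignore_index=True)
print(data)
```

The key point is where the container is created: once, outside the page loop. In the question, data is re-created at the start of every page iteration, which discards the rows from the previous page.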
Or use a dictionary:
data = {}
Then add to it:
data['somename'] = pd.DataFrame...
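A dictionary version might look like the sketch below. The same caveats apply: scrape_page is a hypothetical placeholder for the real scraping code, and the keys ("page_1" and so on) are invented for illustration. Passing the dictionary to pd.concat is an optional extra step that keeps the key as an outer index level, so each page can still be looked up afterwards.

```python
import pandas as pd


def scrape_page(page):
    """Hypothetical placeholder for the real scraping code; returns fake rows."""
    return [{"name": f"restaurant-{page}-{k}"} for k in range(2)]


pages = {}  # created once, outside the loop
for page in range(1, 4):
    # One DataFrame per page, stored under its own key.
    pages[f"page_{page}"] = pd.DataFrame(scrape_page(page))

# The dictionary keys become the outer level of the resulting index.
data = pd.concat(pages, names=["page", "row"])
print(data.loc["page_2"])
```

Compared with a list, the dictionary is mainly useful when you want to get back to a specific page by name later on.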