Multi-index DataFrame in Python pandas

Asked: 2018-02-02 15:18:07

Tags: python pandas multi-index

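Thanks in advance for your help.

I am trying to write a web scraper to collect data from a website, and I want to store that data in a DataFrame. However, every time the scraper moves to a new page, it writes the DataFrame again and does not keep the information collected previously.

Any ideas on how to solve this?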

    j=1
    i=0
    try:  
        url_next = str('https://www.viamichelin.es/web/Restaurantes/Restaurantes-Espana?page='+str(j))
        print(url_next)
        while(url_next is not None):
            url_next = str('https://www.viamichelin.es/web/Restaurantes/Restaurantes-Espana?page='+str(j))
            fetchURL(url_next)
            print (url_next)           
            data = pd.DataFrame({'site':'MICHELIN','name': '', 'pdv_url': '', 'rating': '','address' : '','web' : '','phone' : '','map' : '','email' : '','id' : '', 'datetime' : ''}, index = [j,i])
            i=0
            url=url_next
            while(url is not None):

                    html = fetchURL(url_next)
                    soup = BeautifulSoup(html, "html.parser")

                    card_list = soup.find_all("li", class_="poi-item-restaurant")
                    # print(card_list)

                    for card in card_list[0:len(card_list)]:
                       # card =card_list[0]                
                        name = card.find('div','poi-item-name truncate').get_text('title')
                        pdv_url = url_next+card.find('a').get('href')
                        html = fetchURL(pdv_url)
                        soup = BeautifulSoup(html, "html.parser")
                        #dataOutput = {'site':'MICHELIN','pdv_data':{'name': name, 'pdv_url': pdv_url, 'rating': str(rating)}}
                        #print(json.dumps(dataOutput))
                        #pdv_data = pd.DataFrame({'name': name, 'pdv_url': pdv_url, 'rating': str(rating)},index = [0])
                        data.iloc[i]['name'] = name
                        data.iloc[i]['pdv_url'] = pdv_url                         
                        i=i+1
                        print(data)
                        data = data.append(pd.DataFrame({'site':'MICHELIN','name': '', 'pdv_url': '', 'rating': '','address' : '','web' : '','phone' : '','map' : '','email' : '','id' : '', 'datetime' : ''}, index = [j,i]), ignore_index = True)
                    # print(soup)
                    # address = re.sub('  +','',soup.find(class_='address-t-record').get_text().replace('\n', '').replace('Ver mapa',''))
                    # print(soup.find('div', class_='highlighted-box-right').find_all('p'))
                    # ficha = soup.find('div', class_='tab-container bb-tab-container active').find_all('div', class_='data')
                    # for campo in ficha:
                    #     print(campo.find('span',class_='first').get_text())

                    # price = soup.find_all(class_='right').get_text()
                    # price_menu = soup.find_all(class_='right').get_text()

                    url = None
                    # print(name)
            j=j+1
    except AttributeError:
        url_next = None

1 Answer:

Answer 0 (score: -1)

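You are confusing DataFrames with lists and dictionaries. Google it. It is better to just create a list. Create an empty list at the top: data = [] Then append each DataFrame to it: data.append(pd.DataFrame...)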
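A minimal sketch of the list approach, assuming the scraper can hand back each page as a list of row dictionaries. The `collect_pages` helper and the sample data are hypothetical, not part of the question's code. The key point is that the list is created once, before the page loop, so nothing is overwritten when the page changes. Note also that `DataFrame.append` returns a new object and was removed in pandas 2.0, whereas `list.append` modifies the list in place.

```python
import pandas as pd


def collect_pages(pages):
    """Combine per-page rows into a single DataFrame.

    `pages` maps a page number to a list of row dicts. It is a
    hypothetical stand-in for whatever the scraper extracts per page.
    """
    frames = []  # created once, outside the page loop
    for page_number, rows in pages.items():
        page_df = pd.DataFrame(rows)
        page_df["page"] = page_number  # remember which page each row came from
        frames.append(page_df)  # list.append mutates the list in place
    # Build the final DataFrame once, after all pages are collected.
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data standing in for scraped results.
sample_pages = {
    1: [{"name": "A", "pdv_url": "u1"}, {"name": "B", "pdv_url": "u2"}],
    2: [{"name": "C", "pdv_url": "u3"}],
}
data = collect_pages(sample_pages)
print(data)
```

Concatenating once at the end also avoids copying the growing DataFrame on every iteration.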

Or use a dictionary: data = {} Then add to it: data['somename'] = pd.DataFrame...
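The dictionary variant could be sketched as follows, assuming the page number is used as the key (the answer's 'somename' is a placeholder, and the sample rows here are hypothetical). Passing the dictionary to `pd.concat` uses its keys as the outer index level, which gives the multi-index the question title asks about.

```python
import pandas as pd

# One DataFrame per page, keyed by page number.
# The rows are hypothetical stand-ins for scraped data.
frames_by_page = {}
frames_by_page[1] = pd.DataFrame(
    [{"name": "A", "pdv_url": "u1"}, {"name": "B", "pdv_url": "u2"}]
)
frames_by_page[2] = pd.DataFrame([{"name": "C", "pdv_url": "u3"}])

# The dict keys become the outer level of a MultiIndex;
# `names` labels the two index levels.
data = pd.concat(frames_by_page, names=["page", "row"])
print(data)

# Select all rows collected from page 2.
print(data.loc[2])
```

With this layout, a single restaurant is addressed by a `(page, row)` pair, for example `data.loc[(1, 1), "name"]`.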