Appending a pandas DataFrame to Google Sheets

Time: 2020-10-23 15:45:50

Tags: python pandas dataframe google-sheets beautifulsoup

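I am building a web scraper with Python, BeautifulSoup, pandas and Google Sheets. So far I have managed to scrape a data table from each of several web pages whose URLs are listed in a Google Sheet. What I want to achieve is to create a DataFrame for each table at each URL.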

My code so far:

import gspread
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession
import pandas as pd
from bs4 import BeautifulSoup
import requests


credentials = service_account.Credentials.from_service_account_file(
    'credentials.json')

scoped_credentials = credentials.with_scopes(
        ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']
        )

gc = gspread.Client(auth=scoped_credentials)
gc.session = AuthorizedSession(scoped_credentials)
sheet = gc.open_by_key('api_key')

worksheet = sheet.sheet1
link_list = worksheet.col_values(2)


def get_info(page_url) :

    page = requests.get(page_url)
    soup = BeautifulSoup(page.content, 'html.parser')

    try :
        tbl = soup.find('table')

        labels = []
        data = []

        for tr in tbl.findAll('tr'):
            imp_labels = [th.text.strip() for th in tr.findAll('th')]
            imp_data = [td.text.strip() for td in tr.findAll('td')]
            labels.append(imp_labels)
            data.append(imp_data)
        
        col_names = {'Labels': imp_labels, 'Data': imp_data}

        df = pd.DataFrame([labels, data], col_names)
        
        df_t = df.T
        print(df_t)

    except Exception as e:
        print(e)


for link in link_list :
    get_info(link)

Output:

                    Labels                                     Data
0        [Celebrated Name]                              [Don Lemon]
1                    [Age]                               [54 Years]
2              [Nick Name]                              [Don Lemon]
3             [Birth Name]                              [Don Lemon]
4             [Birth Date]                             [1966-03-01]
5                 [Gender]                                   [Male]
6             [Profession]                             [Journalist]
7           [Birth Nation]                          [United States]
8         [Place Of Birth]  [Baton Rouge, Louisiana, United States]
9            [Nationality]                               [American]
10              [Siblings]                 [Leisa Lemon, Yma Lemon]
11             [Ethnicity]                                  [Mixed]
12             [Eye Color]                                  [Brown]
13            [Hair Color]                                  [Black]
14              [Religion]                              [Christian]
15                [Height]                          [5 Feet 6 Inch]
16                [Weight]                              [Not Known]
17           [Working For]                                    [CNN]
18        [Best Known For]                            [CNN Tonight]
19                [School]                      [Baker High School]
20  [College / University]                        [Brookyn College]
21            [University]             [Louisiana State University]
22             [Horoscope]                                 [Pisces]
23             [Net Worth]               [$ 3 million (As of 2018)]
24            [Famous For]  [For hosting the program ‘CNN Tonight’]
25      [Body Measurement]                               [40-32-35]
26                [Awards]                             [Emmy Award]
27                [Salary]                                [$125000]
28                 [Links]      [WikipediaFacebookTwitterInstagram]
                  Labels                                               Data
0      [Celebrated Name]                                         [2 Chainz]
1                  [Age]                                         [43 Years]
2            [Nick Name]                              [Tity Boi, Drenchgod]
3           [Birth Name]                                     [Tauheed Epps]
4           [Birth Date]                                       [1977-09-12]
5               [Gender]                                             [Male]
6           [Profession]                                           [Rapper]
7       [Place Of Birth]             [College Park, Georgia, United States]
8          [Nationality]                                         [American]
9            [Ethnicity]                                    [Afro-American]
10           [Horoscope]                                            [Virgo]
11         [High School]                        [North Clayton High School]
12          [University]  [Alabama State University and Virginia State U...
13      [Marital Status]                                          [Married]
14                [Wife]                                       [Kesha Ward]
15            [Children]                        [Heaven, Harmony, and Halo]
16     [Body Build/Type]                                         [Athletic]
17    [Body Measurement]                                  [43-15-34 inches]
18          [Chest Size]                                        [43 inches]
19          [Bicep Size]                                        [15 inches]
20          [Waist Size]                                        [34 inches]
21           [Shoe Size]                                           [14 (US]
22              [Height]                                  [6 feet 5 inches]
23              [Weight]                                            [88 kg]
24           [Net Worth]                                      [$ 6 Million]
25              [Salary]                                        [$ 100,000]
26  [Sexual Orientation]                                         [Straight]
27           [Eye Color]                                       [Dark Brown]
28          [Hair Color]                                            [Black]
29               [Links]             [Wikipedia,Instagram,Twitter,Facebook]

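So, my questions are:

  • How do I append the DataFrames to Google Sheets?
  • How do I separate the DataFrames (aligned side by side)?
  • How do I remove the index and the square brackets?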

I am new to Python, so I apologize if this is a bit messy. Thanks in advance.
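
For reference, here is a minimal sketch of one way the three points could be approached. It is not a tested solution for the pages being scraped. The square brackets most likely come from appending whole lists (imp_labels, imp_data) for each row, so the sketch stores one plain string per cell instead. All function names below (table_rows, table_to_frame, side_by_side, to_sheet_values, write_frame) are made up for illustration. The sketch assumes every table row holds exactly one th and one td, as the output suggests. It also assumes that worksheet.update accepts a list of rows as its only argument, as in gspread's pandas example; the signature has changed between gspread versions, so check the installed one.

```python
import pandas as pd


def table_rows(tbl):
    """Collect (label, value) string pairs from a BeautifulSoup <table> tag.

    Assumes each <tr> holds one <th> and one <td>; other rows are skipped.
    """
    rows = []
    for tr in tbl.find_all("tr"):
        th, td = tr.find("th"), tr.find("td")
        if th is not None and td is not None:
            rows.append((th.text.strip(), td.text.strip()))
    return rows


def table_to_frame(rows):
    """Two-column frame holding plain strings, so no brackets are shown."""
    return pd.DataFrame(rows, columns=["Labels", "Data"])


def side_by_side(frames):
    """Put the frames next to each other, aligned on row position."""
    frames = [f.reset_index(drop=True) for f in frames]
    return pd.concat(frames, axis=1)


def to_sheet_values(df):
    """Header row plus data rows as plain lists; the index is left out.

    Cells missing because one table is shorter become empty strings.
    """
    body = [["" if pd.isna(v) else str(v) for v in row]
            for row in df.values.tolist()]
    return [df.columns.tolist()] + body


def write_frame(worksheet, df):
    """Write the frame starting at the top-left cell of a gspread worksheet.

    Assumption: update() takes the list of rows as its only argument.
    """
    worksheet.update(to_sheet_values(df))


# Small sample taken from the output above, standing in for scraped tables.
don = table_to_frame([("Celebrated Name", "Don Lemon"),
                      ("Age", "54 Years"),
                      ("Gender", "Male")])
two = table_to_frame([("Celebrated Name", "2 Chainz"),
                      ("Age", "43 Years")])
combined = side_by_side([don, two])
values = to_sheet_values(combined)
```

In the scraper, get_info could return table_to_frame(table_rows(tbl)) instead of printing, and the loop could collect the returned frames in a list before calling side_by_side. Writing to a separate worksheet rather than sheet1 would avoid overwriting the link list in column 2. For console output without the index, print(df.to_string(index=False)) is an option.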

0 answers:

No answers