Convert multiple lists to a DataFrame in Python

Time: 2017-07-04 05:14:50

Tags: python list pandas

I want to put lists into a DataFrame. My code is:

import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

webpage_urls = ["https://data.gov.au/dataset?q=&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&organization=departmentofagriculturefisheriesandforestry&_groups_limit=0",
                 "https://data.gov.au/dataset?q=&organization=commonwealthscientificandindustrialresearchorganisation&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0",
                 "https://data.gov.au/dataset?q=&organization=bureauofmeteorology&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0",
                 "https://data.gov.au/dataset?q=&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&organization=tasmanianmuseumandartgallery&_groups_limit=0",
                 "https://data.gov.au/dataset?q=&organization=department-of-industry&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0"]

for i in webpage_urls:
    wiki2 = i
    page = urllib.request.urlopen(wiki2)

    soup = BeautifulSoup(page, "html.parser")

    # fetching organisations
    data3 = soup.find_all('li', class_="nav-item active")

    lobbying1 = []
    for element in data3:
        lobbying1.append(element.span.get_text())
    print(lobbying1)

    df = pd.DataFrame({'Organisation': lobbying1})

The code above prints:

['Reserve Bank of Aus... (24)', 'Business Support an... (24)']
['Department of Finance (16)', 'Business Support an... (16)']
['Department of Agric... (13)', 'Business Support an... (13)']
... and so on

These are multiple separate lists, not a nested list, and I only get the following DataFrame:

   Organisation
0  Australian Charitie... (1)
1  Business Support an... (1)

I want the first element of each list in column 1 and the second element of each list in column 2, and I want all entries:

Organisation            Groups
Australian Cha...      Business Support and...

Please help me solve this.

2 Answers:

Answer 0: (score: 1)

I think you need to wrap lobbying1 in [] to make a list of lists, then use the DataFrame constructor:

    df = pd.DataFrame([lobbying1], columns=['Organization','Groups'])   
    print (df)

                  Organization        Groups
0  Department of Agric... (35)  Science (35)
                 Organization       Groups
0  Commonwealth Scient... (8)  Science (8)
                Organization       Groups
0  Bureau of Meteorology (4)  Science (4)
                 Organization       Groups
0  Tasmanian Museum an... (1)  Science (1)
                 Organization       Groups
0  Department of Indus... (1)  Science (1)
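
The effect of the extra brackets can be seen without any scraping. The following is a minimal sketch, assuming pandas is installed and using a hard-coded list (values copied from the output above) as a stand-in for the scraped lobbying1:

```python
import pandas as pd

# Stand-in for one scraped result; values copied from the output above.
lobbying1 = ['Department of Agric... (35)', 'Science (35)']

# A flat list under a single column name gives one row per element.
as_rows = pd.DataFrame({'Organization': lobbying1})

# Wrapping the list in [] makes it a single row with one value per column.
as_columns = pd.DataFrame([lobbying1], columns=['Organization', 'Groups'])

print(as_rows.shape)     # (2, 1)
print(as_columns.shape)  # (1, 2)
```

The first form reproduces the one-column result from the question; the second gives the two-column layout that was asked for.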

If you need a single DataFrame for all the data, append lobbying1 to a data list on each iteration and move the DataFrame constructor call outside the loop:

data = []
for i in webpage_urls:
    wiki2 = i
    page= urllib.request.urlopen(wiki2)

    soup = BeautifulSoup(page)
    # fetching organisations
    data3 = soup.find_all('li', class_="nav-item active")

    lobbying1 = []
    for element in data3:
        lobbying1.append(element.span.get_text())
    data.append(lobbying1)

df = pd.DataFrame(data, columns=['Organization','Groups'])   
print (df)
                  Organization        Groups
0  Department of Agric... (35)  Science (35)
1   Commonwealth Scient... (8)   Science (8)
2    Bureau of Meteorology (4)   Science (4)
3   Tasmanian Museum an... (1)   Science (1)
4   Department of Indus... (1)   Science (1)
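
The same accumulate-then-construct pattern can be tried offline. This is only a sketch: scraped_per_page is a hypothetical stand-in for what the loop would collect, and the length check is an added assumption (not part of the answer) that guards against a page returning something other than two labels:

```python
import pandas as pd

# Hypothetical stand-in for the scraping step: each inner list is what
# lobbying1 would hold after one pass through the loop.
scraped_per_page = [
    ['Department of Agric... (35)', 'Science (35)'],
    ['Commonwealth Scient... (8)', 'Science (8)'],
    ['Bureau of Meteorology (4)', 'Science (4)'],
]

data = []
for lobbying1 in scraped_per_page:
    # Assumption: keep only pages that yielded exactly two labels, so
    # every row matches the two column names used below.
    if len(lobbying1) == 2:
        data.append(lobbying1)

# Build the DataFrame once, after the loop, so no row is overwritten.
df = pd.DataFrame(data, columns=['Organization', 'Groups'])
print(df)
```

Constructing the DataFrame once at the end also avoids rebuilding it on every iteration.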

Answer 1: (score: 0)

Your list lobbying1 is a list of lists. So you can simply call pd.DataFrame as follows to get a two-column DataFrame:

lobbying1 = [['Reserve Bank of Aus... (24)', 'Business Support an... (24)'],
             ['Department of Finance (16)', 'Business Support an... (16)'],
             ['Department of Agric... (13)', 'Business Support an... (13)']]
df = pd.DataFrame(lobbying1, columns=['Organization','Groups'])

You get this as output:

>>> df.head() 
                  Organization                       Groups
0  Reserve Bank of Aus... (24)  Business Support an... (24)
1   Department of Finance (16)  Business Support an... (16)
2  Department of Agric... (13)  Business Support an... (13)
>>>
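
For readers who prefer the dictionary form used in the question, the rows can be transposed into columns first. A minimal sketch, assuming pandas is installed; the variable names rows, organisations, and groups are illustrative and not from the original posts:

```python
import pandas as pd

# Sample rows copied from the output shown in the question.
rows = [
    ['Reserve Bank of Aus... (24)', 'Business Support an... (24)'],
    ['Department of Finance (16)', 'Business Support an... (16)'],
    ['Department of Agric... (13)', 'Business Support an... (13)'],
]

# zip(*rows) transposes the rows into one tuple per column.
organisations, groups = zip(*rows)

# Dictionary form: one key per column, one list of values per key.
df_from_dict = pd.DataFrame({
    'Organization': list(organisations),
    'Groups': list(groups),
})

# List-of-rows form, as in the answer above.
df_from_rows = pd.DataFrame(rows, columns=['Organization', 'Groups'])

# Both constructions should describe the same table.
print(df_from_dict.equals(df_from_rows))
```

Either form works; the list-of-rows form is usually the more direct choice when the data arrives one row at a time, as it does in the scraping loop.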