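I want to put lists into a DataFrame. My code is:
import urllib.request
import pandas as pd
from bs4 import BeautifulSoup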
webpage_urls = ["https://data.gov.au/dataset?q=&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&organization=departmentofagriculturefisheriesandforestry&_groups_limit=0",
"https://data.gov.au/dataset?q=&organization=commonwealthscientificandindustrialresearchorganisation&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0",
"https://data.gov.au/dataset?q=&organization=bureauofmeteorology&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0",
"https://data.gov.au/dataset?q=&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&organization=tasmanianmuseumandartgallery&_groups_limit=0",
"https://data.gov.au/dataset?q=&organization=department-of-industry&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0"]
for i in webpage_urls:
    wiki2 = i
    page = urllib.request.urlopen(wiki2)
    soup = BeautifulSoup(page)
    # fetching organisations
    data3 = soup.find_all('li', class_="nav-item active")
    lobbying1 = []
    for element in data3:
        lobbying1.append(element.span.get_text())
    print(lobbying1)
    # rebuilt on every iteration, so only the last page survives
    df = pd.DataFrame({'Organisation': lobbying1})
The code above prints:
['Reserve Bank of Aus... (24)', 'Business Support an... (24)']
['Department of Finance (16)', 'Business Support an... (16)']
['Department of Agric... (13)', 'Business Support an... (13)']
... and so on.
These are multiple separate lists rather than one nested list, and I only get the following DataFrame:
                 Organisation
0  Australian Charitie... (1)
1  Business Support an... (1)
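I want the first element of each list in column 1 and the second element in column 2, and I want all entries, not just the last one:
Organisation       Groups
Australian Cha...  Business Support and...
Please help me solve this.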
Answer 0 (score: 1)
I think you need to wrap the list in [] to make it a list of lists, and then use the DataFrame constructor:
df = pd.DataFrame([lobbying1], columns=['Organization','Groups'])
print(df)
                  Organization        Groups
0  Department of Agric... (35)  Science (35)
                 Organization       Groups
0  Commonwealth Scient... (8)  Science (8)
                Organization       Groups
0  Bureau of Meteorology (4)  Science (4)
                 Organization       Groups
0  Tasmanian Museum an... (1)  Science (1)
                 Organization       Groups
0  Department of Indus... (1)  Science (1)
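To see why the extra brackets matter, here is a minimal offline sketch. It assumes a two-element list like the ones printed above (the sample strings are copied from that output, and the names as_column and as_row are only illustrative). A flat list passed as a dict value becomes one column with one row per element, while the same list wrapped in [] becomes a single row with one column per element.

```python
import pandas as pd

# Sample values copied from the output above; no network access needed.
lobbying1 = ['Department of Agric... (35)', 'Science (35)']

# Flat list as a dict value: each element becomes its own row in one column.
as_column = pd.DataFrame({'Organisation': lobbying1})
print(as_column.shape)

# Same list wrapped in []: the whole list becomes one row with two columns.
as_row = pd.DataFrame([lobbying1], columns=['Organization', 'Groups'])
print(as_row.shape)
```

The first shape has two rows and one column, which is what the question's code produced; the second has one row and two columns, which is the layout the question asks for.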
If you need a single DataFrame for all the data, append lobbying1 to a data list inside the loop, and then call the DataFrame constructor once, outside the loop:
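import urllib.request
import pandas as pd
from bs4 import BeautifulSoup

# webpage_urls is the list of URLs defined in the question
data = []
for i in webpage_urls:
    wiki2 = i
    page = urllib.request.urlopen(wiki2)
    soup = BeautifulSoup(page)
    # fetching organisations
    data3 = soup.find_all('li', class_="nav-item active")
    lobbying1 = []
    for element in data3:
        lobbying1.append(element.span.get_text())
    # one row per page
    data.append(lobbying1)

# build the DataFrame once, after all pages have been collected
df = pd.DataFrame(data, columns=['Organization', 'Groups'])
print(df)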
                  Organization        Groups
0  Department of Agric... (35)  Science (35)
1   Commonwealth Scient... (8)   Science (8)
2    Bureau of Meteorology (4)   Science (4)
3   Tasmanian Museum an... (1)   Science (1)
4   Department of Indus... (1)   Science (1)
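The collect-then-construct pattern can also be tried without hitting data.gov.au. The sketch below is only an illustration under stated assumptions: SAMPLE_PAGES holds invented HTML snippets standing in for the fetched pages, and active_labels is a hypothetical helper name, not part of the original code. It parses each snippet, gathers one row per page, and builds the DataFrame once at the end.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Invented markup standing in for two fetched pages (assumption, not real site HTML).
SAMPLE_PAGES = [
    '<ul>'
    '<li class="nav-item active"><a><span>Bureau of Meteorology (4)</span></a></li>'
    '<li class="nav-item"><a><span>Not selected (2)</span></a></li>'
    '<li class="nav-item active"><a><span>Science (4)</span></a></li>'
    '</ul>',
    '<ul>'
    '<li class="nav-item active"><a><span>Tasmanian Museum an... (1)</span></a></li>'
    '<li class="nav-item active"><a><span>Science (1)</span></a></li>'
    '</ul>',
]


def active_labels(html):
    """Hypothetical helper: span text of every <li class="nav-item active">."""
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('li', class_='nav-item active')
    return [li.span.get_text() for li in items]


# One inner list per page, so the outer list is a list of rows.
rows = [active_labels(html) for html in SAMPLE_PAGES]

# Construct the DataFrame once, after every page has been processed.
df = pd.DataFrame(rows, columns=['Organization', 'Groups'])
print(df)
```

Keeping the parsing in a small function makes it easy to check on static HTML before pointing it at the live pages. Note that the two column names only fit if every page yields exactly two active items.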
Answer 1 (score: 0)
Your list lobbying1 is a list of lists, so you can simply call pd.DataFrame as follows to get a two-column DataFrame:
import pandas as pd

lobbying1 = [['Reserve Bank of Aus... (24)', 'Business Support an... (24)'],
             ['Department of Finance (16)', 'Business Support an... (16)'],
             ['Department of Agric... (13)', 'Business Support an... (13)']]
# pass the nested list itself (the original referred to an undefined main_list)
df = pd.DataFrame(lobbying1, columns=['Organization', 'Groups'])
This gives the following output:
>>> df.head()
                  Organization                       Groups
0  Reserve Bank of Aus... (24)  Business Support an... (24)
1   Department of Finance (16)  Business Support an... (16)
2  Department of Agric... (13)  Business Support an... (13)