[{'category': "Best restaurant that's been around forever and is still worth the trip", 'winner': ['Lula Cafe'], 'runners_up': ['Frontera Grill', 'Chicago Diner ', 'Sabatino’s', 'Twin Anchors']}]
[{'category': 'Best fancy restaurant in Chicago', 'winner': ['Alinea '], 'runners_up': ['Blackbird', 'Girl & the Goat', 'Green Zebra', 'The Publican']}]
[{'category': 'Best bang for your buck', 'winner': ['Big Star', 'Sultan’s Market'], 'runners_up': ['Frasca Pizzeria & Wine Bar', 'Chutney Joe’s', '"My boyfriend!"']}]
[{'category': 'Best chef', 'winner': ['Rick Bayless (Frontera Grill, Topolobampo, Xoco)'], 'runners_up': ['Grant Achatz (Alinea, Next, The Aviary)', 'Stephanie Izard (Girl & the Goat)']}]

I am expecting a DataFrame whose columns are category, winner, and runners_up, with the corresponding entries in each column. Any suggestions?
I'm a beginner, and I'm trying to scrape a web page with Beautiful Soup. Here is the code:
import requests
from bs4 import BeautifulSoup

# Assumption: base_url is not defined in the original post; the site root is used here.
base_url = 'https://www.chicagoreader.com'

def make_soup(url):
    page = requests.get(url)
    return BeautifulSoup(page.content, 'lxml')

# function to get all the category links corresponding to a url
def get_category(section_url):
    soup = make_soup(section_url)
    boccat = soup.find('dl', 'boccat')
    category_links = [base_url + dd.a['href'] for dd in boccat.find_all('dd')]
    return category_links

# function to return the winner and runners-up pertaining to each category
def category_winner(category_url):
    soup = make_soup(category_url)
    category = soup.find('h1', 'headline').string
    winner = [h2.string for h2 in soup.find_all('h2', 'boc1')]
    runners_up = [h2.string for h2 in soup.find_all('h2', 'boc2')]
    return {'category': category,
            'winner': winner,
            'runners_up': runners_up}

# url for which the winners are to be found
food_n_drink = ('https://www.chicagoreader.com/chicago/best-of-chicago-2011-'
                'food-drink/BestOf?oid=4106228')
categories = get_category(food_n_drink)
data = []
for cat in categories:
    winner = category_winner(cat)
    data.append(winner)
    print(data)
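The last line of code produces the output, which is several lists; I have shared the first four in the question. My goal is to build a DataFrame from this output so that I can work with it.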
Answer 0 (score: 0)
If k is a comma-separated list of lists:
[{'category': "Best restaurant that's been around forever and is still worth the trip", 'winner': ['Lula Cafe'], 'runners_up': ['Frontera Grill', 'Chicago Diner ', 'Sabatino’s', 'Twin Anchors']}] , [{'category': 'Best fancy restaurant in Chicago', 'winner':['Alinea '], 'runners_up': ['Blackbird', 'Girl & the Goat', 'Green Zebra', 'The Publican']}] , [{'category': 'Best bang for your buck', 'winner': ['Big Star', 'Sultan’s Market'], 'runners_up': ['Frasca Pizzeria & Wine Bar', 'Chutney Joe’s', '"My boyfriend!"']}] , [{'category': 'Best chef', 'winner': ['Rick Bayless (Frontera Grill, Topolobampo, Xoco)'], 'runners_up': ['Grant Achatz (Alinea, Next, The Aviary)', 'Stephanie Izard (Girl & the Goat)']}]
then

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect the rows first
# and build the frame once at the end.
rows = []
for i in k:          # each inner list
    for j in i:      # each dict inside that list
        rows.append(dict(j))
df = pd.DataFrame(rows)

will do the job.
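As a rough, self-contained check of the same idea, the sketch below flattens the nested lists with `itertools.chain.from_iterable` instead of explicit loops. The value of `k` here is a shortened copy of the question's data (two entries, fewer runners-up), and `flat_df` is a name chosen only for this illustration.

```python
from itertools import chain

import pandas as pd

# Two of the single-item lists from the question, shortened for illustration.
k = [
    [{'category': 'Best fancy restaurant in Chicago',
      'winner': ['Alinea '],
      'runners_up': ['Blackbird', 'Girl & the Goat']}],
    [{'category': 'Best chef',
      'winner': ['Rick Bayless (Frontera Grill, Topolobampo, Xoco)'],
      'runners_up': ['Grant Achatz (Alinea, Next, The Aviary)']}],
]

# chain.from_iterable removes one level of nesting, leaving a flat
# sequence of dicts; pandas turns each dict into one row.
flat_df = pd.DataFrame(list(chain.from_iterable(k)))

print(flat_df.columns.tolist())
print(len(flat_df))
```

Each dict becomes one row, and the winner and runners_up cells still hold Python lists at this point.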
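Answer 1 (score: -1)
You can create a pandas DataFrame from a list of dictionaries or from a list of lists. Your output is a set of separate dictionaries, each wrapped in its own list. If you define them as plain dictionaries or lists, or as one list of dictionaries or lists, you can create a df from them.
Reformat the input: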
d1 = {'category': "Best restaurant that's been around forever and is still worth the trip",
      'winner': ['Lula Cafe'],
      'runners_up': ['Frontera Grill', 'Chicago Diner ', 'Sabatino’s', 'Twin Anchors']}
d2 = {'category': 'Best fancy restaurant in Chicago',
      'winner': ['Alinea '],
      'runners_up': ['Blackbird', 'Girl & the Goat', 'Green Zebra', 'The Publican']}
d3 = {'category': 'Best bang for your buck',
      'winner': ['Big Star', 'Sultan’s Market'],
      'runners_up': ['Frasca Pizzeria & Wine Bar', 'Chutney Joe’s', '"My boyfriend!"']}
d4 = {'category': 'Best chef',
      'winner': ['Rick Bayless (Frontera Grill, Topolobampo, Xoco)'],
      'runners_up': ['Grant Achatz (Alinea, Next, The Aviary)', 'Stephanie Izard (Girl & the Goat)']}
Create the df:
import pandas as pd
df = pd.DataFrame([d1, d2, d3, d4])
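A frame built this way keeps each winner and runners_up cell as a Python list. If plain text cells are preferred, one possible clean-up is sketched below. It is not part of the original answer: the helper `join_names` is hypothetical, and the sample rows are shortened copies of the question's data.

```python
import pandas as pd

# Shortened sample rows in the same shape as d1..d4 above.
rows = [
    {'category': 'Best fancy restaurant in Chicago',
     'winner': ['Alinea '],
     'runners_up': ['Blackbird', 'Girl & the Goat']},
    {'category': 'Best bang for your buck',
     'winner': ['Big Star', 'Sultan’s Market'],
     'runners_up': ['Frasca Pizzeria & Wine Bar', 'Chutney Joe’s']},
]
df = pd.DataFrame(rows)


def join_names(names):
    """Strip stray whitespace from each name and join them with ', '."""
    return ', '.join(name.strip() for name in names)


# Hypothetical clean-up step: turn list-valued cells into plain strings.
for col in ['winner', 'runners_up']:
    df[col] = df[col].apply(join_names)

print(df)
```

Joining into a single string keeps one row per category. If one row per restaurant is wanted instead, `DataFrame.explode` on the list column is another option.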