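I want to scrape four different pages from Craigslist that all have exactly the same structure. To speed this up, I wrote a function that does the scraping and returns a pandas DataFrame. I know the function works, because the print statements I put inside it produce the expected output. However, when I try to use the resulting dataset later in my code, I get this error message: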
NameError: name 'out_df' is not defined
Here is the code for my function:
import pandas as pd
from bs4 import BeautifulSoup

# create function to grab the posts
def grab_posts(response, end_value):
    html_soup = BeautifulSoup(response.text, 'html.parser')
    posts = html_soup.find_all('li', class_='result-row')
    # return posts_i
    print(type(posts))
    print(len(posts))
    # get item description, item price, and listing dates for each item
    items = []
    prices = []
    dates = []
    for i in range(end_value):
        items.append(posts[i].find(class_='result-title hdrlnk').text)
        prices.append(posts[i].find(class_='result-price').text)
        dates.append(posts[i].find(class_='result-date').text)
    print(len(items))
    # create Series from items
    items_col = pd.Series(items)
    # create Series from prices
    prices_col = pd.Series(prices)
    # create Series from dates
    dates_col = pd.Series(dates)
    print(type(dates_col))
    # concatenate
    out_df = pd.concat([items_col, prices_col, dates_col], axis=1)
    print(out_df.head())
    out_df.rename(columns={
        0: 'item_description',
        1: 'price',
        2: 'date_listed'
    }, inplace=True)
    print(out_df.head())
    print(type(out_df))
    return out_df
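The function seems to work, since the print statements produce what I expect (all the way up to printing the DataFrame). See the image below.
However, a simple command like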
type(out_df)
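produces the error I mentioned above. Can anyone explain why this happens, and how to get the DataFrame created inside the function into my Jupyter notebook's memory?
Answer 0 (score: 0)
If you want to use it outside the function:
out_df is a local variable of grab_posts, so it no longer exists once the function returns. You have to assign the function's return value to a variable, like this: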
out_df = grab_posts(response, end_value)
type(out_df)
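To make the scoping point concrete, here is a minimal sketch that does not touch Craigslist or pandas at all. The function make_table and the names local_rows, leaked, and result are made up for illustration; the only claim is that a name assigned inside a function is not visible in the notebook unless the return value is bound to a name there.

```python
def make_table():
    # 'local_rows' is a local name: it exists only while make_table() runs.
    local_rows = [{"item": "bike", "price": "$50"}]
    return local_rows


make_table()  # the return value is discarded here, so nothing is kept

try:
    local_rows  # no global called 'local_rows' was ever created
    leaked = True
except NameError:
    leaked = False

# Binding the return value to a name in the calling scope keeps it.
result = make_table()
print(leaked)       # False
print(len(result))  # 1
```

The same thing happens with grab_posts: calling it without an assignment prints everything but throws the DataFrame away.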
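If you want to set it from inside the function:
Try declaring it as global. Note that the global out_df only exists after grab_posts has been called at least once.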
# create function to grab the posts
def grab_posts(response, end_value):
    html_soup = BeautifulSoup(response.text, 'html.parser')
    posts = html_soup.find_all('li', class_='result-row')
    # return posts_i
    print(type(posts))
    print(len(posts))
    # get item description, item price, and listing dates for each item
    items = []
    prices = []
    dates = []
    for i in range(end_value):
        items.append(posts[i].find(class_='result-title hdrlnk').text)
        prices.append(posts[i].find(class_='result-price').text)
        dates.append(posts[i].find(class_='result-date').text)
    print(len(items))
    # create Series from items
    items_col = pd.Series(items)
    # create Series from prices
    prices_col = pd.Series(prices)
    # create Series from dates
    dates_col = pd.Series(dates)
    print(type(dates_col))
    # concatenate
    global out_df
    out_df = pd.concat([items_col, prices_col, dates_col], axis=1)
    print(out_df.head())
    out_df.rename(columns={
        0: 'item_description',
        1: 'price',
        2: 'date_listed'
    }, inplace=True)
    print(out_df.head())
    print(type(out_df))
    return out_df
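One caveat with the global version: every call rebinds the same name, so scraping four pages would leave only the last page in out_df. Since the question is about four pages, a rough sketch of the return-value approach may be more useful. Here grab_posts_stub is a hypothetical stand-in for grab_posts (it builds a one-row DataFrame instead of parsing a response), and the page labels and cell values are invented.

```python
import pandas as pd


def grab_posts_stub(page_name):
    # Hypothetical stand-in for grab_posts(): returns a small DataFrame
    # with the same three columns instead of parsing a real response.
    return pd.DataFrame({
        "item_description": [f"{page_name} item"],
        "price": ["$10"],
        "date_listed": ["Jan 1"],
    })


page_names = ["page_1", "page_2", "page_3", "page_4"]  # made-up labels

# Keep each result under its own key rather than in one shared global.
# The extra 'page' column records which page each row came from.
frames = {name: grab_posts_stub(name).assign(page=name) for name in page_names}

# Optionally stack the four tables into a single DataFrame.
combined = pd.concat(list(frames.values()), ignore_index=True)
print(combined.shape)  # (4, 4)
```

With the real function you would call grab_posts(response, end_value) once per response and store each result the same way.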