How do I output a pandas DataFrame from a custom function?

Time: 2020-04-15 02:43:21

Tags: python pandas function

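I am scraping four different pages from Craigslist that all share exactly the same structure. To speed this up, I wrote a function that does the scraping and returns a pandas DataFrame. I know the function works, because I included print statements in it and they produce the expected output. However, when I try to use the DataFrame that the function should output later in my code, I get this error message: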

NameError: name 'out_df' is not defined

Here is the code for my function:

import pandas as pd
from bs4 import BeautifulSoup

#create function to grab the posts
def grab_posts(response, end_value):
    #parse the page and collect every listing row
    html_soup = BeautifulSoup(response.text, 'html.parser')
    posts = html_soup.find_all('li', class_ = 'result-row')
    print(type(posts))
    print(len(posts))
    #get item description, item price, and listing dates for each item
    items = []
    prices = []
    dates = []
    #slice instead of indexing so a page with fewer than end_value posts does not raise IndexError
    for post in posts[:end_value]:
        items.append(post.find(class_ = 'result-title hdrlnk').text)
        prices.append(post.find(class_ = 'result-price').text)
        dates.append(post.find(class_ = 'result-date').text)
    print(len(items))
    #create Series from items
    items_col = pd.Series(items)
    #create Series from prices
    prices_col = pd.Series(prices)
    #create Series from dates
    dates_col = pd.Series(dates)
    print(type(dates_col))
    #concatenate
    out_df = pd.concat([items_col, prices_col, dates_col], axis = 1)
    print(out_df.head())
    out_df.rename(columns = {
                0: 'item_description',
                1: 'price',
                2: 'date_listed'
            }, inplace = True)
    print(out_df.head())
    print(type(out_df))
    return out_df

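The function appears to work, since the print statements produce what I expect (all the way through to printing the DataFrame). See the image below.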

(Image: output of the function)

However, a simple command like

type(out_df)

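produces the error I mentioned above. Can anyone explain why this happens, and how I can get the DataFrame built inside the function into my Jupyter notebook's memory?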

1 Answer:

Answer 0 (score: 0)

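If you want to use it outside the function:
out_df is a local variable, so it only exists while grab_posts is running. You have to assign the function's return value to a variable, like this: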

out_df = grab_posts(response, end_value)
type(out_df)
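The scoping behaviour behind this can be reproduced without any scraping. The following is a minimal sketch, assuming a hypothetical stand-in function `build_frame` and made-up row values (neither is from the question): the name `out_df` inside the function is not visible at notebook level, while the returned object is available once it is bound to a name.

```python
import pandas as pd

def build_frame():
    # Hypothetical stand-in for grab_posts: out_df is a local name here
    out_df = pd.DataFrame({"item_description": ["desk"], "price": ["$40"]})
    return out_df

build_frame()  # the return value is discarded, so nothing is kept

try:
    type(out_df)  # the function's local name does not exist out here
    name_visible = True
except NameError:
    name_visible = False

# Binding the return value is what makes the DataFrame available afterwards
result = build_frame()
print(name_visible)
print(type(result))
```

The variable on the left-hand side does not have to be called `out_df`; any name works, which is convenient when the same function is called once per page.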

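If you want to do it inside the function:
Try declaring it as global, so that the assignment inside the function creates a notebook-level variable: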

import pandas as pd
from bs4 import BeautifulSoup

#create function to grab the posts
def grab_posts(response, end_value):
    #parse the page and collect every listing row
    html_soup = BeautifulSoup(response.text, 'html.parser')
    posts = html_soup.find_all('li', class_ = 'result-row')
    print(type(posts))
    print(len(posts))
    #get item description, item price, and listing dates for each item
    items = []
    prices = []
    dates = []
    #slice instead of indexing so a page with fewer than end_value posts does not raise IndexError
    for post in posts[:end_value]:
        items.append(post.find(class_ = 'result-title hdrlnk').text)
        prices.append(post.find(class_ = 'result-price').text)
        dates.append(post.find(class_ = 'result-date').text)
    print(len(items))
    #create Series from items
    items_col = pd.Series(items)
    #create Series from prices
    prices_col = pd.Series(prices)
    #create Series from dates
    dates_col = pd.Series(dates)
    print(type(dates_col))
    #concatenate
    global out_df
    out_df = pd.concat([items_col, prices_col, dates_col], axis = 1)
    print(out_df.head())
    out_df.rename(columns = {
                0: 'item_description',
                1: 'price',
                2: 'date_listed'
            }, inplace = True)
    print(out_df.head())
    print(type(out_df))
    return out_df
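Since the question mentions four pages, it may help to see how the two approaches behave when the function is called more than once. This is only a sketch under stated assumptions: `grab_posts_global` and `grab_posts_returning` are hypothetical simplified stand-ins that take already-extracted rows instead of a response, and the row values are invented. With `global`, each call rebinds the same notebook-level name; with returned values, each page keeps its own DataFrame and the results can be combined with `pd.concat`.

```python
import pandas as pd

COLUMNS = ["item_description", "price", "date_listed"]

def grab_posts_global(rows):
    # Hypothetical variant: publishes the result through a global name
    global out_df
    out_df = pd.DataFrame(rows, columns=COLUMNS)
    return out_df

def grab_posts_returning(rows):
    # Hypothetical variant: only returns the result
    return pd.DataFrame(rows, columns=COLUMNS)

# Invented sample rows standing in for two scraped pages
page_a = [("desk", "$40", "Apr 14")]
page_b = [("lamp", "$10", "Apr 13"), ("chair", "$25", "Apr 12")]

grab_posts_global(page_a)
grab_posts_global(page_b)  # rebinds out_df, so page_a's frame is no longer reachable
rows_in_global = len(out_df)

# One DataFrame per page, then a single combined frame
frames = [grab_posts_returning(page) for page in (page_a, page_b)]
combined = pd.concat(frames, ignore_index=True)
rows_in_combined = len(combined)

print(rows_in_global, rows_in_combined)
```

For a single call the two approaches give the same DataFrame; the difference only shows up once the function is reused, which is why assigning the return value tends to be the easier pattern to reason about.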