Question

In the following code:
import pandas as pd
import sqlite3
import math
import numpy
con = sqlite3.connect(r'C:\Python34\factbook.db')
facts = pd.read_sql_query('select * from facts;', con)
facts.dropna(inplace=True)
facts = facts[facts['area_land']!=0][:]
facts = facts[facts['population']!=0][:]
facts.reset_index(drop=True, inplace=True)
def pop_50(name):
    pop = facts[facts['name'] == name]['population']
    perc = facts[facts['name'] == name]['population_growth']
    new_pop = pop*(math.e**(35*perc))
    return new_pop


x=pd.Series(data=facts['name'])
z = x.apply(pop_50)

x is a Series:

0                                        Afghanistan
1                                            Albania
2                                            Algeria
3                                            Andorra
4                                             Angola
5                                Antigua and Barbuda
6                                          Argentina
7                                            Armenia

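and so on...

But z is not. Here is a link showing what it is (a DataFrame): https://www.scribd.com/document/357697929/Doc1

I don't understand why. The pop_50 function returns a single result (I tested it), so why is z a DataFrame? How could pop_50 return a Series? It takes one row (facts['name'] == name) and pulls a single value out of it (under the population column), and that is what gets called pop. The same idea applies to perc. new_pop is a mathematical combination of 2 single values, so it is also a single value, and the function returns only that, doesn't it?

Thanks.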

Answer 1

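pop_50 returns a pd.Series, not a scalar. Selecting with a boolean mask, as in facts[facts['name'] == name]['population'], always yields a Series, even when only one row matches. x.apply(pop_50) calls pop_50 once for each row of x, passing that row's value to pop_50 as the argument name. So for the first row of x you get back a Series. The same happens for the second row. You end up with a series of Series... which is a DataFrame. Also, the columns of the result are the index labels of the returned Series, which here coincide with the index of x.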
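To make this concrete, here is a minimal sketch that reproduces the behaviour on a toy table. The three-row facts frame and its values are invented for illustration and stand in for the real factbook data; the result type is what I would expect from the pandas versions this question targets.

```python
import math
import pandas as pd

# Hypothetical stand-in for the factbook table; the values are invented.
facts = pd.DataFrame({
    "name": ["A", "B", "C"],
    "population": [100, 200, 300],
    "population_growth": [0.01, 0.02, 0.0],
})

def pop_50(name):
    # A boolean mask keeps the row label, so each lookup is a one-element Series.
    pop = facts[facts["name"] == name]["population"]
    perc = facts[facts["name"] == name]["population_growth"]
    return pop * (math.e ** (35 * perc))

one_result = pop_50("A")
z = facts["name"].apply(pop_50)

print(type(one_result).__name__)  # Series
print(type(z).__name__)           # DataFrame
print(z)
```

Printing z should show a mostly empty table: each row holds one computed value in the column named after its own row label, and NaN everywhere else.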

Try this instead, which looks up scalar values by label:

facts2 = facts.set_index('name')

def pop_50(name):
    pop = facts2.at[name, 'population']
    perc = facts2.at[name, 'population_growth']
    new_pop = pop*(math.e**(35*perc))
    return new_pop
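As a quick check of the set_index approach, the sketch below runs it on a toy table. The frame and its values are invented for illustration, and it assumes every name is unique; with duplicated names, .at would no longer return a single scalar.

```python
import math
import pandas as pd

# Hypothetical stand-in for the factbook table; the values are invented.
facts = pd.DataFrame({
    "name": ["A", "B", "C"],
    "population": [100, 200, 300],
    "population_growth": [0.01, 0.02, 0.0],
})

# Index by name so that .at can fetch one cell by (row label, column label).
facts2 = facts.set_index("name")

def pop_50(name):
    pop = facts2.at[name, "population"]          # scalar, not a Series
    perc = facts2.at[name, "population_growth"]  # scalar, not a Series
    return pop * (math.e ** (35 * perc))

z = facts["name"].apply(pop_50)
print(type(z).__name__)  # Series
```

Because pop_50 now returns a plain number for each name, apply collects the results into a Series with the same index as the input.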

You can also use pd.Series.squeeze, which turns a one-element Series into a scalar:

def pop_50(name):
    pop = facts[facts['name'] == name]['population'].squeeze()
    perc = facts[facts['name'] == name]['population_growth'].squeeze()
    new_pop = pop*(math.e**(35*perc))
    return new_pop
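For reference, a small sketch of what squeeze does on its own. The Series below are made up for illustration.

```python
import pandas as pd

# A one-element Series, like the result of a boolean-mask lookup matching one row.
one = pd.Series([42.0], index=[7])
# A Series with more than one element, e.g. if several rows matched.
many = pd.Series([1.0, 2.0])

scalar = one.squeeze()      # collapses to the single value
unchanged = many.squeeze()  # stays a Series, since there is nothing to collapse

print(scalar)
print(type(unchanged).__name__)  # Series
```

One caveat that follows from this: if a name matches several rows, squeeze hands back a Series, so apply could produce a DataFrame again. The approach relies on each name matching exactly one row.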

If for any reason you cannot change pop_50, wrap it in a lambda:

z = x.apply(lambda name: pop_50(name).squeeze())

apply() returns a DataFrame instead of a Series
