Question

I have a dataset like this:

S.No.,Year of birth,year of death
1,  1,  5
2,  3,  6
3,  2,  -
4,  5,  7

I need to calculate the population alive in each year:

year,population
1   1
2   2
3   3
4   3
5   4
6   3
7   2
8   1

How can I solve this with pandas? I am not very experienced with pandas. Any help would be appreciated.

Answer 1

First, choose a maximum year to stand in for year of death where it is missing; this solution uses 8.
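The snippets in this answer assume a DataFrame named df that holds the question's data. A minimal sketch of that setup follows, assuming the missing death year is stored as the string '-'. Deriving the cutoff as one year past the last recorded death is an assumption made here for illustration (it happens to give 8); any other cutoff, such as the current calendar year, can be used instead.

```python
import pandas as pd

# Sample data from the question; '-' marks a person with no recorded death.
df = pd.DataFrame({
    'S.No.': [1, 2, 3, 4],
    'Year of birth': [1, 3, 2, 5],
    'year of death': ['5', '6', '-', '7'],
})

# Non-numeric entries such as '-' become NaN.
deaths = pd.to_numeric(df['year of death'], errors='coerce')

# Assumption for illustration: the cutoff is one year past the last recorded death.
today_year = int(deaths.max()) + 1
print(today_year)
```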

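Then convert the values of year of death to numeric and replace the missing values with that year. The first solution takes the difference between the death and birth columns, repeats each row that many times with Index.repeat, and offsets the copies with GroupBy.cumcount; Series.value_counts then does the counting: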

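import pandas as pd

# To work with real calendar years, use the current year instead:
# today_year = pd.to_datetime('now').year

today_year = 8
# '-' is coerced to NaN, then filled with today_year
df['year of death'] = pd.to_numeric(df['year of death'], errors='coerce').fillna(today_year)

# Repeat each row once per year the person was alive (birth and death years inclusive)
df = df.loc[df.index.repeat(df['year of death'].add(1).sub(df['Year of birth']).astype(int))]
# Offset each copy by 0, 1, 2, ... so the column holds every year lived
df['Year of birth'] += df.groupby(level=0).cumcount()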

df1 = (df['Year of birth'].value_counts()
                          .sort_index()
                          .rename_axis('year')
                          .reset_index(name='population'))
print (df1)
   year  population
0     1           1
1     2           2
2     3           3
3     4           3
4     5           4
5     6           3
6     7           2
7     8           1
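To see what Index.repeat and GroupBy.cumcount contribute, here is a small sketch that applies the same two steps to a single person (row 2 of the question's data). The names one, n_years, expanded and years are illustrative and not part of the original answer.

```python
import pandas as pd

# One person born in year 3 who died in year 6.
one = pd.DataFrame({'Year of birth': [3], 'year of death': [6]})

# Years lived, counting both the birth and the death year: 6 + 1 - 3 = 4.
n_years = one['year of death'].add(1).sub(one['Year of birth'])

# Repeat the row once per year lived; every copy keeps the original index label 0.
expanded = one.loc[one.index.repeat(n_years)]

# cumcount numbers the copies 0, 1, 2, 3 within each original row,
# so adding it to the birth year yields every year the person was alive.
years = expanded['Year of birth'] + expanded.groupby(level=0).cumcount()
print(years.tolist())  # [3, 4, 5, 6]
```

Calling value_counts on the expanded column for all people then gives the number of people alive in each year.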

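Another solution, starting again from the original df, uses a list comprehension with range to generate every year each person was alive: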

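# To work with real calendar years, use the current year instead:
# today_year = pd.to_datetime('now').year

today_year = 8
# Cast to int: to_numeric with errors='coerce' returns floats, which range() rejects
ends = pd.to_numeric(df['year of death'], errors='coerce').fillna(today_year).astype(int)

# One entry per person per year alive (birth and death years inclusive)
L = [x for s, e in zip(df['Year of birth'], ends) for x in range(s, e + 1)]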

df1 = (pd.Series(L).value_counts()
                   .sort_index()
                   .rename_axis('year')
                   .reset_index(name='population'))
print (df1)
   year  population
0     1           1
1     2           2
2     3           3
3     4           3
4     5           4
5     6           3
6     7           2
7     8           1

Similar to the previous one, except that Counter builds the dictionary used for the final DataFrame:

from collections import Counter

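# To work with real calendar years, use the current year instead:
# today_year = pd.to_datetime('now').year

today_year = 8
# Cast to int: to_numeric with errors='coerce' returns floats, which range() rejects
ends = pd.to_numeric(df['year of death'], errors='coerce').fillna(today_year).astype(int)

# Count how many people were alive in each year
d = Counter([x for s, e in zip(df['Year of birth'], ends) for x in range(s, e + 1)])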
print (d)
Counter({5: 4, 3: 3, 4: 3, 6: 3, 2: 2, 7: 2, 1: 1, 8: 1})

df1 = pd.DataFrame({'year':list(d.keys()),
                    'population':list(d.values())})
print (df1)
   year  population
0     1           1
1     2           2
2     3           3
3     4           3
4     5           4
5     6           3
6     7           2
7     8           1
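The Counter result above comes out in year order only because of the order in which the years were first inserted for this particular data. As a hedged sketch, the same idea can be wrapped in a helper that sorts by year explicitly; population_by_year is a hypothetical name introduced here, not something from pandas or the original answer.

```python
from collections import Counter

import pandas as pd


def population_by_year(births, deaths, today_year):
    """Hypothetical helper: count people alive in each year (birth and death years inclusive)."""
    # Missing or non-numeric death years are replaced by today_year.
    ends = pd.to_numeric(deaths, errors='coerce').fillna(today_year).astype(int)
    counts = Counter(
        year
        for start, end in zip(births, ends)
        for year in range(start, end + 1)
    )
    # Sort by year explicitly instead of relying on insertion order.
    return pd.DataFrame(sorted(counts.items()), columns=['year', 'population'])


result = population_by_year(
    pd.Series([1, 3, 2, 5]),
    pd.Series(['5', '6', '-', '7']),
    today_year=8,
)
print(result)
```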
