    target_value  title        people  start     end  twitter_map
0   AGE_13_TO_17  13 to 17     1       13        17   AGE_13_TO_17
1   AGE_13_TO_24  13 to 24     NaN     13        24   NaN
2   AGE_13_TO_34  13 to 34     NaN     13        34   NaN
3   AGE_13_TO_49  13 to 49     NaN     13        49   NaN
4   AGE_13_TO_54  13 to 54     NaN     13        54   NaN
5   AGE_OVER_13   Age Over 13  NaN     13        -    NaN
6   AGE_18_TO_24  18 to 24     7       18        24   AGE_18_TO_24
7   AGE_18_TO_54  18 to 54     NaN     18        54   NaN
8   AGE_OVER_18   Age Over 18  NaN     18        -    NaN
9   AGE_21_TO_34  21 to 34     NaN     21        34   NaN
10  AGE_21_TO_49  21 to 49     NaN     21        49   NaN
11  AGE_21_TO_54  21 to 54     NaN     21        54   NaN
12  AGE_25_TO_34  25 to 34     34      25        34   AGE_25_TO_34
13  AGE_25_TO_49  25 to 49     NaN     25        49   NaN
14  AGE_OVER_25   Age Over 25  NaN     25        -    NaN
15  AGE_35_TO_44  35 to 44     15      35        44   AGE_35_TO_44
16  AGE_OVER_35   Age Over 35  NaN     35        -    NaN
17  AGE_45_TO_54  45 to 54     1       45        54   AGE_45_TO_54
18  AGE_OVER_50   Age Over 50  NaN     50        -    NaN
19  AGE_55_TO_64  55 to 64     3       55        64   AGE_55_TO_64
20  AGE_OVER_65   65+          6       65        -    AGE_OVER_65
21  None          All Ages     NaN     All Ages  -    NaN
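So I have the dataframe shown above, which gives a people count for some age ranges, each defined by a start age and an end age. Some of the age ranges overlap, though. I need to fill in the missing entries of the people column correctly, based on the values already known in that column.
Expected output for the first two rows: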
    target_value  title     people  start  end  twitter_map
0   AGE_13_TO_17  13 to 17  1       13     17   AGE_13_TO_17
1   AGE_13_TO_24  13 to 24  8       13     24   NaN
Answer 0: (score: 2)
I will work with a simplified example:
people start end
1 13 17
NaN 13 24
NaN 13 34
NaN 13 -
7 18 24
NaN 18 -
34 25 34
First, replace '-' with infinity and convert everything to float:
import numpy as np
df = df.replace({'-': np.inf}).astype(float)
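For anyone who wants to reproduce this step, here is a minimal sketch that builds the simplified frame by hand and then applies the same replace/astype call. The way the frame is constructed (integers mixed with the string '-' in the end column) is my assumption about how the data was loaded, not something shown in the question.

```python
import numpy as np
import pandas as pd

# Simplified example; '-' marks an open-ended age range.
# Assumption: 'end' arrives as a mixed column of ints and the string '-'.
df = pd.DataFrame({
    "people": [1, np.nan, np.nan, np.nan, 7, np.nan, 34],
    "start": [13, 13, 13, 13, 18, 18, 25],
    "end": [17, 24, 34, "-", 24, "-", 34],
})

# Same call as above: open-ended ranges become +inf, then all columns are cast to float.
df = df.replace({"-": np.inf}).astype(float)
```

After this, every column is numeric, so the age bounds can be compared with >= and <=.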
Then select the rows where the number of 'people' is given; these will be the input:
df_input = df.dropna()
Now define the following function:
def func(row):
    return df_input.loc[
        (df_input['start'] >= row['start']) & (df_input['end'] <= row['end']),
        'people'
    ].sum()
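For each row of the dataframe, it sums the 'people' values of all input rows whose age range lies entirely inside that row's age range (this is where infinity comes in handy: an open-ended range contains every range that starts at or after its start).
Finally, apply the function: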
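To make the containment condition concrete, here is a small sketch of the same test for a single target range, written out by hand. The names target_start, target_end, mask and total are only for illustration, and the three known rows are copied from the simplified example.

```python
import numpy as np
import pandas as pd

# Known rows only (the "input"), taken from the simplified example.
df_input = pd.DataFrame({
    "people": [1.0, 7.0, 34.0],
    "start": [13.0, 18.0, 25.0],
    "end": [17.0, 24.0, 34.0],
})

# Illustrative target range: start=18, end='-' (already replaced by infinity).
target_start, target_end = 18.0, np.inf

# A known range counts when it lies entirely inside the target range.
mask = (df_input["start"] >= target_start) & (df_input["end"] <= target_end)
total = df_input.loc[mask, "people"].sum()

print(mask.tolist())  # [False, True, True]
print(total)          # 41.0
```

The 13 to 17 row is excluded because it starts before 18, while 18 to 24 and 25 to 34 both pass the end <= inf test, giving 7 + 34 = 41.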
In [36]: df.apply(func, axis=1)
Out[36]:
0 1.0
1 8.0
2 42.0
3 42.0
4 7.0
5 41.0
6 34.0
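Putting the steps together, the following is a sketch of the whole thing on the simplified example, with the result written to a new column. The column name people_filled is my own choice, and the frame construction is again an assumption about how the data looks after loading.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "people": [1, np.nan, np.nan, np.nan, 7, np.nan, 34],
    "start": [13, 13, 13, 13, 18, 18, 25],
    "end": [17, 24, 34, "-", 24, "-", 34],
})
df = df.replace({"-": np.inf}).astype(float)

# Rows with a known 'people' value serve as the input.
df_input = df.dropna()

def func(row):
    # Sum the known ranges that lie entirely inside this row's range.
    inside = (df_input["start"] >= row["start"]) & (df_input["end"] <= row["end"])
    return df_input.loc[inside, "people"].sum()

# Hypothetical column name; keeps the original NaNs visible next to the result.
df["people_filled"] = df.apply(func, axis=1)
print(df)
```

Note that this only counts known ranges that fit completely inside the target range. On the full table, a range such as AGE_OVER_50 would pick up 55 to 64 and 65+ but nothing from 45 to 54, since that range starts below 50, so such rows would probably be undercounted unless finer-grained data is available.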