Python: dividing ages into age groups by average number of friends

Time: 2019-01-16 16:17:59

Tags: python-3.x data-science data-analysis

I have a dataframe with 4 attributes, which looks like this:

[image: screenshot of the dataframe]

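What I want to do is take each person's name and age and count how many friends they have. If two people have the same age but different names, the average number of friends for that age should be used. Finally, the ages should be divided into age groups and the average taken for each group. This is what I tried: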

import datetime
from collections import defaultdict

# df is the dataframe shown above
start_time = datetime.datetime.now()

# select the columns of interest by position
friends = df.iloc[:,3]
ages = df.iloc[:,2]

# defaultdict mapping each age to a list of friends
dictionary_age_friends = defaultdict(list)

# populate the dictionary: key = age, value = list of friends
for i,j in zip(ages,friends):
    dictionary_age_friends[i].append(j)
print("first dict")
print(dictionary_age_friends)

# second dictionary: for each age, the number of friend rows recorded for that age
set_dict ={}
for x in dictionary_age_friends:
    list_friends =[]
    for y in dictionary_age_friends[x]:
        list_friends.append(y)
    set_list_len = len(list_friends) # friends of everyone with age x, added together
    set_dict[x] = set_list_len
print(set_dict)

# set_dict ={}
# for x in dictionary_age_friends:
#     print("inside the loop")
#     lis_1 =[]
#     for y in dictionary_age_friends[x]:
#         lis_1.append(y)
#         set_list = lis_1
#         set_list = [1 for x in set_list] # assign a friend with a number 1
#         set_dict[x] = sum(set_list)

# a dictionary that maps the ages into age groups (ages outside 16-71 are skipped)
second_dict = defaultdict(list)
for i,j in set_dict.items(): 
    if i in range(16,20):           
        i = 'teens_youthAdult'
        second_dict[i].append(j)
    elif i in range(20,40):       
        i ="Adult"
        second_dict[i].append(j)
    elif i in  range(40,60):        
        i ="MiddleAge"
        second_dict[i].append(j)
    elif i in range(60,72):       
        i = "old"
        second_dict[i].append(j)
print(second_dict)
print("final dict started")
new_dic ={}

# average of the values collected in each age group, rounded to 2 decimals
for key,value in second_dict.items():
    new_dic[key] = round(sum(value)/len(value), 2)

end_time = datetime.datetime.now()
print(end_time-start_time)

print(new_dic)
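The if/elif chain that maps ages to groups can also be written as a table lookup with the standard-library bisect module. This is a minimal sketch: the helper name age_group and the EDGES/LABELS constants are hypothetical, and the edges are the ones used in the range() checks above, so each group includes its lower edge and excludes its upper edge.

```python
from bisect import bisect_right

# Hypothetical constants: lower edges of the groups, with 72 closing the last one.
EDGES = [16, 20, 40, 60, 72]
LABELS = ["teens_youthAdult", "Adult", "MiddleAge", "old"]


def age_group(age):
    """Hypothetical helper: return the group for age, or None outside [16, 72)."""
    if age < EDGES[0] or age >= EDGES[-1]:
        return None
    # bisect_right returns how many edges are <= age; subtract 1 to index LABELS.
    return LABELS[bisect_right(EDGES, age) - 1]


for age in (15, 18, 20, 45, 71):
    print(age, age_group(age))
```

Adding or moving a group then only means editing the two lists, not the branches.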

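Some of the feedback I got:
1. If you only want to count friends, there is no need to build a list.
2. Take two individuals with the same age, 18: one has 4 friends and the other has 3. The current code concludes that the average is 7 friends.
3. The code is not correct.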
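Feedback point 2 can be checked with a small plain-Python sketch. The rows below are hypothetical (two 18-year-olds with 4 and 3 friends), and it assumes each row records one friendship. Counting rows per age, as the code above does, gives 7; counting friends per person first and then averaging per age gives 3.5.

```python
from collections import Counter, defaultdict

# Hypothetical friendship rows: one (name, age, friend) tuple per friendship.
rows = [
    ("ann", 18, "f1"), ("ann", 18, "f2"), ("ann", 18, "f3"), ("ann", 18, "f4"),
    ("bob", 18, "f5"), ("bob", 18, "f6"), ("bob", 18, "f7"),
]

# Step 1: count friends per person, keyed by (name, age).
friends_per_person = Counter((name, age) for name, age, _ in rows)

# Step 2: collect each person's friend count under their age.
counts_by_age = defaultdict(list)
for (_, age), n_friends in friends_per_person.items():
    counts_by_age[age].append(n_friends)

# Step 3: average per age instead of summing.
avg_by_age = {age: sum(c) / len(c) for age, c in counts_by_age.items()}
print(avg_by_age)  # {18: 3.5}
```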

Any suggestions or help? Thanks in advance for any suggestions or help.

1 answer:

Answer 0 (score: 0)

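I don't understand the attribute names, and you didn't mention which age groups the data should be divided into. In my answer, I'll treat the data as having these attributes: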

index, name, age, friend

To count the friends of each person (by name), I suggest using groupby.

Input:

groups = df.groupby([df.iloc[:,0],df.iloc[:,1]]) # grouping by name(0), age(1)
amount_of_friends_df = groups.size() # gathering amount of friends for a person
print(amount_of_friends_df)

Output:

name  age
EUNK  25     1
FBFM  26     1
MYYD  30     1
OBBF  28     2
RJCW  25     1
RQTI  21     1
VLIP  16     1
ZCWQ  18     1
ZMQE  27     1
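Grouping by column name rather than by position can make this step easier to read. A minimal, self-contained sketch on hypothetical data, assuming the columns are literally named name, age and friend and that each row is one friendship:

```python
import pandas as pd

# Hypothetical sample in the layout assumed by the answer:
# one row per friendship, with columns name, age, friend.
df = pd.DataFrame({
    "name":   ["OBBF", "OBBF", "EUNK", "RJCW"],
    "age":    [28, 28, 25, 25],
    "friend": ["a", "b", "c", "d"],
})

# size() counts the rows in each (name, age) group,
# i.e. the number of friends each person has.
friends_per_person = df.groupby(["name", "age"]).size()
print(friends_per_person)
```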

To find the number of friends per age, you can also use groupby.

Input:

groups = df.groupby([df.iloc[:,1]]) # groups by age(1)
age_friends = groups.size() 
age_friends=age_friends.reset_index()
age_friends.columns=(['age','amount_of_friends'])
print(age_friends)

Output:

    age  amount_of_friends
0   16                  1
1   18                  1
2   21                  1
3   25                  2
4   26                  1
5   27                  1
6   28                  2
7   30                  1
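Grouping by age alone and calling size() counts friend rows per age, so the friends of people who share an age are added together: in the output above, age 25 shows 2 because EUNK and RJCW each have 1 friend. This is the situation described in feedback point 2. One way to get an average per person instead is to count per (name, age) first and then take the mean per age. A sketch on hypothetical data with columns named name, age and friend:

```python
import pandas as pd

# Hypothetical sample: one row per friendship.
df = pd.DataFrame({
    "name":   ["A", "A", "A", "A", "B", "B", "B", "C"],
    "age":    [18, 18, 18, 18, 18, 18, 18, 25],
    "friend": list("abcdefgh"),
})

# Friends per person first ...
per_person = (
    df.groupby(["name", "age"]).size().reset_index(name="amount_of_friends")
)

# ... then the mean over the people who share an age.
avg_by_age = per_person.groupby("age")["amount_of_friends"].mean()
print(avg_by_age)
```

Here age 18 averages A's 4 friends and B's 3 friends to 3.5 instead of adding them to 7.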

To calculate the average number of friends per age group, you can combine pd.cut (which produces categories) with groupby.

Input:

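import pandas as pd

# observed=False keeps empty bins such as (40, 60] in the result
mean_by_age_group_df = age_friends.groupby(pd.cut(age_friends.age,[20,40,60,72]), observed=False)\
.agg({'amount_of_friends':'mean'})
print(mean_by_age_group_df)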

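pd.cut returns a categorical Series that we use to group the data. Then agg aggregates each group of the dataframe, here by taking the mean of amount_of_friends. The bins are closed on the right by default, so (20, 40] excludes 20, and ages 16 and 18 fall outside every bin.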

Output:

          amount_of_friends
age                        
(20, 40]           1.333333
(40, 60]                NaN
(60, 72]                NaN
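The question's code bins ages with range(16, 20), range(20, 40), and so on, which include the lower edge and exclude the upper one. A sketch that reproduces those groups with pd.cut(right=False) and the question's labels, so that 16- and 18-year-olds are not dropped. The per-age averages below are hypothetical placeholders for the result of the per-person step:

```python
import pandas as pd

# Hypothetical per-age averages (e.g. the output of a per-person step).
avg_by_age = pd.DataFrame({
    "age": [16, 18, 25, 45, 65],
    "amount_of_friends": [1.0, 3.5, 1.0, 2.0, 4.0],
})

# right=False makes each bin [lower, upper), mirroring range(16, 20) etc.;
# the labels are the group names used in the question.
groups = pd.cut(
    avg_by_age["age"],
    bins=[16, 20, 40, 60, 72],
    right=False,
    labels=["teens_youthAdult", "Adult", "MiddleAge", "old"],
)

# observed=False keeps empty groups in the result (their mean is NaN).
mean_by_group = (
    avg_by_age.groupby(groups, observed=False)["amount_of_friends"].mean()
)
print(mean_by_group.round(2))
```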