Question

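I have a data frame with 4 attributes.

What I want to do is take each person's name and age and count the number of friends they have. If two people have the same age but different names, the average number of friends for that age should be used. Finally, the age range is divided into age groups and the average is taken per group. This is what I have tried.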

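from collections import defaultdict
import datetime

start_time = datetime.datetime.now()  # assumed: the original snippet uses start_time without defining it

# select the columns of interest by position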
friends = df.iloc[:,3]
ages = df.iloc[:,2]

# default of dictionary with age as key and value as a list of friends 
dictionary_age_friends = defaultdict(list)

# populating the dictionary with key age and values friend
for i,j in zip(ages,friends):
    dictionary_age_friends[i].append(j)
print("first dict")
print(dictionary_age_friends)

#second dictionary, the same age is collected and the number of friends is added 
set_dict ={}
for x in dictionary_age_friends:
    list_friends =[]
    for y in dictionary_age_friends[x]:
        list_friends.append(y)
    set_list_len = len(list_friends) # assign a friend with a number 1
    set_dict[x] = set_list_len
print(set_dict)

# set_dict ={}
# for x in dictionary_age_friends:
#     print("inside the loop")
#     lis_1 =[]
#     for y in dictionary_age_friends[x]:
#         lis_1.append(y)
#         set_list = lis_1
#         set_list = [1 for x in set_list] # assign a friend with a number 1
#         set_dict[x] = sum(set_list)

# a dictionary that assign the age range into age-groups
second_dict = defaultdict(list)
for i,j in set_dict.items(): 
    if i in range(16,20):           
        i = 'teens_youthAdult'
        second_dict[i].append(j)
    elif i in range(20,40):       
        i ="Adult"
        second_dict[i].append(j)
    elif i in  range(40,60):        
        i ="MiddleAge"
        second_dict[i].append(j)
    elif i in range(60,72):       
        i = "old"
        second_dict[i].append(j)
print(second_dict)
print("final dict started")
new_dic ={}

for key,value in second_dict.items():
    if key == 'teens_youthAdult':
        new_dic[key] = round((sum(value)/len(value)),2)
    elif key =='Adult':
        new_dic[key] = round((sum(value)/len(value)),2)
    elif key =='MiddleAge' :
        new_dic[key] = round((sum(value)/len(value)),2)
    else:
        new_dic[key] = round((sum(value)/len(value)),2)
new_dic
end_time = datetime.datetime.now()


print(end_time-start_time)


print(new_dic)

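Some of the feedback I received: 1. If you only want to count friends, there is no need to build a list. 2. Take two individuals of the same age, 18: one has 4 friends and the other has 3. The current code concludes that the average is 7 friends. 3. The code is incorrect.

Any suggestions or help? Thanks in advance.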

Answer 1

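I don't know the attribute names, and you didn't mention which age groups the data should be split into. In my answer I will treat the data as if the attributes are: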

index, name, age, friend

To find the count per name, I suggest using groupby.

Input:

groups = df.groupby([df.iloc[:,0],df.iloc[:,1]]) # grouping by name(0), age(1)
amount_of_friends_df = groups.size() # gathering amount of friends for a person
print(amount_of_friends_df)

Output:

name  age
EUNK  25     1
FBFM  26     1
MYYD  30     1
OBBF  28     2
RJCW  25     1
RQTI  21     1
VLIP  16     1
ZCWQ  18     1
ZMQE  27     1
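
As a self-contained illustration of the same grouping, here is a minimal sketch on a made-up frame. The column names (name, age, friend) and all values are assumptions for illustration only; the question does not give the real ones.

```python
import pandas as pd

# Hypothetical sample data: one row per (person, friend) pair.
df = pd.DataFrame({
    "name": ["OBBF", "OBBF", "EUNK", "RJCW"],
    "age": [28, 28, 25, 25],
    "friend": ["a", "b", "c", "d"],
})

# size() counts the rows in each (name, age) group,
# which equals the number of friends only if each row is one friend.
friends_per_person = df.groupby(["name", "age"]).size()
print(friends_per_person)
```

Note that size() counts rows. If the friend column already holds a numeric friend count per person, an aggregation such as sum() or mean() on that column would be the appropriate choice instead.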

To find the number of friends per age, you can use groupby as well.

Input:

groups = df.groupby([df.iloc[:,1]]) # groups by age(1)
age_friends = groups.size() 
age_friends=age_friends.reset_index()
age_friends.columns=(['age','amount_of_friends'])
print(age_friends)

Output:

    age  amount_of_friends
0   16                  1
1   18                  1
2   21                  1
3   25                  2
4   26                  1
5   27                  1
6   28                  2
7   30                  1
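
The three steps above (size, reset_index, renaming the columns) can probably be shortened by passing a name to reset_index. A minimal sketch, again on made-up data with assumed column names:

```python
import pandas as pd

# Hypothetical sample data; names and ages are invented for illustration.
df = pd.DataFrame({
    "name": ["VLIP", "EUNK", "RJCW", "OBBF", "OBBF"],
    "age": [16, 25, 25, 28, 28],
})

# Count rows per age and name the resulting count column in one step.
age_friends = df.groupby("age").size().reset_index(name="amount_of_friends")
print(age_friends)
```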

To compute the average number of friends for each age group, you can use categories and groupby.

Input:

mean_by_age_group_df = age_friends.groupby(pd.cut(age_friends.age,[20,40,60,72]))\
.agg({'amount_of_friends':'mean'})
print(mean_by_age_group_df)

pd.cut returns a categorical Series that we use to group the data. We then use the agg function to aggregate the groups in the data frame.

Output:

          amount_of_friends
age                        
(20, 40]           1.333333
(40, 60]                NaN
(60, 72]                NaN
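
Two details of the result above are worth noting: the bins [20, 40, 60, 72] are right-closed by default, and ages 16 and 18 fall outside them, so they are dropped. The sketch below is one possible variant, not the answer's method. It assumes a numeric friends column holding each person's friend count (a hypothetical layout), uses left-closed bins that mirror the range(16, 20), range(20, 40), ... groups from the question, and averages per age before averaging per group, so that two 18-year-olds with 4 and 3 friends give 3.5 rather than 7.

```python
import pandas as pd

# Hypothetical data: one row per person, "friends" is that person's friend count.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "age": [18, 18, 25, 45, 61],
    "friends": [4, 3, 10, 6, 2],
})

# Step 1: average friend count per exact age (age 18 -> (4 + 3) / 2 = 3.5).
mean_by_age = df.groupby("age", as_index=False)["friends"].mean()

# Step 2: assign each age to a left-closed bin: [16, 20), [20, 40), [40, 60), [60, 72).
bins = [16, 20, 40, 60, 72]
labels = ["teens_youthAdult", "Adult", "MiddleAge", "old"]
mean_by_age["age_group"] = pd.cut(
    mean_by_age["age"], bins=bins, labels=labels, right=False
)

# Step 3: average the per-age means within each age group.
mean_by_group = (
    mean_by_age.groupby("age_group", observed=False)["friends"].mean().round(2)
)
print(mean_by_group)
```

Averaging the per-age means weights every age equally within a group; averaging directly over people would weight every person equally instead. Which of the two is wanted depends on the task statement.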

Python: average number of friends by age group
