I'm trying to group some data in a data frame and run some calculations on the result by looping over it.
Take the following data frame, "age_wght":
  Year Last_Name First_Name Age Weight
1 2000     Smith       John  20    145
2 2000     Smith       Matt   9     85
3 2005     Smith       John  25    160
4 2000     Jones        Bob  12    100
5 2000     Jones       Mary  18    120
6 2005     Jones       Mary  23    130
7 2000     Jones     Carrie   9     90
8 2005     Jones        Bob  17    210
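For anyone who wants to reproduce this, the table above could be rebuilt roughly as follows. This is only a sketch: the values are copied from the printed table, and storing the name columns as factors (`stringsAsFactors = TRUE`) is an assumption, not something stated in the question.

```r
# Rebuild the example data frame from the printed table.
# Assumption: the name columns are factors (older R versions did this by default).
age_wght <- data.frame(
  Year       = c(2000, 2000, 2005, 2000, 2000, 2005, 2000, 2005),
  Last_Name  = c("Smith", "Smith", "Smith", "Jones", "Jones", "Jones", "Jones", "Jones"),
  First_Name = c("John", "Matt", "John", "Bob", "Mary", "Mary", "Carrie", "Bob"),
  Age        = c(20, 9, 25, 12, 18, 23, 9, 17),
  Weight     = c(145, 85, 160, 100, 120, 130, 90, 210),
  stringsAsFactors = TRUE
)
```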
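I'm trying to get the average age and weight for each person.
I can do this with tapply. Currently I compute it by first creating a new key column in the data frame:
age_wght$key1 <- paste(age_wght$Last_Name, age_wght$First_Name, sep = ".")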
  Year Last_Name First_Name Age Weight       key1
1 2000     Smith       John  20    145 Smith.John
2 2000     Smith       Matt   9     85 Smith.Matt
3 2005     Smith       John  25    160 Smith.John
4 2000     Jones        Bob  12    100  Jones.Bob
5 2000     Jones       Mary  18    120 Jones.Mary
6 2005     Jones       Mary  23    130 Jones.Mary
Then I use tapply as follows:
avg_age <- with(age_wght, tapply(Age, key1, FUN = mean))
avg_wght <- with(age_wght, tapply(Weight, key1, FUN = mean))
age_wght_summary <- data.frame(avg_age, avg_wght)
age_wght_summary
But what I get looks like this:
             avg_age avg_wght
Jones.Bob       14.5    155.0
Jones.Carrie     9.0     90.0
Jones.Mary      20.5    125.0
Smith.John      22.5    152.5
Smith.Matt       9.0     85.0
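That makes sense, since I ran tapply over the key1 index and the keys end up as row names, but what I actually want is a table with these headers: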
Last_Name First_Name avg_age avg_wght
I also tried the dplyr library with group_by, but couldn't get it to work.
Answer 0 (score: 0)
A dplyr solution:
library(dplyr)
age_wght %>%
  group_by(Last_Name, First_Name) %>%
  summarise(avg_age = mean(Age),
            avg_wght = mean(Weight))
#   Last_Name First_Name avg_age avg_wght
#      (fctr)     (fctr)   (dbl)    (dbl)
# 1     Jones        Bob    14.5    155.0
# 2     Jones     Carrie     9.0     90.0
# 3     Jones       Mary    20.5    125.0
# 4     Smith       John    22.5    152.5
# 5     Smith       Matt     9.0     85.0
A data.table solution:
library(data.table)
setDT(age_wght)[, .(avg_age = mean(Age), avg_wght = mean(Weight)), by=.(Last_Name, First_Name)]
#    Last_Name First_Name avg_age avg_wght
# 1:     Smith       John    22.5    152.5
# 2:     Smith       Matt     9.0     85.0
# 3:     Jones        Bob    14.5    155.0
# 4:     Jones       Mary    20.5    125.0
# 5:     Jones     Carrie     9.0     90.0
Answer 1 (score: 0)
A base R solution:
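# Split each "Last.First" row name back into its two parts
# (this breaks if a name itself contains a ".").
nms <- strsplit(rownames(age_wght_summary), split = "\\.")
# sapply() returns plain vectors; lapply() would hand data.frame() lists.
data.frame(Last_Name  = sapply(nms, "[", 1),
           First_Name = sapply(nms, "[", 2),
           avg_age    = age_wght_summary$avg_age,
           avg_wght   = age_wght_summary$avg_wght)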
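As an alternative that skips the key column and the row-name splitting, base R's aggregate() can group on both name columns directly. This is a minimal sketch, not part of the original answer: the data frame is rebuilt here from the table in the question so the snippet runs on its own, and the result name `res` is only illustrative.

```r
# Example data copied from the table in the question.
age_wght <- data.frame(
  Year       = c(2000, 2000, 2005, 2000, 2000, 2005, 2000, 2005),
  Last_Name  = c("Smith", "Smith", "Smith", "Jones", "Jones", "Jones", "Jones", "Jones"),
  First_Name = c("John", "Matt", "John", "Bob", "Mary", "Mary", "Carrie", "Bob"),
  Age        = c(20, 9, 25, 12, 18, 23, 9, 17),
  Weight     = c(145, 85, 160, 100, 120, 130, 90, 210)
)

# Group by both name columns; cbind() lets mean() be applied to
# Age and Weight in a single call.
res <- aggregate(cbind(Age, Weight) ~ Last_Name + First_Name,
                 data = age_wght, FUN = mean)

# aggregate() keeps the original column names, so rename the two
# summary columns to match the headers asked for in the question.
names(res)[3:4] <- c("avg_age", "avg_wght")
res
```

The grouping columns stay as ordinary columns in the result, so no name parsing is needed afterwards.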