Loop through multiple columns in a data frame in R

Time: 2016-03-31 22:03:54

Tags: r

I am trying to group some data in a data frame and run some calculations on the result by looping over it.

Take the following data frame, "age_wght":

  Year Last_Name First_Name Age Weight
1 2000     Smith       John  20    145
2 2000     Smith       Matt   9     85
3 2005     Smith       John  25    160
4 2000     Jones        Bob  12    100
5 2000     Jones       Mary  18    120
6 2005     Jones       Mary  23    130
7 2000     Jones     Carrie   9     90
8 2005     Jones        Bob  17    210
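For anyone who wants to reproduce this, the data frame above can be rebuilt roughly as follows. This is only a sketch: the column types are an assumption, since the printed table does not show them (names are stored as plain character vectors here for simplicity).

```r
# Rebuild the example data frame printed above.
# Assumption: names are character columns; the original may have used factors.
age_wght <- data.frame(
  Year       = c(2000, 2000, 2005, 2000, 2000, 2005, 2000, 2005),
  Last_Name  = c("Smith", "Smith", "Smith", "Jones", "Jones", "Jones", "Jones", "Jones"),
  First_Name = c("John", "Matt", "John", "Bob", "Mary", "Mary", "Carrie", "Bob"),
  Age        = c(20, 9, 25, 12, 18, 23, 9, 17),
  Weight     = c(145, 85, 160, 100, 120, 130, 90, 210),
  stringsAsFactors = FALSE
)

age_wght
```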

I am trying to get the average age and weight for each person.

Currently I do this with tapply, by first creating a new key column in the data frame:

age_wght$key1 = paste(age_wght$Last_Name, age_wght$First_Name, sep = ".")

  Year Last_Name First_Name Age Weight       key1
1 2000     Smith       John  20    145 Smith.John
2 2000     Smith       Matt   9     85 Smith.Matt
3 2005     Smith       John  25    160 Smith.John
4 2000     Jones        Bob  12    100  Jones.Bob
5 2000     Jones       Mary  18    120 Jones.Mary
6 2005     Jones       Mary  23    130 Jones.Mary

Then I use tapply as follows:

avg_age <- with(age_wght, tapply(Age, key1, FUN = mean))

avg_wght <- with(age_wght, tapply(Weight, key1, FUN = mean))

age_wght_summary <- data.frame(avg_age, avg_wght)

age_wght_summary

But what I get looks like this:

             avg_age avg_wght
Jones.Bob       14.5    155.0
Jones.Carrie     9.0     90.0
Jones.Mary      20.5    125.0
Smith.John      22.5    152.5
Smith.Matt       9.0     85.0

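This makes sense, because I ran tapply over the key1 index, but the result I want is a table with the headers: Last_Name First_Name avg_age avg_wght

I also tried the dplyr library with group_by, but could not get it to work.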

2 answers:

Answer 0: (score: 0)

dplyr solution

library(dplyr)

age_wght %>%
    group_by(Last_Name, First_Name) %>%
    summarise(avg_age = mean(Age),
              avg_wght = mean(Weight))

#   Last_Name First_Name avg_age avg_wght
#     (fctr)     (fctr)   (dbl)    (dbl)
# 1     Jones        Bob    14.5    155.0
# 2     Jones     Carrie     9.0     90.0
# 3     Jones       Mary    20.5    125.0
# 4     Smith       John    22.5    152.5
# 5     Smith       Matt     9.0     85.0

data.table solution

library(data.table)
setDT(age_wght)[, .(avg_age = mean(Age), avg_wght = mean(Weight)), by=.(Last_Name, First_Name)]

#    Last_Name First_Name avg_age avg_wght
# 1:     Smith       John    22.5    152.5
# 2:     Smith       Matt     9.0     85.0
# 3:     Jones        Bob    14.5    155.0
# 4:     Jones       Mary    20.5    125.0
# 5:     Jones     Carrie     9.0     90.0

Answer 1: (score: 0)

base R solution, starting from the age_wght_summary built in the question:

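# Split row names such as "Jones.Bob" back into last and first name
nms <- strsplit(rownames(age_wght_summary), split = "\\.")
# sapply (not lapply) returns a character vector, so each name part becomes one column
data.frame(last_name  = sapply(nms, "[", 1),
           first_name = sapply(nms, "[", 2),
           avg_age    = age_wght_summary$avg_age,
           avg_wght   = age_wght_summary$avg_wght)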
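An alternative base R sketch, not part of the original answer, skips the key column and the row-name splitting altogether by grouping on both name columns with aggregate(). The data frame is rebuilt here so the snippet runs on its own; its column types are an assumption.

```r
# Rebuild the example data (assumption: names stored as character columns).
age_wght <- data.frame(
  Year       = c(2000, 2000, 2005, 2000, 2000, 2005, 2000, 2005),
  Last_Name  = c("Smith", "Smith", "Smith", "Jones", "Jones", "Jones", "Jones", "Jones"),
  First_Name = c("John", "Matt", "John", "Bob", "Mary", "Mary", "Carrie", "Bob"),
  Age        = c(20, 9, 25, 12, 18, 23, 9, 17),
  Weight     = c(145, 85, 160, 100, 120, 130, 90, 210),
  stringsAsFactors = FALSE
)

# Average Age and Weight for every Last_Name / First_Name combination.
# cbind() on the left-hand side summarises both columns in one call.
age_wght_summary <- aggregate(cbind(Age, Weight) ~ Last_Name + First_Name,
                              data = age_wght, FUN = mean)

# Rename the summarised columns to match the headers asked for in the question.
names(age_wght_summary)[3:4] <- c("avg_age", "avg_wght")

age_wght_summary
```

The result keeps Last_Name and First_Name as ordinary columns, which is the layout the question asks for. Adding another measure only means adding it inside cbind().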