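Select the nth element in a data frame by factor

Time: 2012-10-11 23:58:55

Tags: r

My data frame has a text column name and a factor city. It is sorted alphabetically, first by city and then by name. Now I need a data frame that contains only the nth element of each city, keeping this ordering. How can this be done neatly, without a loop?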

I have:

name    city
John    Atlanta
Josh    Atlanta
Matt    Atlanta
Bob     Boston
Kate    Boston
Lily    Boston
Matt    Boston

I want a function that returns the nth element for each city, i.e. if n is 3, then:

name    city
Matt    Atlanta
Lily    Boston

It should return NULL for name if n is out of range for the selected city, e.g. for the 4th:

name    city
NULL    Atlanta
Matt    Boston

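Base R only, please?

3 answers:

Answer 0: (score: 5)

In base R, using by:

Set up some test data, including an extra city with too few rows (an out-of-range case):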

test <- read.table(text="name    city
John    Atlanta
Josh    Atlanta
Matt    Atlanta
Bob     Boston
Kate    Boston
Lily    Boston
Matt    Boston
Bob     Seattle
Kate    Seattle",header=TRUE)

Get the 3rd item for each city:

do.call(rbind,by(test,test$city,function(x) x[3,]))

Result:

        name    city
Atlanta Matt Atlanta
Boston  Lily  Boston
Seattle <NA>    <NA>
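
The Seattle row comes out as all NA because indexing a data frame with a row number past its last row returns a row filled with NA rather than raising an error. A minimal sketch of that behaviour, using a made-up two-row data frame (`seattle` and `third` are illustrative names only, not part of the answer):

```r
# Hypothetical two-row group standing in for the Seattle subset
seattle <- data.frame(name = c("Bob", "Kate"),
                      city = c("Seattle", "Seattle"),
                      stringsAsFactors = FALSE)

# Asking for row 3 of a 2-row data frame does not fail;
# it yields a single row in which every column is NA
third <- seattle[3, ]
nrow(third)
is.na(third$name)
is.na(third$city)
```

This is why the city column is lost along with the name, and why the group label then has to be restored from the row names.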

To get exactly what you want, here is a small function:

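nthrow <- function(dset,splitvar,n) {
    # take row n of each group; groups with fewer than n rows give an all-NA row
    result <- do.call(rbind,by(dset,dset[splitvar],function(x) x[n,]))
    # rbind names each row after its group, so use the row names to refill missing group values
    result[,splitvar][is.na(result[,splitvar])] <- row.names(result)[is.na(result[,splitvar])]
    row.names(result) <- NULL
    return(result)
}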

Call it as:

nthrow(test,"city",3)

Result:

  name    city
1 Matt Atlanta
2 Lily  Boston
3 <NA> Seattle

Answer 1: (score: 2)

You can use plyr:

dat <- structure(list(name = c("John", "Josh", "Matt", "Bob", "Kate", 
"Lily", "Matt"), city = c("Atlanta", "Atlanta", "Atlanta", "Boston", 
"Boston", "Boston", "Boston")), .Names = c("name", "city"), class = "data.frame", row.names = c(NA, -7L))

library(plyr)

ddply(dat, .(city), function(x, n) x[n,], n=3)

> ddply(dat, .(city), function(x, n) x[n,], n=3)
  name    city
1 Matt Atlanta
2 Lily  Boston
> ddply(dat, .(city), function(x, n) x[n,], n=4)
  name   city
1 <NA>   <NA>
2 Matt Boston
> 

There are many other options as well, using base R, data.table, or sqldf...

Answer 2: (score: 2)

A data.table solution:
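
One possible base R option is sketched below, assuming only the name column is needed. It uses tapply, and unlike the ddply call above it keeps the city label when n is out of range. The helper name `nth_name` is hypothetical and not part of any package.

```r
# Same seven rows as the dat object used above
dat <- data.frame(
  name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt"),
  city = c("Atlanta", "Atlanta", "Atlanta", "Boston", "Boston", "Boston", "Boston"),
  stringsAsFactors = FALSE
)

# nth_name is a hypothetical helper written for this sketch
nth_name <- function(d, n) {
  # tapply applies the function to the names within each city;
  # v[n] is NA when a city has fewer than n names
  picked <- tapply(as.character(d$name), d$city, function(v) v[n])
  # the group labels survive as the names of the tapply result
  data.frame(name = as.vector(picked), city = names(picked),
             stringsAsFactors = FALSE)
}

res <- nth_name(dat, 4)
res
```

With n = 4 this should give NA for Atlanta and Matt for Boston, with both city names preserved.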

library(data.table)
DT <- data.table(test)

# return all columns from the subset data.table
n <- 4
DT[,.SD[n,] ,by = city]
##      city name
## 1: Atlanta   NA
## 2:  Boston Matt
## 3: Seattle   NA

# if you just want the nth element of `name` 
# (excluding other columns that might be there)
# any of the following would work

DT[,.SD[n,] ,by = city, .SDcols = 'name']


DT[, .SD[n, list(name)], by = city]


DT[, list(name = name[n]), by = city]