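My data frame contains a text column name and a factor city. It is sorted alphabetically, first by city and then by name. Now I need a data frame that contains only the nth element within each city, keeping this ordering. How can I do this in a clean way, without loops?
I have: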
name city
John Atlanta
Josh Atlanta
Matt Atlanta
Bob Boston
Kate Boston
Lily Boston
Matt Boston
I want a function that returns the nth element for each city, i.e. if n is 3, then:
name city
Matt Atlanta
Lily Boston
It should return NULL for name if n is out of range for the selected city, i.e. for the 4th:
name city
NULL Atlanta
Matt Boston
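Base R only, please?
Answer 0: (score: 5)
In base R, using by:
Set up some test data, including an extra city with too few rows to test the out-of-range case: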
test <- read.table(text="name city
John Atlanta
Josh Atlanta
Matt Atlanta
Bob Boston
Kate Boston
Lily Boston
Matt Boston
Bob Seattle
Kate Seattle",header=TRUE)
Get the 3rd item in each city:
do.call(rbind,by(test,test$city,function(x) x[3,]))
Result:
name city
Atlanta Matt Atlanta
Boston Lily Boston
Seattle <NA> <NA>
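To get exactly what you want, here is a small function:
nthrow <- function(dset,splitvar,n) {
# take row n of each group; groups with fewer than n rows yield an all-NA row
result <- do.call(rbind,by(dset,dset[splitvar],function(x) x[n,]))
# restore the group label on all-NA rows from the row names (the group names)
result[,splitvar][is.na(result[,splitvar])] <- row.names(result)[is.na(result[,splitvar])]
row.names(result) <- NULL
return(result)
}
Call it like this: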
nthrow(test,"city",3)
Result:
name city
1 Matt Atlanta
2 Lily Boston
3 <NA> Seattle
Answer 1: (score: 2)
You can use plyr:
dat <- structure(list(name = c("John", "Josh", "Matt", "Bob", "Kate",
"Lily", "Matt"), city = c("Atlanta", "Atlanta", "Atlanta", "Boston", "Boston", "Boston", "Boston")), .Names = c("name", "city"), class = "data.frame", row.names = c(NA, -7L))
library(plyr)
ddply(dat, .(city), function(x, n) x[n,], n=3)
> ddply(dat, .(city), function(x, n) x[n,], n=3)
name city
1 Matt Atlanta
2 Lily Boston
> ddply(dat, .(city), function(x, n) x[n,], n=4)
name city
1 <NA> <NA>
2 Matt Boston
>
There are plenty of other options using base R, data.table, or sqldf...
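As one illustration of such a base R alternative, here is a minimal sketch built on split() and vapply(). It is not taken from any of the answers: the helper name nth_per_group and its arguments are hypothetical, and it assumes n is a single positive integer and that only the value column and the grouping column are needed. It uses NA, rather than NULL, for groups that have fewer than n rows.

```r
# Hypothetical helper (not from the original answers): returns the nth value
# of `value` within each level of `group`, with NA for groups that are too short.
nth_per_group <- function(df, group, value, n) {
  # split() gives one character vector per group, ordered by group level
  pieces <- split(as.character(df[[value]]), df[[group]])
  # indexing past the end of a vector yields NA, so short groups give NA
  picked <- vapply(pieces, function(v) v[n], character(1))
  out <- data.frame(unname(picked), names(picked), stringsAsFactors = FALSE)
  names(out) <- c(value, group)
  out
}

# Same rows as the test data used in the first answer
test <- data.frame(
  name = c("John", "Josh", "Matt", "Bob", "Kate", "Lily", "Matt", "Bob", "Kate"),
  city = c(rep("Atlanta", 3), rep("Boston", 4), rep("Seattle", 2)),
  stringsAsFactors = FALSE
)

res3 <- nth_per_group(test, "city", 3)
res4 <- nth_per_group(test, "city", 4)
```

Because split() orders the pieces by group level, the result stays sorted by city as long as the levels are in alphabetical order, which is the default.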
Answer 2: (score: 2)
A data.table solution:
library(data.table)
DT <- data.table(test)
# return all columns from the subset data.table
n <- 4
DT[,.SD[n,] ,by = city]
## city name
## 1: Atlanta NA
## 2: Boston Matt
## 3: Seattle NA
# if you just want the nth element of `name`
# (excluding other columns that might be there)
# any of the following would work
DT[,.SD[n,] ,by = city, .SDcols = 'name']
DT[, .SD[n, list(name)], by = city]
DT[, list(name = name[n]), by = city]