I have a dataset df:
df=data.frame(rbind(c("A",1,1,"abc"),
c("B",0,0,"def"),
c("C",0,1,"hep"),
c("A",1,1,"hit"),
c("B",0,1,"occ"),
c("C",1,1,"tem"),
c("A",1,1,"twi"),
c("B",1,1,"twa"),
c("C",1,1,"mit"),
c("A",1,1,"mot"),
c("C",1,1,"mot"),
c("B",1,1,"mjak")))
names(df)=c("id","v1","v2","check")
I want to create a subset of df that contains only the ids whose values in the "check" column are all contained in the ch.vars vector.
ch.vars=c("abc","hit","mot","twi","mjak")
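If an id has any check value other than those given in ch.vars, it is excluded from the dataset. For example, ids B and C have other values in the check column, so they are excluded from the subset.
Here is what I have tried so far: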
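To see which ids the rule removes, here is a small base-R sketch (not from the original post) that lists, for each id, the check values falling outside ch.vars. The names outside and keep.ids are purely illustrative.

```r
# Example data from the question; stringsAsFactors = FALSE keeps the columns
# as character vectors regardless of the R version's default.
df <- data.frame(rbind(c("A",1,1,"abc"), c("B",0,0,"def"), c("C",0,1,"hep"),
                       c("A",1,1,"hit"), c("B",0,1,"occ"), c("C",1,1,"tem"),
                       c("A",1,1,"twi"), c("B",1,1,"twa"), c("C",1,1,"mit"),
                       c("A",1,1,"mot"), c("C",1,1,"mot"), c("B",1,1,"mjak")),
                 stringsAsFactors = FALSE)
names(df) <- c("id", "v1", "v2", "check")
ch.vars <- c("abc", "hit", "mot", "twi", "mjak")

# Split check by id, then keep only the values that are NOT in ch.vars.
outside <- lapply(split(df$check, df$id), setdiff, ch.vars)
print(outside)

# An id survives only if it has no such values. Here that is "A";
# B has def/occ/twa and C has hep/tem/mit outside ch.vars.
keep.ids <- names(outside)[lengths(outside) == 0]
print(keep.ids)
```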
library(dplyr)  # arrange() and filter() come from dplyr
df$check.var=ifelse(df$check %in% ch.vars,1,0)
df=arrange(df,id)
st1=filter(df,check.var==0)
st1=as.character(unique(st1$id))
df2=df[!df$id %in% st1,]
> df2
id v1 v2 check check.var
1 A 1 1 abc 1
2 A 1 1 hit 1
3 A 1 1 twi 1
4 A 1 1 mot 1
This works, but I wonder whether there is a more efficient way to do this, i.e. to get the same result in fewer steps. Thanks!
Answer 0 (score: 3)
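One way to cut the steps down in base R (a sketch, not taken from the original thread) is to compute once per id whether all of its check values are in ch.vars, using tapply, and then subset in a single step. The name ok is illustrative. Unlike the attempt above, this does not add a check.var column or reorder the rows.

```r
df <- data.frame(rbind(c("A",1,1,"abc"), c("B",0,0,"def"), c("C",0,1,"hep"),
                       c("A",1,1,"hit"), c("B",0,1,"occ"), c("C",1,1,"tem"),
                       c("A",1,1,"twi"), c("B",1,1,"twa"), c("C",1,1,"mit"),
                       c("A",1,1,"mot"), c("C",1,1,"mot"), c("B",1,1,"mjak")),
                 stringsAsFactors = FALSE)
names(df) <- c("id", "v1", "v2", "check")
ch.vars <- c("abc", "hit", "mot", "twi", "mjak")

# tapply applies all() to the logical vector (check %in% ch.vars) within each
# id, giving one TRUE/FALSE per id, named by id: A = TRUE, B = FALSE, C = FALSE.
ok <- tapply(df$check %in% ch.vars, df$id, all)

# Keep only the rows whose id was flagged TRUE.
df2 <- df[df$id %in% names(ok)[ok], ]
print(df2)
```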
You can do this with group_by and filter from the dplyr package.
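A minimal sketch of that group_by + filter approach, assuming dplyr is installed (this is a reconstruction, not necessarily the answerer's exact code):

```r
library(dplyr)

df <- data.frame(rbind(c("A",1,1,"abc"), c("B",0,0,"def"), c("C",0,1,"hep"),
                       c("A",1,1,"hit"), c("B",0,1,"occ"), c("C",1,1,"tem"),
                       c("A",1,1,"twi"), c("B",1,1,"twa"), c("C",1,1,"mit"),
                       c("A",1,1,"mot"), c("C",1,1,"mot"), c("B",1,1,"mjak")),
                 stringsAsFactors = FALSE)
names(df) <- c("id", "v1", "v2", "check")
ch.vars <- c("abc", "hit", "mot", "twi", "mjak")

# Within each id group, all() returns a single TRUE/FALSE, which filter()
# recycles to every row of the group: the whole id is kept or dropped.
df2 <- df %>%
  group_by(id) %>%
  filter(all(check %in% ch.vars)) %>%
  ungroup()
print(df2)
```

This replaces the helper column, the arrange() call and the anti-subset from the question with a single pipeline.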
Answer 1 (score: 3)
A data.table solution:
library(data.table)
data.table(df)[,.SD[all(check%in%ch.vars)],by="id"]
# id v1 v2 check
#1: A 1 1 abc
#2: A 1 1 hit
#3: A 1 1 twi
#4: A 1 1 mot
You can also call setkey on id to speed this up.
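A sketch of the keyed version, assuming the data.table package is installed. setkey() sorts the table by id and marks id as its key, which data.table can take advantage of when grouping by that column; how much this saves depends on the data, and it has not been benchmarked here. The second expression uses the common .I idiom (collect matching row numbers per group, then subset once), which is often suggested as faster than .SD because it avoids building a .SD subset for every group. The names dt, res and res2 are illustrative.

```r
library(data.table)

df <- data.frame(rbind(c("A",1,1,"abc"), c("B",0,0,"def"), c("C",0,1,"hep"),
                       c("A",1,1,"hit"), c("B",0,1,"occ"), c("C",1,1,"tem"),
                       c("A",1,1,"twi"), c("B",1,1,"twa"), c("C",1,1,"mit"),
                       c("A",1,1,"mot"), c("C",1,1,"mot"), c("B",1,1,"mjak")),
                 stringsAsFactors = FALSE)
names(df) <- c("id", "v1", "v2", "check")
ch.vars <- c("abc", "hit", "mot", "twi", "mjak")

dt <- as.data.table(df)
setkey(dt, id)  # sort by id (stable) and set id as the key

# Same expression as in the answer, run on the keyed table.
res <- dt[, .SD[all(check %in% ch.vars)], by = id]

# .I[TRUE] returns every row number of the group and .I[FALSE] returns none,
# so V1 holds exactly the rows of ids whose checks are all in ch.vars.
res2 <- dt[dt[, .I[all(check %in% ch.vars)], by = id]$V1]
print(res)
print(res2)
```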