I have two data frames, dfA and dfB.
dfA:
ID <- c('ID1','ID2','ID3','ID4')
lowval <- c(12,13,20,40)
upval <- c(14,15,22,42)
cr <- c("item1","item2","item3","item4")
dfA <- data.frame(ID,lowval,upval,cr)
> dfA
   ID lowval upval    cr
1 ID1     12    14 item1
2 ID2     13    15 item2
3 ID3     20    22 item3
4 ID4     40    42 item4
dfB:
match <- c('30','30','30','30')
pos <- c(3,13,18,41)
desc <- c("heavy","light","blue","black")
dfB <- data.frame(match,pos,desc)
> dfB
  match pos  desc
1    30   3 heavy
2    30  13 light
3    30  18  blue
4    30  41 black
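I want to go through each row and check whether dfB$pos lies between dfA$lowval and dfA$upval and, if it does, print the whole row from both dfA and dfB to an output file.
In this case, the desired output file would look like this: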
ID lowval upval cr match pos desc
ID1 12 14 item1 30 13 light
ID4 40 42 item4 30 41 black
I tried writing a function:
f <- function(x, y, output) {
lowervalue = x[2]
uppervalue = x[3]
position = y[2]
if(position>=lowervalue & position<=uppervalue){
print(paste(x,y,sep="\t"))
cat(paste(x,y, sep="\t"), file= output, append = T, fill = T)
}
}
apply(dfA, dfB, f, output = 'outputfile.txt')
But I get the following error:
Error in ds[-MARGIN] : invalid subscript type 'list'
In addition: Warning messages:
1: In Ops.factor(left) : ‘-’ not meaningful for factors
2: In Ops.factor(left) : ‘-’ not meaningful for factors
Can anyone suggest a way to create this output file? I'm stuck.
Answer 0 (score: 1)
You could try:
merge(dfA, dfB)[c(sapply(dfB$pos, function(x) apply(dfA[2:3], 1, function(y)
y[1] <= x & y[2] >= x))),]
    ID lowval upval    cr match pos  desc
5  ID1     12    14 item1    30  13 light
6  ID2     13    15 item2    30  13 light
16 ID4     40    42 item4    30  41 black
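The one-liner above can be hard to read, so here is a rough sketch of the same idea spelled out: build the full cross join, filter it with a plain vectorised condition, then write the result to a file. Passing `by = NULL` asks merge() for the Cartesian product. The path in `out_file` is a placeholder of mine, not something from the question.

```r
# Data from the question
dfA <- data.frame(ID     = c("ID1", "ID2", "ID3", "ID4"),
                  lowval = c(12, 13, 20, 40),
                  upval  = c(14, 15, 22, 42),
                  cr     = c("item1", "item2", "item3", "item4"))
dfB <- data.frame(match = c("30", "30", "30", "30"),
                  pos   = c(3, 13, 18, 41),
                  desc  = c("heavy", "light", "blue", "black"))

# Every row of dfA paired with every row of dfB (4 x 4 = 16 rows)
crossed <- merge(dfA, dfB, by = NULL)

# Keep only the pairs where pos falls inside [lowval, upval], bounds included
hits <- crossed[crossed$pos >= crossed$lowval & crossed$pos <= crossed$upval, ]

# Placeholder output path (assumption); swap in your own file name
out_file <- file.path(tempdir(), "outputfile.txt")
write.table(hits, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Keep in mind that the cross join has nrow(dfA) * nrow(dfB) rows, so this is only comfortable for fairly small tables.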
Answer 1 (score: 1)
outer()
f <- 'output.txt';
write(capture.output(print(with(as.data.frame(which(outer(dfB$pos,dfA$lowval,`>=`) & outer(dfB$pos,dfA$upval,`<=`),arr.ind=T)),cbind(dfA[col,],dfB[row,])),row.names=F)),f);
cat(readLines(f),sep='\n');
## ID lowval upval cr match pos desc
## ID1 12 14 item1 30 13 light
## ID2 13 15 item2 30 13 light
## ID4 40 42 item4 30 41 black
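In your question, the expected output does not include ID2, but with an inclusive comparison (i.e. >= rather than >), 13 lies between 13 and 15, so it qualifies as a match.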
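To make that difference concrete, here is a small sketch that builds the same outer() matrix step by step and lets you switch the lower bound between inclusive and exclusive. The helper `find_matches` and its `strict_lower` argument are names I made up for illustration. Treating the lower bound as exclusive is only my guess at how the expected output in the question was arrived at.

```r
# Data from the question
dfA <- data.frame(ID     = c("ID1", "ID2", "ID3", "ID4"),
                  lowval = c(12, 13, 20, 40),
                  upval  = c(14, 15, 22, 42),
                  cr     = c("item1", "item2", "item3", "item4"))
dfB <- data.frame(match = c("30", "30", "30", "30"),
                  pos   = c(3, 13, 18, 41),
                  desc  = c("heavy", "light", "blue", "black"))

# Hypothetical helper: returns the matching dfA/dfB row pairs side by side
find_matches <- function(a, b, strict_lower = FALSE) {
  lower_op <- if (strict_lower) `>` else `>=`
  # Logical matrix: rows index b, columns index a
  inside <- outer(b$pos, a$lowval, lower_op) & outer(b$pos, a$upval, `<=`)
  # Two-column matrix of (row, col) positions of the TRUE cells
  idx <- which(inside, arr.ind = TRUE)
  cbind(a[idx[, "col"], ], b[idx[, "row"], ])
}

inclusive <- find_matches(dfA, dfB)                      # ID1, ID2, ID4
strict    <- find_matches(dfA, dfB, strict_lower = TRUE) # ID1, ID4
```

With the exclusive lower bound, ID2 drops out and the result lines up with the two rows listed in the question.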
lapply()
f <- 'output.txt';
write(capture.output(print(do.call(rbind,lapply(seq_len(nrow(dfA)),function(ai) { res <- dfB$pos>=dfA$lowval[ai] & dfB$pos<=dfA$upval[ai]; if (any(res)) cbind(dfA[ai,],dfB[res,]); })),row.names=F)),f);
cat(readLines(f),sep='\n');
## ID lowval upval cr match pos desc
## ID1 12 14 item1 30 13 light
## ID2 13 15 item2 30 13 light
## ID4 40 42 item4 30 41 black
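If you would rather stay close to the row-by-row function in the question, an explicit loop does the same job. As for the error itself: the second argument of apply() is MARGIN (1 for rows, 2 for columns), so in apply(dfA, dfB, f, ...) the data frame dfB is taken as the margin, which is where the "invalid subscript type 'list'" message comes from. apply() also converts a mixed-type data frame to a character matrix, so numeric comparisons inside it would not behave as intended. The sketch below avoids apply() altogether; the output path is a placeholder I chose.

```r
# Data from the question
dfA <- data.frame(ID     = c("ID1", "ID2", "ID3", "ID4"),
                  lowval = c(12, 13, 20, 40),
                  upval  = c(14, 15, 22, 42),
                  cr     = c("item1", "item2", "item3", "item4"))
dfB <- data.frame(match = c("30", "30", "30", "30"),
                  pos   = c(3, 13, 18, 41),
                  desc  = c("heavy", "light", "blue", "black"))

pieces <- list()
for (i in seq_len(nrow(dfA))) {
  # Indices of the dfB rows whose pos lies in [lowval, upval] for dfA row i
  hit <- which(dfB$pos >= dfA$lowval[i] & dfB$pos <= dfA$upval[i])
  if (length(hit) > 0) {
    # Repeat dfA row i once per matching dfB row so both sides line up
    pieces[[length(pieces) + 1]] <- cbind(dfA[rep(i, length(hit)), ], dfB[hit, ])
  }
}
result <- do.call(rbind, pieces)  # NULL if nothing matched

# Placeholder output path (assumption); swap in your own file name
out_file <- file.path(tempdir(), "outputfile.txt")
if (!is.null(result)) {
  write.table(result, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
}
```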