Question

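I have two space-delimited files. The row names of file1 were randomly sampled from file2, but file1 has several columns while file2 has only the row-name column. I want to rebuild file2 so that its remaining columns are taken from file1, choosing the file1 row with the smallest absolute difference in column 1.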

For example:

File1:

5 0.1 0.2 0.5
20 0.3 0.3 0.6
30 0.5 0.66 0.1
100 0.9 0 1

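The first-column values of file1 (5, 20, 30 and 100) all come from file2.

File2:

This file contains only row names, including the 5, 20, 30 and 100 that appear in file1.

Desired output: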

2 0.1 0.2 0.5
5 0.1 0.2 0.5
19 0.3 0.3 0.6
20 0.3 0.3 0.6
27 0.5 0.66 0.1
30 0.5 0.66 0.1
65 0.5 0.66 0.1
100 0.9 0 1
105 0.9 0 1

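Both files are sorted by column 1 in ascending order. Basically, I want each number in file2 to take its remaining columns from the file1 row whose column 1 has the smallest absolute difference from that number. For example, the first number in file2 is 2, and the file1 value closest to 2 is 5, so that row takes its remaining columns from the "5" row of file1. If there is a tie, meaning two file1 rows give the same absolute difference, the output takes its columns from the row with the smaller number. For example, 65 from file2 is equally far from 30 and 100 in file1 (a difference of 35 in both cases), so it takes its values from the 30 row.

I tried to do this in R; here is my code: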

i <- 1
b <- data.frame(stringsAsFactors = FALSE)
N <- 4    ## number of lines in file1
Row <- 9  ## number of lines in file2
while (i <= Row) {
    ## repeat the value of file2 N times and find the smallest difference with file1 by subtraction
    test <- which(abs(file1[,1] - rep(file2[i,1], N)) == min(abs(file1[,1] - rep(file2[i,1], N))))
    if (length(test) == 1) {  ## found 1 smallest value
        a <- file1[test,]; b <- rbind(b, a)
    } else if (length(test) == 2) {  ## tie, get the first value, the "smaller one"
        a <- file1[head(test, 1),]; b <- rbind(b, a)
    } else {
        warning("There is an error; test is neither 1 or 2")
    }
    i <- i + 1
}

output <- b
output$V1 <- NULL

It works, but it is very slow once file1 and file2 get large. Is there a faster way? Any approach is welcome: awk, shell, R, Perl, Python... Thanks!

Answer 1

Judging by your code, I don't think you actually mean row.names, just the first column of the data frame. There are several ways to do this; here is one:

index <- unlist(lapply(File2[,1], function(x) 
  min(which(abs(x - File1[,1]) == min(abs(x - File1[,1]))))))
File2.new <- File1[index,]
File2.new
#      V1  V2   V3  V4
# 1     5 0.1 0.20 0.5
# 1.1   5 0.1 0.20 0.5
# 2    20 0.3 0.30 0.6
# 2.1  20 0.3 0.30 0.6
# 3    30 0.5 0.66 0.1
# 3.1  30 0.5 0.66 0.1
# 3.2  30 0.5 0.66 0.1
# 4   100 0.9 0.00 1.0
# 4.1 100 0.9 0.00 1.0
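
Because the question states that both files are sorted by column 1, the full scan of File1 for every value could be replaced by a binary search. The following is only a minimal sketch of that idea, not part of the original answer: `nearest_index` is a hypothetical helper name, the data frames are named `file1` and `file2` and rebuilt from the question's example, and the code assumes `file1$V1` is sorted ascending with no duplicates.

```r
# Example data from the question (file2 values as listed in the desired output).
file1 <- read.table(header = FALSE, text = "
5 0.1 0.2 0.5
20 0.3 0.3 0.6
30 0.5 0.66 0.1
100 0.9 0 1
")
file2 <- data.frame(V1 = c(2, 5, 19, 20, 27, 30, 65, 100, 105))

# Hypothetical helper: for each x, return the index of the closest key.
# Assumes `keys` is sorted in ascending order.
nearest_index <- function(x, keys) {
  n  <- length(keys)
  lo <- findInterval(x, keys)   # index of the largest key <= x (0 if x < keys[1])
  lo_c <- pmax(lo, 1L)          # clamp the lower neighbour into 1..n
  hi_c <- pmin(lo + 1L, n)      # clamp the upper neighbour into 1..n
  # Pick the upper neighbour only when it is strictly closer,
  # so ties go to the smaller key, as the question requires.
  ifelse(abs(keys[hi_c] - x) < abs(x - keys[lo_c]), hi_c, lo_c)
}

idx <- nearest_index(file2$V1, file1$V1)

# Keep the file2 value in column 1 and take the remaining columns from file1.
result <- data.frame(V1 = file2$V1, file1[idx, -1])
rownames(result) <- NULL
result
```

`findInterval` does a binary search per value, so the work grows roughly with the number of file2 rows times the logarithm of the number of file1 rows, rather than with the product of the two row counts.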

Answer 2

Reading the data in without row names makes this easier. Here is an approach that uses a helper function:

nearest <- function(x, y){
    o <- outer(x,y,function(x,y)abs(x-y))
    a <- apply(o, 1, which.min)
    y[a]
}

Read the data:

file1 <- read.table(header=FALSE,text="
5 0.1 0.2 0.5
20 0.3 0.3 0.6
30 0.5 0.66 0.1
100 0.9 0 1
")

file2 <- read.table(header=FALSE,text="
2
5
19
20
27
30
65
100
105
")

Result:

merge(within(file2, {V1_old <- V1; V1 <- nearest(V1, file1$V1)}), file1, all.x=TRUE)

   V1 V1_old  V2   V3  V4
1   5      2 0.1 0.20 0.5
2   5      5 0.1 0.20 0.5
3  20     19 0.3 0.30 0.6
4  20     20 0.3 0.30 0.6
5  30     27 0.5 0.66 0.1
6  30     30 0.5 0.66 0.1
7  30     65 0.5 0.66 0.1
8 100    100 0.9 0.00 1.0
9 100    105 0.9 0.00 1.0

Answer 3

V1_2 <- unlist(lapply(file2$V1, function(x) file1$V1[which.min(abs(x - file1$V1))]))
file2 <- cbind.data.frame(file2, V1_2)
merge(file2, file1, by.x = "V1_2", by.y = "V1", all.x = TRUE)
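
The code above relies on `which.min` returning the position of the first minimum. Since file1 is sorted ascending by its first column, a tie therefore resolves to the smaller value, which is the tie rule the question asks for. A small check of that behavior, using the file1 keys from the question (the variable names `keys` and `d` are only for illustration):

```r
keys <- c(5, 20, 30, 100)   # first column of file1, sorted ascending
d <- abs(65 - keys)         # distances from 65: 60 45 35 35
which.min(d)                # 3, the first of the two tied minima
keys[which.min(d)]          # 30, the smaller of the tied keys
```

If file1 were not sorted, the first minimum would not necessarily be the smaller key, so the tie rule would need to be handled explicitly.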

Subtract numeric row names to find the minimum absolute difference, and fill in the rest of each row from the closest match
