Question

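I want to compare the text in one row with all of the rows that follow it, in order to find near-duplicates (small deviations). How can I rewrite the code below so that it does not use for loops?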

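I would like to speed this up, because I have a large amount of text, about 5000 rows. I want to compare row 1 with rows 2 through 19, then row 2 with rows 3 through 19, and so on. With 5000 rows the for loops are very slow. Is it possible to use one of the apply functions instead?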

Answer 1

Maybe something like this:

library(stringdist)
dat <- data.frame(n = 1:19, des = c("Some very long text", "Some very lang test", "Some vary long text", "Some veri long text", "Another very long text", "Anather very long text", "Another very long text", "Different text", "Diferent text", "More text", "More test", "Much more text", "Muh more text", "Some other long text", "Some otoher long text", "Some more text", "Same more text", "New text", "New texd"))

column <- which(names(dat) == "des")
N <- nrow(dat)

#change outer loop to sapply
dupli <- c(sapply(1:(N-1), function(row){
    #change inner loop to arraywise processing and aggregate with any
    any(stringdist(dat[row, column], dat[(row+1):N, column]) < 2)
}), FALSE)

It is not dramatically fast, but it is faster than plain for loops. cbind(dat, dupli) gives:

    n                    des dupli
1   1    Some very long text  TRUE
2   2    Some very lang test FALSE
3   3    Some vary long text FALSE
4   4    Some veri long text FALSE
5   5 Another very long text  TRUE
6   6 Anather very long text  TRUE
7   7 Another very long text FALSE
8   8         Different text  TRUE
9   9          Diferent text FALSE
10 10              More text  TRUE
11 11              More test FALSE
12 12         Much more text  TRUE
13 13          Muh more text FALSE
14 14   Some other long text  TRUE
15 15  Some otoher long text FALSE
16 16         Some more text  TRUE
17 17         Same more text FALSE
18 18               New text  TRUE
19 19               New texd FALSE
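
If the data fits in memory as a full distance matrix, the sapply call can be dropped as well. The following is only a sketch under some assumptions, not part of the original answer: it uses base R's adist (generalized Levenshtein distance) instead of stringdist, whose default method is optimal string alignment. The two metrics can disagree when two strings differ only by a swap of adjacent letters; the sample strings contain no such pair. The names d and dupli_mat are made up for this illustration.

```r
# Same sample strings as in the answer above
des <- c("Some very long text", "Some very lang test", "Some vary long text",
         "Some veri long text", "Another very long text", "Anather very long text",
         "Another very long text", "Different text", "Diferent text", "More text",
         "More test", "Much more text", "Muh more text", "Some other long text",
         "Some otoher long text", "Some more text", "Same more text",
         "New text", "New texd")

# All pairwise edit distances at once (base R, utils::adist);
# assumption: Levenshtein distance is an acceptable stand-in for stringdist's default
d <- adist(des)

# Keep only comparisons with later rows: blank out the diagonal and the lower triangle
d[!upper.tri(d)] <- NA

# TRUE for a row if at least one later row is within an edit distance below 2
dupli_mat <- rowSums(d < 2, na.rm = TRUE) > 0

print(data.frame(des = des, dupli = dupli_mat))
```

The trade-off is memory and redundant work: for 5000 rows the matrix has 25 million entries, and both triangles are computed although only the upper one is used. The stringdist package also provides stringdistmatrix, which could be swapped in for adist to keep the same metric as the answer.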
