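I want to compare the text in one row with the text in all following rows in order to find deviations. How can I rewrite my loop-based code so that it does not use a for loop?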
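I want to speed this up because I have a lot of text, about 5000 rows. I want to compare row 1 with rows 2 to 19, then row 2 with rows 3 to 19, and so on. With 5000 rows the for loop is very slow. Is it possible to use one of the apply functions instead?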
Answer 0: (score: 1)
Maybe something like this:
library(stringdist)
# stringsAsFactors = FALSE keeps des as character (the default only since R 4.0)
dat <- data.frame(n = 1:19, des = c("Some very long text", "Some very lang test", "Some vary long text", "Some veri long text", "Another very long text", "Anather very long text", "Another very long text", "Different text", "Diferent text", "More text", "More test", "Much more text", "Muh more text", "Some other long text", "Some otoher long text", "Some more text", "Same more text", "New text", "New texd"), stringsAsFactors = FALSE)
column <- which(names(dat) == "des")
N <- nrow(dat)
# Replace the outer loop with sapply over rows 1..(N-1)
dupli <- c(sapply(seq_len(N - 1), function(row) {
  # Replace the inner loop with one vectorized stringdist() call against all later rows;
  # any() flags the row if at least one later row is within edit distance 1
  any(stringdist(dat[row, column], dat[(row + 1):N, column]) < 2)
}), FALSE)  # the last row has no later rows to compare with
This is not especially fast, but it is faster than a plain for loop. cbind(dat, dupli) gives:
    n                    des dupli
1   1    Some very long text  TRUE
2   2    Some very lang test FALSE
3   3    Some vary long text FALSE
4   4    Some veri long text FALSE
5   5 Another very long text  TRUE
6   6 Anather very long text  TRUE
7   7 Another very long text FALSE
8   8         Different text  TRUE
9   9          Diferent text FALSE
10 10              More text  TRUE
11 11              More test FALSE
12 12         Much more text  TRUE
13 13          Muh more text FALSE
14 14   Some other long text  TRUE
15 15  Some otoher long text FALSE
16 16         Some more text  TRUE
17 17         Same more text FALSE
18 18               New text  TRUE
19 19               New texd FALSE
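
If the sapply version is still too slow, one possible variation is to compute all pairwise distances in a single call with stringdistmatrix() from the same package and then look only at the upper triangle, so that each row is compared with later rows only. This is a sketch, not a benchmarked solution: the vector name texts and the shortened sample data are made up for illustration, and the threshold < 2 is taken from the code above.

```r
library(stringdist)

# Hypothetical shortened sample; in practice this would be dat$des
texts <- c("Some very long text", "Some vary long text",
           "Different text", "Diferent text", "New text")

# N x N matrix of pairwise distances (default method "osa")
d <- stringdistmatrix(texts, texts)

# Keep only comparisons with later rows: blank out the diagonal
# and everything below it
d[lower.tri(d, diag = TRUE)] <- NA

# TRUE where at least one later row is within edit distance 1
dupli <- rowSums(d < 2, na.rm = TRUE) > 0
```

Note that this builds a full N x N matrix, so with 5000 rows it holds 25 million distances; whether that trade of memory for speed pays off depends on the data and should be measured.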