I'm trying to remove every row of a data frame that contains absolute duplicates. Here is an example:
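library(gtools)
vector <- c(15.3, -31.8, -35.6, -14.5, 3.1, -24.5)
# every 6-element combination of the 12 values in vector and -vector
vector.combo <- data.frame(combinations(n = 12, r = 6, v = c(vector, -vector)))
Running the code above gives a data frame containing every unique combination of six elements drawn from vector and -vector. For example, you will see rows like this one:
-35.6 -31.8 -15.3 -3.1 3.1 35.6
Now I want to remove every row that contains absolute duplicates, that is, any row containing a pair such as 35.6 and -35.6.
What I have tried so far did not work.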
Any hints would be greatly appreciated.
Thanks!
Answer 0 (score: 1)
I think you need the following:
library(gtools)
vector <- c(15.3, -31.8, -35.6, -14.5, 3.1,-24.5)
vector.combo <- data.frame(combinations(n = 12, r = 6, v = c(vector,-vector)))
# keep only the rows whose six absolute values are all distinct
unique_combo <- vector.combo[apply(abs(vector.combo), 1, function(x) length(unique(x))) == 6, ]
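apply() walks over the absolute values of the table row by row and counts how many unique elements each row contains. If a row has 6 unique elements, the test returns TRUE; otherwise it returns FALSE. The resulting logical vector is then used to index the rows of vector.combo.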
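To make the row test concrete, here it is applied by hand to the example row from the question and to a second row (chosen only for illustration) that uses each magnitude exactly once:

```r
# The row from the question: 35.6/-35.6 and 3.1/-3.1 repeat in absolute value
bad_row  <- c(-35.6, -31.8, -15.3, -3.1, 3.1, 35.6)
# An illustrative row that uses each absolute value exactly once
good_row <- c(-35.6, -31.8, -24.5, -14.5, 3.1, 15.3)

length(unique(abs(bad_row)))   # 4, so this row is dropped
length(unique(abs(good_row)))  # 6, so this row is kept
```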
Answer 1 (score: 1)
A dplyr solution:
library(gtools)
library(dplyr)
vector <- c(15.3, -31.8, -35.6, -14.5, 3.1,-24.5)
vector.combo <- data.frame(combinations(n = 12, r = 6, v = c(vector,-vector)))
# flag rows in which some absolute value appears more than once
dup_idx <-
  vector.combo %>%
  transmute_all(abs) %>%
  apply(1, function(row) anyDuplicated(row) > 0)
vector.combo[!dup_idx,]
Regards, Paweł
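Note that duplicated() applied to a data frame compares whole rows with one another; it does not look for repeated values inside a single row. A small toy data frame (hypothetical, for illustration only) shows the difference between the two checks:

```r
# Toy data frame: rows 1 and 2 are identical, and each has a repeated absolute value
df <- data.frame(a = c(1, 1, 2), b = c(-1, -1, 3))

# duplicated() on a data frame flags rows that repeat an EARLIER row
duplicated(abs(df))                    # FALSE TRUE FALSE

# anyDuplicated() applied to each row looks for repeats WITHIN that row
apply(abs(df), 1, anyDuplicated) > 0   # TRUE TRUE FALSE
```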
Answer 2 (score: 0)
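# collect the indices of rows whose absolute values are not all distinct
get_rid <- c()
for (i in seq_len(nrow(vector.combo))) {
  # unlist() turns the one-row data frame into a plain numeric vector;
  # without it, unique() works on rows and length() always returns 6
  if (length(unique(abs(unlist(vector.combo[i, ])))) != 6) {
    get_rid <- c(get_rid, i)
  }
}
# guard against an empty index: vector.combo[-NULL, ] would drop every row
if (length(get_rid) > 0) {
  vector.combo <- vector.combo[-get_rid, ]
}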
That should do it.
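Every approach above generates all choose(12, 6) = 924 combinations and then filters them. Because vector holds six distinct magnitudes and each row has six entries, a row without absolute duplicates must use every magnitude exactly once, so the rows to keep are exactly the 2^6 = 64 ways of choosing a sign for each magnitude. The sketch below builds them directly with base R's expand.grid() and cross-checks the result against a filter. It assumes, as in the question, that the magnitudes in vector are distinct, and it uses base R's combn() in place of gtools::combinations() so that it runs without extra packages.

```r
vector <- c(15.3, -31.8, -35.6, -14.5, 3.1, -24.5)

# All 6-element combinations of vector and -vector, one per row.
# Base R's combn() stands in for gtools::combinations() here.
combos <- t(combn(c(vector, -vector), 6))   # 924 x 6 matrix

# Filter: keep rows in which no absolute value repeats
keep <- apply(abs(combos), 1, anyDuplicated) == 0
filtered <- combos[keep, , drop = FALSE]

# Direct construction: pick a sign for each of the six magnitudes
signs <- lapply(abs(vector), function(a) c(-a, a))
direct <- as.matrix(expand.grid(signs))

nrow(filtered)  # 64
nrow(direct)    # 64
```

Generating the 64 rows directly avoids building and scanning the 924 candidates, which matters more as vector grows.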