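I have a correlation matrix generated with corr <- cor(data, use = "pairwise.complete.obs").
I use the following code to convert it to long format and keep only the correlations whose absolute value is greater than 0.1: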
corr %>%
as_tibble(rownames = "From") %>%
gather(key = "To", value = "corr", -From) %>%
filter(From != To) %>%
mutate(corr_abs = abs(corr)) %>%
filter(corr_abs > 0.1) %>%
arrange(-corr_abs)
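However, the result lists every correlation twice, once in each direction. How can I remove these duplicates when the matching values sit in two different columns?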
# A tibble: 8 x 4
From To corr corr_abs
<chr> <chr> <dbl> <dbl>
1 health.age health.employed -0.393 0.393
2 health.employed health.age -0.393 0.393
3 health.age health.marital 0.212 0.212
4 health.marital health.age 0.212 0.212
5 health.alcohol health.gender 0.187 0.187
6 health.gender health.alcohol 0.187 0.187
7 health.age health.fruitveg 0.100 0.100
8 health.fruitveg health.age 0.100 0.100
Desired output:
# A tibble: 4 x 4
From To corr corr_abs
<chr> <chr> <dbl> <dbl>
1 health.age health.employed -0.393 0.393
2 health.age health.marital 0.212 0.212
3 health.alcohol health.gender 0.187 0.187
4 health.age health.fruitveg 0.100 0.100
Data:
corr <- structure(c(1, 0.0632225392922264, 0.0554804788901363, 0.0974838182384356,
0.212473674076218, -0.0286618705621989, 0.0632225392922264, 1,
0.0908529910265203, -0.0554639294179715, -0.0326865391045356,
0.186574369192519, 0.0554804788901363, 0.0908529910265203, 1,
0.0377351030257117, -0.392764651422931, 0.065822234809157, 0.0974838182384356,
-0.0554639294179715, 0.0377351030257117, 1, 0.10048775378073,
-0.0684000695994252, 0.212473674076218, -0.0326865391045356,
-0.392764651422931, 0.10048775378073, 1, -0.0312405196930598,
-0.0286618705621989, 0.186574369192519, 0.065822234809157, -0.0684000695994252,
-0.0312405196930598, 1), .Dim = c(6L, 6L), .Dimnames = list(c("health.marital",
"health.gender", "health.employed", "health.fruitveg", "health.age",
"health.alcohol"), c("health.marital", "health.gender", "health.employed",
"health.fruitveg", "health.age", "health.alcohol")))
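Answer 0 (score: 4)
One option is to replace the upper-triangle values of the initial matrix (including the diagonal) with NA, and then drop them in gather with na.rm = TRUE. Because the matrix is symmetric, the remaining lower triangle contains each pair exactly once.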
library(dplyr)  # %>%, as_tibble, mutate, filter, arrange
library(tidyr)  # gather

corr %>%
replace(., upper.tri(., diag = TRUE), NA) %>%
as_tibble(rownames = "From") %>%
gather(key = "To", value = "corr", -From, na.rm = TRUE) %>%
mutate(corr_abs = abs(corr)) %>%
filter(corr_abs > 0.1) %>%
arrange(-corr_abs)
# A tibble: 4 x 4
# From To corr corr_abs
# <chr> <chr> <dbl> <dbl>
#1 health.age health.employed -0.393 0.393
#2 health.age health.marital 0.212 0.212
#3 health.alcohol health.gender 0.187 0.187
#4 health.age health.fruitveg 0.100 0.100
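To see why masking the upper triangle leaves each pair only once, here is a minimal base-R sketch on a toy 3 x 3 matrix. The matrix values and the names m, mask, m_lower, idx and pairs are made up for illustration and are not part of the original answer; the long-format step uses which(..., arr.ind = TRUE) in place of gather.

```r
# Small symmetric toy matrix (hypothetical values, for illustration only)
m <- matrix(c(1.0, 0.2, 0.3,
              0.2, 1.0, 0.4,
              0.3, 0.4, 1.0),
            nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# upper.tri(..., diag = TRUE) is TRUE on and above the diagonal
mask <- upper.tri(m, diag = TRUE)

# Blank out those cells; only the strictly lower triangle keeps its values
m_lower <- replace(m, mask, NA)

# Long format in base R: one row per remaining (row, column) cell
idx <- which(!is.na(m_lower), arr.ind = TRUE)
pairs <- data.frame(
  From = rownames(m_lower)[idx[, "row"]],
  To   = colnames(m_lower)[idx[, "col"]],
  corr = m_lower[idx],
  stringsAsFactors = FALSE
)
print(pairs)
```

Each unordered pair (a-b, a-c, b-c) appears once, with From taken from the row name and To from the column name, which is the same orientation as in the output above.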