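I'm working with labelled data (common in social science research, for example). Specifically, the column values are stored not as factors but as numeric columns that carry the labels as an attribute. Those labels in turn have an attribute of their own (the label names).
Now, I know how to change the label names of an existing column. But I haven't managed to do the same for a new column, or more precisely: I know how to create labelled columns with a dedicated package, but I'd like to know whether there is a native/base R option, e.g. attr, attributes, or structure.
Example data: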
df <- structure(list(Q16_3 = structure(c(NA, NA, 1, 1, 1, NA, NA, 1, 0, 1),
label = "Q16_3 question label",
format.spss = "F8.2",
labels = c(`Not Selected` = 0, Selected = 1),
class = c("haven_labelled", "vctrs_vctr", "double")),
Q16_4 = structure(c(NA, NA, 1, 1, 1, NA, NA, 0, 0, 1),
label = "Q16_4 question label",
format.spss = "F8.2",
labels = c(`Not Selected` = 0, Selected = 1),
class = c("haven_labelled", "vctrs_vctr", "double"))),
row.names = c(NA, -10L),
class = c("tbl_df", "tbl", "data.frame"))
For example, df %>% count(Q16_4) gives:
# A tibble: 3 x 2
Q16_4 n
* <dbl+lbl> <int>
1 0 [Not Selected] 2
2 1 [Selected] 4
3 NA 4
Now I create a new column and try to give it a "labels" attribute, but the labels don't show up:
df <- df %>%
mutate(test = rep(1:2, 5))
df$test <- structure(df$test, labels = c("NO" = 1, "YES" = 2))
df %>%
count(test)
This only gives:
# A tibble: 2 x 2
test n
* <int> <int>
1 1 5
2 2 5
I guess it has something to do with the structure of the attributes themselves, because they look different:
str(df)
tibble [10 x 3] (S3: tbl_df/tbl/data.frame)
$ Q16_3: dbl+lbl [1:10] NA, NA, 1, 1, 1, NA, NA, 1, 0, 1
..@ label : chr "Q16_3 question label"
..@ format.spss: chr "F8.2"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "Not Selected" "Selected"
$ Q16_4: dbl+lbl [1:10] NA, NA, 1, 1, 1, NA, NA, 0, 0, 1
..@ label : chr "Q16_4 question label"
..@ format.spss: chr "F8.2"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "Not Selected" "Selected"
$ test : int [1:10] 1 2 1 2 1 2 1 2 1 2
..- attr(*, "labels")= Named num [1:2] 1 2
.. ..- attr(*, "names")= chr [1:2] "NO" "YES"
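Long story short: how do I need to change my code so that it creates this kind of "nested" attribute?
Answer 0: (score: 1)
You can use attr to extract the labels and match to look up the label name for each value.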
var <- attr(df$test, 'labels')
df$test_label <- names(var)[match(df$test, var)]
df
# Q16_3 Q16_4 test test_label
# <dbl+lbl> <dbl+lbl> <int> <chr>
# 1 NA NA 1 NO
# 2 NA NA 2 YES
# 3 1 [Selected] 1 [Selected] 1 NO
# 4 1 [Selected] 1 [Selected] 2 YES
# 5 1 [Selected] 1 [Selected] 1 NO
# 6 NA NA 2 YES
# 7 NA NA 1 NO
# 8 1 [Selected] 0 [Not Selected] 2 YES
# 9 0 [Not Selected] 0 [Not Selected] 1 NO
#10 1 [Selected] 1 [Selected] 2 YES
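If you want to replace the original test column instead, assign the result to df$test rather than df$test_label.
What you have in your original data frame is labelled data, which can be constructed like this: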
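To make the lookup step easier to follow in isolation, here is a minimal base R sketch on a plain vector rather than the data frame. The vector x and the names lab, test_label and test_factor are made up for illustration; the factor() variant is an alternative that is not part of the approach above.

```r
# Hypothetical stand-in for df$test: an integer vector with a "labels" attribute
x <- structure(c(1L, 2L, 1L, NA), labels = c(NO = 1, YES = 2))

# Pull out the named vector of labels (names are the label text, values the codes)
lab <- attr(x, "labels")

# match() returns, for each value in x, its position in lab;
# indexing names(lab) with those positions yields the label text.
# Values without a label (including NA) come back as NA.
test_label <- names(lab)[match(x, lab)]

# Alternative: build a factor whose levels are the codes and whose labels are the names
test_factor <- factor(x, levels = lab, labels = names(lab))
```

The character version keeps the column as plain text, while the factor version also preserves the order of the codes, which can matter for tables and plots.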
library(dplyr)
df %>%
mutate(test = haven::labelled(rep(1:2, 5), labels = c("NO" = 1, "YES" = 2)))
# Q16_3 Q16_4 test
# <dbl+lbl> <dbl+lbl> <int+lbl>
# 1 NA NA 1 [NO]
# 2 NA NA 2 [YES]
# 3 1 [Selected] 1 [Selected] 1 [NO]
# 4 1 [Selected] 1 [Selected] 2 [YES]
# 5 1 [Selected] 1 [Selected] 1 [NO]
# 6 NA NA 2 [YES]
# 7 NA NA 1 [NO]
# 8 1 [Selected] 0 [Not Selected] 2 [YES]
# 9 0 [Not Selected] 0 [Not Selected] 1 [NO]
#10 1 [Selected] 1 [Selected] 2 [YES]
Its labels also show up in count:
df %>%
mutate(test = haven::labelled(rep(1:2, 5),labels = c("NO" = 1, "YES" = 2))) %>%
count(test)
# test n
#* <int+lbl> <int>
#1 1 [NO] 5
#2 2 [YES] 5
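Since the question asks specifically about a base R route, the following is a rough sketch only. It assumes that the class vector shown in the question's example data (c("haven_labelled", "vctrs_vctr", ...)) is what makes a column count as labelled, and it copies that layout by hand with structure(). This leans on an implementation detail of haven, skips the checks that haven::labelled() performs, and the labels only print as [NO] / [YES] when haven is installed and loaded, so the constructor remains the safer choice.

```r
# Assumption: mimic the attribute layout from the question's example data by hand.
# The base type in the class vector ("integer") has to match the underlying data,
# and the label values are given as integers (1L, 2L) to match as well.
test <- structure(
  rep(1:2, 5),
  labels = c(NO = 1L, YES = 2L),
  class = c("haven_labelled", "vctrs_vctr", "integer")
)

# The "nested" attribute from the question: the labels carry their own names
label_names <- names(attr(test, "labels"))

# The class check that distinguishes this column from a plain integer column
is_labelled <- inherits(test, "haven_labelled")
```

The only difference from the attempt in the question is the class attribute: without it, the "labels" attribute is stored on the column but nothing treats the column as labelled.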