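I am trying to use `pivot_wider` to create multiple columns/variables containing values, but I am getting NAs in columns where there shouldn't be any.
Here is a representative example of the data: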
df <- structure(list(Condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Control", "Retraction1",
"Retraction2"), class = "factor"), First = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Journalist",
"Police", "Reviewer", "Spokesperson"), class = "factor"), Second = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Journalist",
"Police", "Reviewer", "Spokesperson"), class = "factor"), Third = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Journalist",
"Police", "Reviewer", "Spokesperson"), class = "factor"), Fourth = structure(c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Journalist",
"Police", "Reviewer", "Spokesperson"), class = "factor"), ID = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57",
"58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68",
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
"91", "92", "93", "94", "95", "96", "97", "98", "99", "100",
"101"), class = "factor"), Scenario = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 1L, 2L, 3L, 4L), .Label = c("J", "P", "R",
"S"), class = "factor"), Estimate = structure(c(4L, 8L, 7L, 11L,
9L, 12L, 10L, 2L, 5L, 6L, 4L, 7L, 11L, 9L, 12L, 10L, 2L, 3L,
5L, 6L, 4L, 8L, 7L, 11L, 9L, 12L, 10L, 2L, 5L, 6L, 4L, 8L, 7L,
11L, 9L, 12L, 10L, 2L, 5L, 6L, 1L, 1L, 1L, 1L), .Label = c("CompMean",
"P.H.Reps.", "P.H.Reps..1", "P.Rel.", "P.Rel1.Reps.", "P.Rel2.Reps.",
"P.Rep1.nH.nRel.", "P.Rep1.nH.Rel.", "P.Rep2.nH.nRel.nRep1.",
"P.Rep2.nH.nRel.Rep1.", "P.Rep2.nH.Rel.nRep1.", "P.Rep2.nH.Rel.Rep1."
), class = "factor"), value = c(90L, 8L, 82L, 11L, 82L, 11L,
82L, 100L, 99L, NA, 62L, 11L, 91L, 12L, 91L, 5L, 82L, 91L, 80L,
NA, 92L, 12L, 61L, 18L, 90L, 21L, 81L, 96L, 92L, NA, 91L, 10L,
72L, 22L, 62L, 21L, 73L, 99L, 98L, NA, 7L, 7L, 7L, 7L)), row.names = c(NA,
-44L), class = c("tbl_df", "tbl", "data.frame"))
head(df)
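This is data from a single subject. There should only be NAs in `P.Rel2.Reps.` and in no other column.
However, when I use `pivot_wider` like this, some of the other columns also contain NAs:
library(tidyr)
pivot_wider(df, names_from = Estimate, values_from = value)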
Here is an example of what the data looks like after pivoting wider.
df2 <- structure(list(Condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("Control", "Retraction1", "Retraction2"
), class = "factor"), First = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Journalist", "Police", "Reviewer",
"Spokesperson"), class = "factor"), Second = structure(c(3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Journalist",
"Police", "Reviewer", "Spokesperson"), class = "factor"), Third = structure(c(1L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Journalist",
"Police", "Reviewer", "Spokesperson"), class = "factor"), Fourth = structure(c(4L,
4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Journalist",
"Police", "Reviewer", "Spokesperson"), class = "factor"), ID = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59",
"60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70",
"71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81",
"82", "83", "84", "85", "86", "87", "88", "89", "90", "91", "92",
"93", "94", "95", "96", "97", "98", "99", "100", "101"), class = "factor"),
Scenario = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L), .Label = c("J", "P", "R", "S"), class = "factor"), P.Rel. = c(90L,
62L, 92L, 91L, 57L, 81L, 71L, 80L, 40L, 75L), P.Rep1.nH.Rel. = c(8L,
NA, 12L, 10L, 31L, NA, 19L, 17L, 25L, NA), P.Rep1.nH.nRel. = c(82L,
11L, 61L, 72L, 89L, 15L, 79L, 84L, 76L, 25L), P.Rep2.nH.Rel.nRep1. = c(11L,
91L, 18L, 22L, 35L, 64L, 30L, 22L, 25L, 50L), P.Rep2.nH.nRel.nRep1. = c(82L,
12L, 90L, 62L, 62L, 13L, 45L, 53L, 25L, 50L), P.Rep2.nH.Rel.Rep1. = c(11L,
91L, 21L, 21L, 15L, 52L, 9L, 10L, 100L, 50L), P.Rep2.nH.nRel.Rep1. = c(82L,
5L, 81L, 73L, 67L, 22L, 60L, 61L, 100L, 25L), P.H.Reps. = c(100L,
82L, 96L, 99L, 81L, 40L, 71L, 76L, 75L, 90L), P.Rel1.Reps. = c(99L,
80L, 92L, 98L, 81L, 80L, 89L, 79L, 75L, 76L), P.Rel2.Reps. = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), P.H.Reps..1 = c(NA,
91L, NA, NA, NA, 80L, NA, NA, NA, 100L), CompMean = c(7L,
7L, 7L, 7L, 7L, 7L, 7L, 6L, 4L, 7L)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))
head(df2)
I have seen similar posts on this topic, but none of them explain why NAs are generated in my case.
Do I need to add some other arguments?
Answer 0 (score: 2)
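Looking at the data, it seems that one `Estimate` label is corrupted in one place. You can correct it with
df$Estimate <- replace(df$Estimate, df$Estimate == "P.H.Reps..1", "P.Rep1.nH.Rel.")
and then use `pivot_wider`, which will give you NAs only in the `P.Rel2.Reps.` column: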
tidyr::pivot_wider(df, names_from = Estimate, values_from = value)
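One way to spot a stray label like this before pivoting is to count the rows per label. The sketch below uses a small made-up data frame (`long`, with labels `a`, `a.1`, and `b`), not the asker's data, and assumes every label is supposed to occur once per scenario.

```r
# Hypothetical miniature of the long data (not the asker's real data):
# the "a" row for scenario P has been mislabelled as "a.1".
long <- data.frame(
  Scenario = c("J", "J", "P", "P", "R", "R"),
  Estimate = c("a", "b", "a.1", "b", "a", "b"),
  value    = c(1L, 2L, 3L, 4L, 5L, 6L)
)

# Count how many rows carry each Estimate label.
counts <- table(long$Estimate)

# Labels that occur less often than the most common count are worth
# inspecting by hand: both the stray label and the label it displaced
# show up here.
suspects <- names(counts)[counts < max(counts)]
print(suspects)
```

In the question's data, the same check would flag `P.H.Reps..1` (one row) and `P.Rep1.nH.Rel.` (three rows), while every other label has four rows. A name ending in `.1` is also the shape that `make.unique()` gives to a duplicated name, so a duplicated column in the source file is one possible origin, though that is only a guess.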
Answer 1 (score: 1)
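NA values will result for any combination of categories of the new pivoted columns that is not present in the original long data frame. For example, let's look at the rows of the long data frame with `Estimate == "P.Rep1.nH.Rel."`:
library(dplyr)
df %>% filter(Estimate == "P.Rep1.nH.Rel.")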
  Condition First  Second   Third      Fourth       ID Scenario Estimate       value
1 Control   Police Reviewer Journalist Spokesperson 1  J        P.Rep1.nH.Rel.     8
2 Control   Police Reviewer Journalist Spokesperson 1  R        P.Rep1.nH.Rel.    12
3 Control   Police Reviewer Journalist Spokesperson 1  S        P.Rep1.nH.Rel.    10
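Now look at the result of `pivot_wider` (I've kept only the relevant columns for brevity). Note that in the output below, the `P.Rep1.nH.Rel.` column is missing a value. The missing value occurs where `Scenario == "P"`, because the long data frame has no row pairing `P.Rep1.nH.Rel.` with `Scenario == "P"`, which results in a missing value in the wide data frame. For a similar reason, missing values also appear in the `P.H.Reps..1` column: the long data frame has only one row with `Estimate == "P.H.Reps..1"`, and it has `Scenario == "P"`. The values for the other three scenarios are therefore missing.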
pivot_wider(df, names_from = Estimate, values_from = value) %>%
select(Condition:Scenario, P.Rep1.nH.Rel., P.H.Reps..1)
  Condition First  Second   Third      Fourth       ID Scenario P.Rep1.nH.Rel. P.H.Reps..1
1 Control   Police Reviewer Journalist Spokesperson 1  J                     8          NA
2 Control   Police Reviewer Journalist Spokesperson 1  P                    NA          91
3 Control   Police Reviewer Journalist Spokesperson 1  R                    12          NA
4 Control   Police Reviewer Journalist Spokesperson 1  S                    10          NA
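The combinations that will become NA can also be listed from the long data before pivoting. The sketch below cross-tabulates the row identifier against the future column names in base R. The data frame `long` and its labels are hypothetical stand-ins for the real data.

```r
# Hypothetical miniature of the long data (not the asker's real data).
long <- data.frame(
  Scenario = c("J", "J", "P", "P"),
  Estimate = c("a", "b", "a.1", "b"),
  value    = c(1L, 2L, 3L, 4L)
)

# Cross-tabulate the identifying variable against the future column names
# and turn the result into one row per Scenario/Estimate pair.
combos <- as.data.frame(table(Scenario = long$Scenario,
                              Estimate = long$Estimate))

# Pairs with a count of zero have no row in the long data, so they are the
# cells that pivot_wider() would leave as NA.
missing <- combos[combos$Freq == 0, c("Scenario", "Estimate")]
print(missing)
```

With several identifying columns (as in the question, where `Condition` through `Scenario` all identify a row), the same idea applies after combining them into a single key, for example with `interaction()`.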
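This could be a data error, as @RonakShah suggests, but if the data are correct, NA values naturally result when pivoting to wide format. You can fill the missing values with some other value by adding the argument `values_fill = list(value = 0)` to `pivot_wider` (you can of course use whatever fill value you wish; I've used `0` only for illustration). Note that explicit missing values in the original long data will remain in the wide data frame even if you use the `values_fill` argument. Only the missing values created by the pivoting operation are filled with the other value.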
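A minimal sketch of that distinction, using a made-up three-row data frame rather than the asker's data: one cell is an explicit NA in the long data, and another has no row at all.

```r
library(tidyr)

# Hypothetical long data: (J, b) is an explicit NA, while (P, b) has no row.
long <- data.frame(
  Scenario = c("J", "J", "P"),
  Estimate = c("a", "b", "a"),
  value    = c(1L, NA, 3L)
)

# values_fill supplies a value only for cells that have no row in the long
# data; the explicit NA for (J, b) is expected to be carried over unchanged.
wide <- pivot_wider(long,
                    names_from  = Estimate,
                    values_from = value,
                    values_fill = list(value = 0L))
print(wide)
```

If the explicit NAs should be replaced as well, a separate step such as `tidyr::replace_na()` on the wide result would be needed.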