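I'm having trouble figuring out how to convert some wide data to long format. I have three columns of string data (A1_R00_FillerNP, A1_R01_ADV, and A1_R02_1stEmbV) that I want to melt into a single column (WordCountRegion), so that for each subject and item the correct word is mapped from one of these three columns into the new WordCountRegion column.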
A simple melt call, as in the code below, gets me part of the way there:
(Note: the odd characters in df are irrelevant; please ignore them here.)
df <- structure(list(Subject = c(101L, 101L, 101L, 101L, 101L, 101L,
101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L,
101L), condition = structure(c(2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L,
3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L), .Label = c("P", "R",
"S"), class = "factor"), item = c(101L, 102L, 103L, 101L, 102L,
103L, 101L, 102L, 103L, 101L, 102L, 103L, 101L, 102L, 103L, 101L,
102L, 103L), A1_R00_FillerNP = structure(c(3L, 2L, 1L, 3L, 2L,
1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L), .Label = c("SÌÇna d_r allvarliga konsekvenser",
"SÌÇna d_r fina _ppeltr_d", "SÌÇna d_r gamla skottk_rror"
), class = "factor"), A1_R01_ADV = structure(c(1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("alltid",
"f_rresten"), class = "factor"), A1_R02_1stEmbV = structure(c(3L,
2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L,
1L), .Label = c("diskuterade", "stod", "tv_ttade"), class = "factor"),
RT = c(0L, 149L, 247L, 272L, 171L, 245L, 317L, 0L, 233L,
0L, 981L, 750L, 272L, 171L, 334L, 317L, 0L, 233L), Region = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("R00", "R01", "R02"), class = "factor"),
RegionType = structure(c(3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L,
1L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("1stEmbV",
"ADV", "FillerNP"), class = "factor"), DV = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("FIRST_FIXATION_DURATION", "GAZE_DURATION"
), class = "factor")), .Names = c("Subject", "condition",
"item", "A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV", "RT",
"Region", "RegionType", "DV"), class = "data.frame", row.names = c(NA,
-18L))
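library(reshape2)  # assumption: melt() here comes from reshape2
# Stack the three word columns; the source column name goes into WordCountRegion.
df1 <- melt(df, measure.vars = c("A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV"),
            variable.name = "WordCountRegion")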
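The problem is that this code does not split the words by region correctly. I end up with the output shown below, where the words are not broken up as specified by Region but are instead repeated across every value of Region, as can be seen in the Region, WordCountRegion, and value columns. Clearly, if I'm going to use melt(), I need some additional specification so that it splits the data correctly. I'm just not sure how to do that (or whether it can be done within melt() at all).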
Subject condition item RT Region RegionType DV WordCountRegion value
1 101 R 101 0 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
2 101 P 102 149 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
3 101 S 103 247 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
4 101 R 101 272 R01 ADV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
5 101 P 102 171 R01 ADV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
6 101 S 103 245 R01 ADV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
7 101 R 101 317 R02 1stEmbV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
8 101 P 102 0 R02 1stEmbV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
9 101 S 103 233 R02 1stEmbV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
10 101 R 101 0 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
11 101 P 102 981 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
12 101 S 103 750 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
13 101 R 101 272 R01 ADV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
14 101 P 102 171 R01 ADV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
15 101 S 103 334 R01 ADV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
16 101 R 101 317 R02 1stEmbV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
17 101 P 102 0 R02 1stEmbV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
18 101 S 103 233 R02 1stEmbV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
19 101 R 101 0 R00 FillerNP FIRST_FIXATION_DURATION A1_R01_ADV alltid
20 101 P 102 149 R00 FillerNP FIRST_FIXATION_DURATION A1_R01_ADV alltid
21 101 S 103 247 R00 FillerNP FIRST_FIXATION_DURATION A1_R01_ADV f_rresten
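To make the cause of this output concrete, here is a minimal sketch on a hypothetical three-row data frame (the names toy, id, and w_R00 to w_R02 are invented for illustration, and melt() is assumed to come from reshape2). melt() stacks every measure column for every input row, so n rows and k measure columns give n * k output rows whatever Region says.

```r
library(reshape2)

# Hypothetical toy data: one word column per region, one row per region.
toy <- data.frame(
  id     = 1:3,
  Region = c("R00", "R01", "R02"),
  w_R00  = c("a", "b", "c"),
  w_R01  = c("d", "e", "f"),
  w_R02  = c("g", "h", "i"),
  stringsAsFactors = FALSE
)

# Every row is repeated once per measure column: 3 rows x 3 columns.
long <- melt(toy, measure.vars = c("w_R00", "w_R01", "w_R02"),
             variable.name = "WordCountRegion")
n_long <- nrow(long)  # 9

# Only the rows whose source column matches their own Region are wanted.
n_wanted <- sum(as.character(long$WordCountRegion) == paste0("w_", long$Region))  # 3
```

So the desired result is a subset of the melted rows, which is why some extra matching step on Region is needed.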
Is there a way to modify the melt() call so that Region lines up with WordCountRegion, as in the example below?
Subject condition item RT Region RegionType DV WordCountRegion value
1 101 R 101 0 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
2 101 P 102 149 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
3 101 S 103 247 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
4 101 R 101 272 R01 ADV FIRST_FIXATION_DURATION A1_R01_ADV alltid
5 101 P 102 171 R01 ADV FIRST_FIXATION_DURATION A1_R01_ADV alltid
6 101 S 103 245 R01 ADV FIRST_FIXATION_DURATION A1_R01_ADV f_rresten
7 101 R 101 317 R02 1stEmbV FIRST_FIXATION_DURATION A1_R02_1stEmbV tv_ttade
8 101 P 102 0 R02 1stEmbV FIRST_FIXATION_DURATION A1_R02_1stEmbV stod
9 101 S 103 233 R02 1stEmbV FIRST_FIXATION_DURATION A1_R02_1stEmbV diskuterade
10 101 R 101 0 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
11 101 P 102 981 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
12 101 S 103 750 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
Or, if I'm using the wrong function entirely, could someone point me toward a better solution? Perhaps I need something that does an actual lookup?
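Answer 0 (score: 1)
You can create a small lookup table, merge it in, and then use it to filter your melted data frame. I believe this gives you the result you are looking for.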
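# Lookup table: which word column belongs to which Region.
region_df <- data.frame(var = c("A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV"),
                        Region = c("R00", "R01", "R02"))
# Attach the expected column name to every melted row (joined on Region).
df2 <- merge(df1, region_df, by = "Region")
# Keep only rows whose word came from the column matching their Region.
df3 <- subset(df2, as.character(var) == as.character(WordCountRegion))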
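Since the question asks about "an actual lookup", a possible alternative is to skip melt() and pick one word per row with base R matrix indexing. This is only a sketch under assumptions: the toy data frame and its word values are hypothetical, and it relies on the column names following the A1_<Region>_<RegionType> pattern seen in the question.

```r
# Hypothetical toy data with the same column layout as the question's df.
toy <- data.frame(
  item            = c(101L, 102L, 103L),
  Region          = c("R00", "R01", "R02"),
  RegionType      = c("FillerNP", "ADV", "1stEmbV"),
  A1_R00_FillerNP = c("np1", "np2", "np3"),
  A1_R01_ADV      = c("adv1", "adv2", "adv3"),
  A1_R02_1stEmbV  = c("v1", "v2", "v3"),
  stringsAsFactors = FALSE
)

word_cols <- c("A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV")

# Name of the column each row should read from, e.g. "A1_R01_ADV".
toy$WordCountRegion <- paste("A1", toy$Region, toy$RegionType, sep = "_")

# Matrix indexing: one (row, column) pair per row selects exactly one word.
words <- as.matrix(toy[word_cols])
idx   <- cbind(seq_len(nrow(toy)), match(toy$WordCountRegion, word_cols))
toy$value <- words[idx]

result <- toy[c("item", "Region", "WordCountRegion", "value")]
```

This keeps the original row count and order, so no filtering step is needed afterwards. If a constructed name is not among word_cols, match() returns NA and the corresponding value is NA, which makes mismatches easy to spot.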