I'm looking for a more efficient version of my code that replaces factor labels in a data frame.
Here is my dataset, stored as p:
p <- structure(list(Rio.Olympics.Sports.Participating.Team = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("American Gymnastics",
"American Swimmers", "Boxing", "European Gymnastics", "Running",
"Free-style swimming", "Breaststroke Swimming", "Diving", "Athletics",
"Soccer"), class = "factor"), Calendar.Quarter = structure(c(16071,
16161, 16252, 16344, 16436, 16526, 16617, 16709, 16801, 16892,
16983, 17075, 16071, 16161, 16252, 16344, 16436, 16526, 16617,
16709, 16801, 16892, 16983, 17075, 16071, 16161, 16252, 16344,
16436, 16526, 16617, 16709, 16801, 16892, 16983, 17075, 16071,
16161, 16252, 16344, 16436, 16526, 16617, 16709, 16801, 16892,
16983, 17075, 16071, 16161, 16252, 16344, 16436, 16526, 16617,
16709, 16801, 16892, 16983, 17075), class = "Date"), Randomized.Viewers = c(49,
45, 51, 55, 47, 48, 54, 57, 53, 50, 52, 58, 32, 29, 33, 40, 34,
36, 31, 39, 37, 30, 35, 41, 5, 1, 25, 46, 38, 4, 56, 27, 21,
43, 42, 44, 2, 59, 3, 10, 60, 7, 14, 24, 13, 16, 17, 28, 15,
6, 19, 23, 11, 12, 20, 22, 9, 8, 18, 26)), .Names = c("Rio.Olympics.Sports.Participating.Team",
"Calendar.Quarter", "Randomized.Viewers"), row.names = c(NA,
-60L), class = "data.frame")
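Now I want to change the factor labels. Here's what I did:
# Lookup table: each old label sits next to its replacement
Old_labels <- c("American Swimmers", "American Gymnastics",
                "European Gymnastics", "Running", "Boxing")
New_labels <- c("Jupitean Swimmers", "Saturnish Gymastics",
                "Plutoish Gymnastics", "Walking", "Fighting")
# Before R 4.0.0, data.frame() converts these character columns to factors by default
Apply_lables <- data.frame(Old_labels, New_labels)
# Redundant: data.frame() already names the first column "Old_labels"
colnames(Apply_lables)[1] <- "Old_labels"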
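Finally, this code does the trick:
p1 <- p
# For every row, find the position of its current label in Apply_lables$Old_labels,
# then take New_labels from that row of the lookup table
p1$Rio.Olympics.Sports.Participating.Team <-
  Apply_lables[match(p$Rio.Olympics.Sports.Participating.Team,
                     Apply_lables$Old_labels), "New_labels"]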
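The match() call above returns, for each row, the index of that row's old label in the lookup table, and that index then picks the new label. A minimal sketch of the same row-wise lookup written with a named character vector instead of a data frame (a swapped-in technique, not the poster's code); the small factor `teams` is hypothetical example data:

```r
# Hypothetical small factor standing in for Rio.Olympics.Sports.Participating.Team
teams <- factor(c("American Swimmers", "Boxing", "Running", "American Swimmers"))

Old_labels <- c("American Swimmers", "American Gymnastics",
                "European Gymnastics", "Running", "Boxing")
New_labels <- c("Jupitean Swimmers", "Saturnish Gymastics",
                "Plutoish Gymnastics", "Walking", "Fighting")

# Named character vector: names are the old labels, values are the new labels
lookup <- setNames(New_labels, Old_labels)

# Index by each row's label as character; unname() drops the names carried over
relabelled <- unname(lookup[as.character(teams)])

# The match()-based lookup on plain vectors (match() converts the factor to character)
via_match <- New_labels[match(teams, Old_labels)]

print(relabelled)
```

Both versions still do one lookup per row, so their cost grows with the number of rows; labels missing from Old_labels come back as NA.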
Here are the first rows of the modified data frame:
Rio.Olympics.Sports.Participating.Team Calendar.Quarter Randomized.Viewers
1 Jupitean Swimmers 2014-01-01 49
2 Jupitean Swimmers 2014-04-01 45
3 Jupitean Swimmers 2014-07-01 51
4 Jupitean Swimmers 2014-10-01 55
5 Jupitean Swimmers 2015-01-01 47
6 Jupitean Swimmers 2015-04-01 48
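Question: As an R beginner, I struggled with this for several hours. Although I managed to get what I wanted, is there a better way (i.e., fewer lines of code and faster execution) to change factor labels based on a lookup table? My original dataset has about 1M rows, and the code above takes a long time to run.
I researched this topic on SO, but I don't think it has been covered anywhere, although some posts discuss using match() with a lookup table to change rows.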
Answer (score: 0)
If you want to replace the labels listed in Old_labels, I think you can assign to a subset of levels() of the factor in question and put the new labels there:
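# match() gives the position of each old label among the factor's levels,
# so New_labels lines up with the right level (assumes every Old_label is a level)
idx <- match(Old_labels, levels(dta$Rio.Olympics.Sports.Participating.Team))
levels(dta$Rio.Olympics.Sports.Participating.Team)[idx] <- New_labels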
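Relabelling through levels() changes only the distinct label strings (ten in the dput above) rather than every row, which is why it should scale better to about 1M rows. A minimal self-contained sketch, assuming the labels to replace appear among the levels; the factor `teams` is hypothetical example data, and labels absent from the levels are simply skipped:

```r
# Hypothetical factor with an unused level ("Soccer"), like the dput above
teams <- factor(c("American Swimmers", "Boxing", "Running",
                  "American Gymnastics", "European Gymnastics", "Boxing"),
                levels = c("American Gymnastics", "American Swimmers", "Boxing",
                           "European Gymnastics", "Running", "Soccer"))

Old_labels <- c("American Swimmers", "American Gymnastics",
                "European Gymnastics", "Running", "Boxing")
New_labels <- c("Jupitean Swimmers", "Saturnish Gymastics",
                "Plutoish Gymnastics", "Walking", "Fighting")

# Position of each old label among the levels (NA when the label is absent)
idx <- match(Old_labels, levels(teams))
found <- !is.na(idx)

# Overwrite only the matched level strings; the rows' integer codes are untouched
relabelled <- teams
levels(relabelled)[idx[found]] <- New_labels[found]

print(levels(relabelled))
```

The integer codes of the rows stay the same and only the level strings change, so the work depends on the number of levels, not the number of rows. Unused levels such as "Soccer" are kept; droplevels() removes them if needed.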