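Background: I have a data frame in which one column contains repeated values. I am trying to split this data frame by picking out all rows that share a repeated value in that column, process them, and then spit out a new data frame containing all the processed rows.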
I was surprised by what happens in the following code:
# 10 rows, all with DAY == "Tuesday" and variable == "act1"
dataSet <- structure(list(DAY = structure(rep(1L, 10L), .Label = "Tuesday",
    class = "factor"),
    variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L), .Label = c("act1", "act2", "act3", "act4",
    "act5", "act12", "act19", "act116", "act22",
    "act6", "act13", "act111", "act117", "act23",
    "act7", "act14", "act112", "act118", "act24",
    "act8", "act15", "act113", "act119", "act25",
    "act9", "act16", "act114", "act20", "act26",
    "act10", "act17", "act115", "act21", "act27",
    "act11", "act18"), class = "factor"), value = c(67,
    65, 40, 79, 106, 90, 57, 59, 2, 12)), .Names = c("DAY",
    "variable", "value"), row.names = c(NA, 10L), class = "data.frame")
uniq <- unique(dataSet$variable)
for (i in 1:length(uniq)){
    rowsPerVal <- dataSet[dataSet$variable == uniq[i], ]
    print(length(rowsPerVal))
}
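I just don't understand how the final print statement can report a length of 3 when the data frame has 10 records with the same value in the variable column.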
Answer 0 (score: 3)
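plyr is also a good fit for this kind of split-apply-combine problem (split the data into pieces, operate on each piece, then put the pieces back together):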
library("plyr")
ddply(dataSet, .(variable), nrow)
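> ddply(dataSet, .(variable), nrow)
  variable V1
1     act1 10
As others have said, length() of a data.frame is the number of columns; nrow() is the number of rows.
You can replace nrow with an (anonymous) function that performs whatever processing you need.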
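For comparison, here is a minimal base-R sketch of the same split-apply-combine idea, assuming plyr is not available. It first shows the length() versus nrow() difference behind the question, then uses split() to get one data frame per variable value, an anonymous function to process each piece, and do.call(rbind, ...) to recombine them. The stand-in data frame `dat` and the doubling of `value` are hypothetical placeholders for the real data and processing.

```r
# Hypothetical stand-in for dataSet: 10 rows that all share variable == "act1"
dat <- data.frame(
  DAY = factor(rep("Tuesday", 10)),
  variable = factor(rep("act1", 10)),
  value = c(67, 65, 40, 79, 106, 90, 57, 59, 2, 12)
)

# length() of a data frame counts columns; nrow() counts rows
print(length(dat))  # [1] 3
print(nrow(dat))    # [1] 10

# Split: one data frame per distinct value of `variable`
pieces <- split(dat, dat$variable, drop = TRUE)

# Apply: an anonymous function processes each piece
# (doubling `value` is only a placeholder for the real processing)
processed <- lapply(pieces, function(rows) {
  rows$value <- rows$value * 2
  rows
})

# Combine: stack the processed pieces back into one data frame
newDF <- do.call(rbind, processed)
rownames(newDF) <- NULL  # drop the "act1.1", "act1.2", ... row names rbind creates
print(newDF)
```

In the question's loop, the same fix applies: print(nrow(rowsPerVal)) reports the 10 rows that length() was not counting.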
Answer 1 (score: 1)
duplicated returns TRUE only for the second and later occurrences of a value (here, rows 2 to 10). So you can use it to index your rows:
dataSet[duplicated(dataSet$variable),]
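Because duplicated() leaves out the first occurrence, it does not by itself select every row whose value repeats, which is what the question asks for. One common base-R idiom is to OR it with a scan from the end (fromLast = TRUE). A small sketch on a hypothetical vector `x`:

```r
x <- c("a", "b", "a", "c", "a", "b")

# TRUE from the second occurrence of each value onward
d_first <- duplicated(x)
print(d_first)   # [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE

# Scanning from the end marks every occurrence except the last one
d_last <- duplicated(x, fromLast = TRUE)

# Either flag set means the value occurs more than once
all_dups <- d_first | d_last
print(all_dups)  # [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

# The same logical mask indexes the rows of a data frame
dat <- data.frame(key = x, n = seq_along(x))
print(dat[all_dups, ])
```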
You can also assign to them:
dataSet[duplicated(dataSet$variable),]$value <- NA
> dataSet
       DAY variable value
1  Tuesday     act1    67
2  Tuesday     act1    NA
3  Tuesday     act1    NA
4  Tuesday     act1    NA
5  Tuesday     act1    NA
6  Tuesday     act1    NA
7  Tuesday     act1    NA
8  Tuesday     act1    NA
9  Tuesday     act1    NA
10 Tuesday     act1    NA
To "spit out a new data frame containing all the processed rows", you can process the subsetted data.frame however you need:
newDF <- transform( dataSet[duplicated(dataSet$variable),], DAY=sub("esd","foo",DAY) )
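A self-contained sketch of that last step, using a simplified hypothetical stand-in for dataSet. sub() accepts the factor DAY by coercing it with as.character(), so DAY becomes a character column in the result. The "esd" to "foo" replacement is just the answer's placeholder for real processing.

```r
# Hypothetical stand-in for dataSet (10 rows, all with variable == "act1")
dataSet <- data.frame(
  DAY = factor(rep("Tuesday", 10)),
  variable = factor(rep("act1", 10)),
  value = c(67, 65, 40, 79, 106, 90, 57, 59, 2, 12)
)

# Keep only the repeated rows (2 to 10), then build a processed copy.
# sub() converts the factor to character before substituting.
newDF <- transform(dataSet[duplicated(dataSet$variable), ],
                   DAY = sub("esd", "foo", DAY))

print(nrow(newDF))        # [1] 9
print(unique(newDF$DAY))  # [1] "Tufooay"
```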