Given a CSV with the following structure:
id, postCode, someThing, someOtherThing
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales
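I want to create two tables depending on whether the value in the postCode column is unique. The values in the other columns do not matter to me, but they must be copied into the new tables.
My final data should look like this, with one table for the rows whose postCode is unique: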
id, postCode, someThing, someOtherThing
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales
and another for the rows where the postCode value is duplicated:
id, postCode, someThing, someOtherThing
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep
So far I can load the data, but I am not sure what to do next:
myData <- read.csv("path/to/my.csv",
header=TRUE,
sep=",",
stringsAsFactors=FALSE
)
I am new to R, so any help is appreciated.
The data in dput format:
df <-
structure(list(id = 1:4, postCode = structure(c(1L, 1L, 2L, 3L
), .Label = c("E3 4AX", "E8 KAK", "VH3 2K2"), class = "factor"),
someThing = structure(c(1L, 2L, 4L, 3L), .Label = c(" cats",
" elephants", " humans", " mice"), class = "factor"),
someOtherThing = structure(c(1L, 3L, 2L, 4L),
.Label = c(" dogs", " rats", " sheep", " whales "
), class = "factor")), class = "data.frame",
row.names = c(NA, -4L))
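The dput output above shows that the text columns keep the blank that follows each comma in the file (for example " cats"). A minimal sketch, assuming the same four rows as in the question, of reading the data with strip.white = TRUE so that the fields are trimmed on input. The csv_text string is only an illustrative stand-in for the real file:

```r
# Illustrative stand-in for the real CSV file (same four rows as the question).
csv_text <- "id, postCode, someThing, someOtherThing
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales"

# strip.white = TRUE trims leading and trailing blanks from each field.
myData <- read.csv(text = csv_text,
                   header = TRUE,
                   strip.white = TRUE,
                   stringsAsFactors = FALSE)

str(myData)
```

Trimming matters most for postCode, because a stray blank would make two otherwise identical postcodes compare as different.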
Answer 0: (score: 3)
If df is the name of your data.frame, it can be created as:
df <- read.table(header = TRUE, sep = ",", strip.white = TRUE, text = "
id, postCode, someThing, someOtherThing
1, E3 4AX, cats, dogs
2, E3 4AX, elephants, sheep
3, E8 KAK, mice, rats
4, VH3 2K2, humans, whales
")
You can then find the unique and the duplicated rows with n(), which returns the number of observations in each group of a grouped data frame. Then:
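library(dplyr)  # provides %>%, group_by(), filter() and n()

# Rows whose postCode occurs exactly once
uniques <- df %>%
  group_by(postCode) %>%
  filter(n() == 1)

# Rows whose postCode occurs more than once
dupes <- df %>%
  group_by(postCode) %>%
  filter(n() > 1)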
I'm not sure why someone edited this answer. Maybe they hate tribbles.
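The same group-size idea can be sketched in base R, for readers who prefer not to load a package. This is an illustrative alternative rather than the answer's own method: it assumes the sample data from the question, and the name group_size is made up for the example.

```r
# Sample data from the question, typed in directly so the example is self-contained.
df <- data.frame(
  id = 1:4,
  postCode = c("E3 4AX", "E3 4AX", "E8 KAK", "VH3 2K2"),
  someThing = c("cats", "elephants", "mice", "humans"),
  someOtherThing = c("dogs", "sheep", "rats", "whales"),
  stringsAsFactors = FALSE
)

# For every row, count how many rows share its postCode.
# ave() applies FUN within each group and returns a vector aligned with df.
group_size <- ave(seq_along(df$postCode), df$postCode, FUN = length)

uniques <- df[group_size == 1, ]  # postCode occurs exactly once
dupes   <- df[group_size > 1, ]   # postCode occurs more than once
```

Unlike the grouped dplyr result, uniques and dupes here are plain data.frames that keep their original row names.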
Answer 1: (score: 0)
If you can work with a list of two data.frames, which seems better than having many related objects in .GlobalEnv, try split.
# Flag every row whose postCode occurs more than once, wherever it sits in the data.
f <- duplicated(df$postCode) | duplicated(df$postCode, fromLast = TRUE)
split(df, f)
#$`FALSE`
#  id postCode someThing someOtherThing
#3  3   E8 KAK      mice           rats
#4  4  VH3 2K2    humans         whales
#
#$`TRUE`
#  id postCode someThing someOtherThing
#1  1   E3 4AX      cats           dogs
#2  2   E3 4AX elephants          sheep
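To pull the two tables back out of the list by name, the grouping vector can carry readable labels. A minimal sketch, assuming the sample data from the question; the labels unique and duplicated and the names is_dup, group and tables are illustrative:

```r
# Sample data from the question, typed in directly so the example is self-contained.
df <- data.frame(
  id = 1:4,
  postCode = c("E3 4AX", "E3 4AX", "E8 KAK", "VH3 2K2"),
  someThing = c("cats", "elephants", "mice", "humans"),
  someOtherThing = c("dogs", "sheep", "rats", "whales"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose postCode appears more than once
# (checked from the start and from the end, so the first occurrence is flagged too).
is_dup <- duplicated(df$postCode) | duplicated(df$postCode, fromLast = TRUE)

# Turn the logical flag into labels; fixing the levels keeps both list
# elements present even when one of the groups is empty.
group <- factor(is_dup, levels = c(FALSE, TRUE),
                labels = c("unique", "duplicated"))

tables <- split(df, group)

tables$unique      # rows with id 3 and 4
tables$duplicated  # rows with id 1 and 2
```

Keeping both tables in one list means they can be processed together, for example with lapply(tables, nrow).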