Create tables based on unique and non-unique values in a single column

Time: 2018-12-21 17:25:15

Tags: r

Given a CSV with the following structure:

id, postCode, someThing, someOtherThing
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales

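I want to create two tables depending on whether the value in the postCode column is unique. The values in the other columns don't matter to me, but they must be copied into the new tables.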

My final data should look like this, with one table holding the rows whose postCode is unique:

id, postCode, someThing, someOtherThing
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales

and another holding the rows whose postCode value is repeated:

id, postCode, someThing, someOtherThing    
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep

So far I can load the data, but I'm not sure about the next step:

myData <- read.csv("path/to/my.csv",
  header=TRUE,
  sep=",",
  stringsAsFactors=FALSE
)

I'm new to R, so any help is appreciated.

Data in dput format:

df <-
structure(list(id = 1:4, postCode = structure(c(1L, 1L, 2L, 3L
), .Label = c("E3 4AX", "E8 KAK", "VH3 2K2"), class = "factor"), 
someThing = structure(c(1L, 2L, 4L, 3L), .Label = c(" cats", 
" elephants", " humans", " mice"), class = "factor"), 
someOtherThing = structure(c(1L, 3L, 2L, 4L), 
.Label = c(" dogs", " rats", " sheep", " whales               "
), class = "factor")), class = "data.frame", 
row.names = c(NA, -4L))

2 answers:

Answer 0 (score: 3)

Assuming df is the name of your data.frame, which can be created with:

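# sep = "," is needed because the fields are comma-separated and the
# postcodes contain spaces; strip.white drops the padding around each field.
df <- read.table(header = TRUE, sep = ",", strip.white = TRUE, text = "
id, postCode, someThing, someOtherThing
1, E3 4AX, cats, dogs
2, E3 4AX, elephants, sheep
3, E8 KAK, mice, rats
4, VH3 2K2, humans, whales
")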

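The unique and duplicated rows can then be found with dplyr's n() function, which returns the number of observations in the current group of a grouped data frame: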

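library(dplyr)

# Rows whose postCode occurs exactly once
uniques <- df %>%
  group_by(postCode) %>%
  filter(n() == 1)

# Rows whose postCode occurs more than once
dupes <- df %>%
  group_by(postCode) %>%
  filter(n() > 1)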
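
For readers without dplyr, the same group-size idea can be sketched in base R. This is only a minimal sketch, not part of the original answer: the data frame is rebuilt by hand from the question's CSV, and `group_size`, `uniques_base` and `dupes_base` are hypothetical names chosen for illustration.

```r
# Sample data rebuilt from the CSV in the question.
df <- data.frame(
  id = 1:4,
  postCode = c("E3 4AX", "E3 4AX", "E8 KAK", "VH3 2K2"),
  someThing = c("cats", "elephants", "mice", "humans"),
  someOtherThing = c("dogs", "sheep", "rats", "whales"),
  stringsAsFactors = FALSE
)

# For every row, count how many rows share its postCode.
# This plays the role of n() inside group_by(postCode).
group_size <- ave(seq_along(df$postCode), df$postCode, FUN = length)

# Keep all columns; only the row selection differs.
uniques_base <- df[group_size == 1, ]
dupes_base   <- df[group_size > 1, ]
```

Unlike the dplyr version, the results here are plain, ungrouped data.frames.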

I'm not sure why someone edited this answer. Maybe they hate tribbles.

Answer 1 (score: 0)

If you can work with a list of two data.frames, which seems better than keeping many related objects in .GlobalEnv, try split:

f <- rev(cumsum(rev(duplicated(df$postCode))))
split(df, f)
#$`0`
#  id postCode someThing         someOtherThing
#3  3   E8 KAK      mice                   rats
#4  4  VH3 2K2    humans  whales               
#
#$`1`
#  id postCode  someThing someOtherThing
#1  1   E3 4AX       cats           dogs
#2  2   E3 4AX  elephants          sheep
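
The cumsum-based grouping above gives the expected split for the sample data, where the repeated postcodes are adjacent and come first. As far as I can tell it does not carry over to arbitrary row order, so here is a hedged sketch of a more order-independent variant. It flags a row as duplicated if its postCode appears anywhere else, scanning from both ends with `duplicated()`. The five-row data frame and the names `is_dup` and `tables` are made up for illustration.

```r
# Hypothetical data: the repeated postcode is scattered, not adjacent.
df <- data.frame(
  id = 1:5,
  postCode = c("E3 4AX", "E8 KAK", "E3 4AX", "VH3 2K2", "E3 4AX"),
  stringsAsFactors = FALSE
)

# duplicated() alone misses the first occurrence of each repeated value;
# combining it with fromLast = TRUE marks every occurrence.
is_dup <- duplicated(df$postCode) | duplicated(df$postCode, fromLast = TRUE)

# Fixed factor levels so both list elements exist even if one is empty.
groups <- factor(ifelse(is_dup, "duplicated", "unique"),
                 levels = c("unique", "duplicated"))
tables <- split(df, groups)
```

The two tables are then available as `tables$unique` and `tables$duplicated`, which keeps them together in one list rather than as separate objects.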