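I have a large data frame with many rows and columns. One column holds character values, some of which occur only once and others several times. I want to split the whole data frame into two: one containing every row whose value in this column is repeated, and another containing every row whose value occurs only once. For example: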
One = c(1,2,3,4,5,6,7,8,9,10)
Two = c(4,5,3,6,2,7,1,8,1,9)
Three = c("a", "b", "c", "d","d","e","f","e","g","c")
df <- data.frame(One, Two, Three)
> df
One Two Three
1 1 4 a
2 2 5 b
3 3 3 c
4 4 6 d
5 5 2 d
6 6 7 e
7 7 1 f
8 8 8 e
9 9 1 g
10 10 9 c
I would like to end up with two data frames like these:
> dfSingle
One Two Three
1 1 4 a
2 2 5 b
7 7 1 f
9 9 1 g
> dfMultiple
One Two Three
3 3 3 c
4 4 6 d
5 5 2 d
6 6 7 e
8 8 8 e
10 10 9 c
I tried using the duplicated() function:
dfSingle = subset(df, !duplicated(df$Three))
dfMultiple = subset(df, duplicated(df$Three))
but this does not work, because the first occurrence of each of "c", "d", and "e" goes to dfSingle. I also tried a loop:
MulipleValues = unique(df$Three[c(which(duplicated(df$Three)))])
dfSingle = data.frame()
x = 1
dfMultiple = data.frame()
y = 1
for (i in 1:length(df$One)) {
if(df$Three[i] %in% MulipleValues){
dfMultiple[x,] = df[i,]
x = x+1
} else {
dfSingle[y,] = df[i,]
y = y+1
}
}
This seems to do the right thing in that the data frames now have the correct number of rows, but they somehow have 0 columns.
> dfSingle
data frame with 0 columns and 4 rows
> dfMultiple
data frame with 0 columns and 6 rows
What am I doing wrong? Or is there another way to do this?
Thanks for your help!
Answer 0 (score: 4)
In base R, we can use duplicated together with split, which returns a list of two data frames.
df1 <- split(df, duplicated(df$Three) | duplicated(df$Three, fromLast = TRUE))
df1
#$`FALSE`
# One Two Three
#1 1 4 a
#2 2 5 b
#7 7 1 f
#9 9 1 g
#$`TRUE`
# One Two Three
#3 3 3 c
#4 4 6 d
#5 5 2 d
#6 6 7 e
#8 8 8 e
#10 10 9 c
where df1[[1]] can be treated as dfSingle and df1[[2]] as dfMultiple.
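To see why both calls to duplicated are needed, here is a minimal sketch that looks at the two flag vectors separately before combining them. The names x, flagged, and parts are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  One   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Two   = c(4, 5, 3, 6, 2, 7, 1, 8, 1, 9),
  Three = c("a", "b", "c", "d", "d", "e", "f", "e", "g", "c")
)

x <- df$Three

# duplicated(x) flags only the second and later occurrences of a value,
# so the first "c", "d", and "e" are not flagged
duplicated(x)

# fromLast = TRUE scans from the end, so the last occurrence is the one
# that is not flagged
duplicated(x, fromLast = TRUE)

# OR-ing the two vectors flags every row whose value occurs more than once
flagged <- duplicated(x) | duplicated(x, fromLast = TRUE)

parts <- split(df, flagged)

# Index by name rather than by position so the code does not depend on
# the order of the list elements
dfSingle   <- parts[["FALSE"]]
dfMultiple <- parts[["TRUE"]]
```

Note that split only creates list elements for the groups that actually occur, so if the column has no repeated values, parts[["TRUE"]] is NULL.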
Answer 1 (score: 1)
Here is a dplyr version, just for fun,
library(dplyr)
df %>%
group_by(Three) %>%
mutate(new = n() > 1) %>%
split(.$new)
which gives,
$`FALSE`
# A tibble: 4 x 4
# Groups: Three [4]
One Two Three new
<dbl> <dbl> <fct> <lgl>
1 1 4 a FALSE
2 2 5 b FALSE
3 7 1 f FALSE
4 9 1 g FALSE
$`TRUE`
# A tibble: 6 x 4
# Groups: Three [3]
One Two Three new
<dbl> <dbl> <fct> <lgl>
1 3 3 c TRUE
2 4 6 d TRUE
3 5 2 d TRUE
4 6 7 e TRUE
5 8 8 e TRUE
6 10 9 c TRUE
Answer 2 (score: 0)
You can use base R:
One = c(1,2,3,4,5,6,7,8,9,10)
Two = c(4,5,3,6,2,7,1,8,1,9)
Three = c("a", "b", "c", "d","d","e","f","e","g","c")
df <- data.frame(One, Two, Three)
str(df)
df$Three <- as.character(df$Three)
df$count <- as.numeric(ave(df$Three,df$Three,FUN = length))
dfSingle = subset(df,df$count == 1)
dfMultiple = subset(df,df$count > 1)
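If the extra count column is not wanted in the result, the same counting idea can be sketched with table() and plain logical indexing. This is a variation on the answer above, not the answerer's code; counts and n_per_row are illustrative names, and the sketch assumes the column contains no NA values.

```r
# Rebuild the example data from the question
df <- data.frame(
  One   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Two   = c(4, 5, 3, 6, 2, 7, 1, 8, 1, 9),
  Three = c("a", "b", "c", "d", "d", "e", "f", "e", "g", "c")
)

# Count how often each distinct value occurs
counts <- table(df$Three)

# Look up the count for every row by name; as.character() keeps this
# working whether Three is a character vector or a factor
n_per_row <- as.vector(counts[as.character(df$Three)])

# Subset with logical vectors, so no helper column is added to df
dfSingle   <- df[n_per_row == 1, ]
dfMultiple <- df[n_per_row > 1, ]
```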
Answer 3 (score: 0)
A way using dplyr:
library(dplyr)
df %>%
group_split(Duplicated = (add_count(., Three) %>% pull(n)) > 1)
Output:
[[1]]
# A tibble: 4 x 4
One Two Three Duplicated
<dbl> <dbl> <fct> <lgl>
1 1 4 a FALSE
2 2 5 b FALSE
3 7 1 f FALSE
4 9 1 g FALSE
[[2]]
# A tibble: 6 x 4
One Two Three Duplicated
<dbl> <dbl> <fct> <lgl>
1 3 3 c TRUE
2 4 6 d TRUE
3 5 2 d TRUE
4 6 7 e TRUE
5 8 8 e TRUE
6 10 9 c TRUE
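Both dplyr answers leave a helper column (new or Duplicated) in the result. If two separately named data frames without that column are preferred, a grouped filter is one possible alternative. This is a sketch rather than either answerer's method, and it assumes dplyr is installed.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  One   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Two   = c(4, 5, 3, 6, 2, 7, 1, 8, 1, 9),
  Three = c("a", "b", "c", "d", "d", "e", "f", "e", "g", "c")
)

# Within each group of equal Three values, n() is the group size;
# keep the rows whose group has exactly one member
dfSingle <- df %>%
  group_by(Three) %>%
  filter(n() == 1) %>%
  ungroup()

# Keep the rows whose group has more than one member
dfMultiple <- df %>%
  group_by(Three) %>%
  filter(n() > 1) %>%
  ungroup()
```

The results are tibbles, so the original row names (1, 2, 7, 9 and so on) are not kept.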