I'm working with a merged df and used summarise(n = count(var)) to find duplicates. That gave me an object, which I called x:
n.x n.freq
1 001121011522 1
2 001121711014 1
3 001121711015 1
4 001121711722 1
5 001121711723 1
6 001131811722 1
I know n.freq ranges from 1 to 6, so I only want to keep the rows where it is equal to or greater than 2:
y <- x %>% filter(n$freq >= 2)
But I got:
Error: Column `n` must be a 1d atomic vector or a list
Call `rlang::last_error()` to see a backtrace
So I checked its structure and got this:
> str(x)
'data.frame': 485843 obs. of 1 variable:
$ n:'data.frame': 485843 obs. of 2 variables:
..$ x : Factor w/ 485843 levels "001121011522",..: 1 2 3 4 5 6 7 8 9 10 ...
..$ freq: int 1 1 1 1 1 1 1 1 1 1 ...
As far as I can tell, there is a df inside another df. How can I extract the second (inner) df so I can work with it?
[UPDATE]: Using dput(head(x, 20)) I got:
"ABC253478", "ABC983659", "ABC654911", "ABC882310",
..... # there were at least 100 rows of results
"ABC665892", "ABC441276", "ABC906138", "ABC679967"
), class = "factor"), freq = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA,
20L), class = "data.frame")), row.names = c(NA, 20L), class = "data.frame")
Answer
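Something has clearly made this data a bit messy... but let's see whether I can reproduce enough of it. (Up front: this isn't perfect, but we don't have enough information to do better.)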
Some sample data, written by hand to mimic what I think your missing rows look like:
library(dplyr)  # for %>% and filter()
x <- structure(
list(n = structure(
list(x = structure(c(3L, 7L, 6L, 2L, 10L, 9L, 1L, 8L, 4L, 5L),
.Label = c("112916", "116806", "118489", "146802", "154999",
"157333", "170238", "183345", "194348", "194384"),
class = "factor"),
freq = c(2L, 1L, 3L, 1L, 2L, 3L, 3L, 1L, 2L, 1L)),
class = "data.frame", row.names = c(NA, -10L) ) ),
class = "data.frame", row.names = c(NA, -10L)
)
str(x)
# 'data.frame': 10 obs. of 1 variable:
# $ n:'data.frame': 10 obs. of 2 variables:
# ..$ x : Factor w/ 10 levels "112916","116806",..: 3 7 6 2 10 9 1 8 4 5
# ..$ freq: int 2 1 3 1 2 3 3 1 2 1
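To confirm the suspicion that one column is itself a data frame without reading str() output, you can test every column with is.data.frame(). A minimal sketch, using a small stand-in for x built from three of the sample values above (the reduced size is only for brevity):

```r
# Small stand-in for x, built from three of the sample values above
inner <- data.frame(x = factor(c("112916", "116806", "118489")),
                    freq = c(2L, 1L, 3L))
x <- structure(list(n = inner), class = "data.frame",
               row.names = c(NA, -3L))

# TRUE for each column of x that is itself a data frame
df_cols <- vapply(x, is.data.frame, logical(1))
print(df_cols)

# Names of the nested (data-frame) columns
nested <- names(df_cols)[df_cols]
print(nested)
```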
This does not reproduce your error:
str( x %>% filter(n$freq >= 2) )
# 'data.frame': 6 obs. of 1 variable:
# $ n:'data.frame': 6 obs. of 2 variables:
# ..$ x : Factor w/ 10 levels "112916","116806",..: 3 6 10 9 1 4
# ..$ freq: int 2 3 2 3 3 2
Perhaps this will work for you?
str( subset(x, n$freq >= 2) )
# 'data.frame': 6 obs. of 1 variable:
# $ n:'data.frame': 6 obs. of 2 variables:
# ..$ x : Factor w/ 10 levels "112916","116806",..: 3 6 10 9 1 4
# ..$ freq: int 2 3 2 3 3 2
This may also work for you, though unfortunately it has the (unintended) side effect of flattening the frame:
str( x[ x$n$freq >= 2, ] )
# 'data.frame': 6 obs. of 2 variables:
# $ x : Factor w/ 10 levels "112916","116806",..: 3 6 10 9 1 4
# $ freq: int 2 3 2 3 3 2
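The flattening is most likely R's usual drop behaviour: indexing rows of a one-column data frame with x[i, ] simplifies the result to that single column, which here happens to be the inner data frame. A minimal sketch, again with a small stand-in for x, showing that drop = FALSE keeps the nesting (much like subset() above):

```r
# Small stand-in for x, built from three of the sample values above
inner <- data.frame(x = factor(c("112916", "116806", "118489")),
                    freq = c(2L, 1L, 3L))
x <- structure(list(n = inner), class = "data.frame",
               row.names = c(NA, -3L))

# x[i, ] on a one-column data frame defaults to drop = TRUE,
# so the result collapses to the (subsetted) inner data frame
flat <- x[x$n$freq >= 2, ]

# drop = FALSE keeps x's shape, so n stays a nested data frame
kept <- x[x$n$freq >= 2, , drop = FALSE]

str(flat)
str(kept)
```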
It looks like this nested frame is easy to unnest:
str( x[[1]] )
# 'data.frame': 10 obs. of 2 variables:
# $ x : Factor w/ 10 levels "112916","116806",..: 3 7 6 2 10 9 1 8 4 5
# $ freq: int 2 1 3 1 2 3 3 1 2 1
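Since the inner frame is stored in a column named n, it can also be pulled out by name, which keeps working even if other columns are later added in front of it. A short sketch with the same kind of small stand-in for x:

```r
# Small stand-in for x, built from three of the sample values above
inner <- data.frame(x = factor(c("112916", "116806", "118489")),
                    freq = c(2L, 1L, 3L))
x <- structure(list(n = inner), class = "data.frame",
               row.names = c(NA, -3L))

# Pull the inner data frame out by column name instead of position
xflat <- x$n  # equivalent to x[["n"]]

# Same object as x[[1]], because n is the first (and only) column
stopifnot(identical(xflat, x[["n"]]), identical(xflat, x[[1]]))
str(xflat)
```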
So let's try replacing the nested frame with a "normal" one:
xflat <- x[[1]]
str( filter(xflat, freq >= 2) )
# 'data.frame': 6 obs. of 2 variables:
# $ x : Factor w/ 10 levels "112916","116806",..: 3 6 10 9 1 4
# $ freq: int 2 3 2 3 3 2
str( subset(xflat, freq >= 2) )
# 'data.frame': 6 obs. of 2 variables:
# $ x : Factor w/ 10 levels "112916","116806",..: 3 6 10 9 1 4
# $ freq: int 2 3 2 3 3 2
str( xflat[ xflat$freq >= 2, ] )
# 'data.frame': 6 obs. of 2 variables:
# $ x : Factor w/ 10 levels "112916","116806",..: 3 6 10 9 1 4
# $ freq: int 2 3 2 3 3 2
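If the end goal is only the values that repeat, it may be simpler to avoid the nested frame altogether. The x/freq columns suggest that the count() inside summarise() was plyr::count() (an assumption, since the question does not say which package was loaded). A sketch using dplyr's own count() instead, on a hypothetical stand-in for the merged df whose ID column is called var:

```r
library(dplyr)

# Hypothetical stand-in for the merged data frame; `var` is the ID column
merged <- data.frame(var = c("ABC253478", "ABC983659", "ABC253478",
                             "ABC654911", "ABC253478", "ABC983659"))

dups <- merged %>%
  dplyr::count(var) %>%  # one row per distinct var, with its count in `n`
  filter(n >= 2)         # keep only values that occur at least twice

dups
```

Calling dplyr::count() with the package prefix sidesteps any masking if plyr is also attached, and the result is an ordinary flat data frame, so filter() works on n directly.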