Filtering a data frame in R produces empty rows

Time: 2017-12-06 18:19:34

Tags: r csv dataframe subset

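I have a CSV file imported into R, and I want to keep only the rows that contain a certain letter in one of the columns. I have tried both subset and dplyr; both return the column names but zero rows. I know the column contains the letter I am looking for, so I don't understand why the result is empty. This is what I get when I call head on the dataset: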

head(dbbt)
   X.Focal_DB. X.Effect_size. X.Variance.            X.Study. X.BT.
1         165        -0.1931   0.0132000      'Agrawal_1998'   'B'
2          21        -1.4414   0.1938000      'Agrawal_1999'   'B'
3          19        -3.1642   0.2402559      'Agrawal_1999'   'B'
4          19        -1.0272   0.0731000 'Agrawal_1999-2000'   'B'

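(The column names were imported with X. and . wrapped around them, and I can't figure out why; they don't contain any forbidden characters.)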

When I try:

 dbbtjustb <- subset(dbbt, X.BT. == "B")

I get:

head(dbbtjustb)
[1] X.Focal_DB.    X.Effect_size. X.Variance.    X.Study.      
[5] X.BT.         
<0 rows> (or 0-length row.names)

When I try:

dbbt %>%
    select(X.F_DietBreadth., X.Effect_size., X.variance., X.Bottom_up_top_down.) %>%
    filter(X.Bottom_up_top_down. == "B")

I get the same thing. Please help!

Edit: structure (this is not my original dataset, because that one is huge)

structure(list(X.Focal_DB. = c(31L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 8L, 1L, 1L, 1L, 1L, 2L, 6L, 126L, 22L, 126L, 27L), X.Effect_size. = c(-0.0951, 
0.4797, -0.1705, 0.713, -0.2661, -0.6614, -1.5941, -2.1892, -0.2133, 
-0.2183, -0.0275, -0.268, -5.0499, -3.2934, -0.9469, 0.6316, 
2.236, 0.1724, 1.6541, -1.5496), X.Variance. = c(0.1006223807, 
0.0468390134, 0.0124, 0.014674063, 0.1385, 0.15, 0.3866, 0.4706, 
0.1025, 0.3688, 0.1354, 0.1444, 0.1641758772, 0.0849100448, 0.0783, 
0.040866755, 0.1814043974, 0.0535, 0.1503, 0.0999), X.Study. = structure(c(1L, 
2L, 3L, 4L, 6L, 6L, 5L, 5L, 7L, 8L, 9L, 9L, 10L, 10L, 11L, 12L, 
13L, 14L, 15L, 16L), .Label = c("'Bergeson & Messina_1997'", 
"'Bergeson & Messina_1997- 1998'", "'Cronin & Abrahamson_1999'", 
"'Dechert & Ulber_2004'", "'Denno_et_al. 2000'", "'Denno & Roderick_1992'", 
"'Dorn_et_al. 2003'", "'Evans & England_1996'", "'Ferrenberg & Denno_2003'", 
"'Finch & Jones_1989'", "'Floate & Whitham_1994'", "'Formusoh_et_al. 1992'", 
"'Forrest_1971'", "'Fritz_1983'", "'Gange & Brown_1989'", "'Gianoli_2000'"
), class = "factor"), X.BT. = structure(c(2L, 3L, 2L, 2L, 2L, 
2L, 4L, 2L, 3L, 4L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), .Label = c("'ants'", 
"'B'", "'NA'", "'T'"), class = "factor")), .Names = c("X.Focal_DB.", 
"X.Effect_size.", "X.Variance.", "X.Study.", "X.BT."), class = "data.frame", row.names = c(NA, 
-20L))

1 answer:

Answer 0 (score: 2)

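The actual entries are 'B', with the single quotes stored as part of the value, which means you need to filter on "'B'" rather than "B".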

> unique(df$X.BT.)
[1] 'B'    'NA'   'T'    'ants'
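
The same fix applies to the base R subset() call from the question. Here is a minimal sketch, assuming a small made-up data frame (df_demo and its values are illustrative, not the asker's data):

```r
# Hypothetical miniature of the data; the values keep their literal single quotes.
df_demo <- data.frame(
  X.Effect_size. = c(-0.0951, 0.4797, -1.5941),
  X.BT. = c("'B'", "'NA'", "'T'"),
  stringsAsFactors = TRUE
)

# Comparing against "B" matches nothing, because no entry is exactly B.
no_match <- subset(df_demo, X.BT. == "B")
nrow(no_match)

# Including the quotes in the comparison string selects the intended row.
just_b <- subset(df_demo, X.BT. == "'B'")
nrow(just_b)
```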

Using dplyr

> filter(df, X.BT. == "'B'")
   X.Focal_DB. X.Effect_size. X.Variance.                   X.Study. X.BT.
1           31        -0.0951  0.10062238  'Bergeson & Messina_1997'   'B'
2            1        -0.1705  0.01240000 'Cronin & Abrahamson_1999'   'B'
3            1         0.7130  0.01467406     'Dechert & Ulber_2004'   'B'
4            1        -0.2661  0.13850000    'Denno & Roderick_1992'   'B'
5            1        -0.6614  0.15000000    'Denno & Roderick_1992'   'B'
6            1        -2.1892  0.47060000        'Denno_et_al. 2000'   'B'
7            1        -0.0275  0.13540000  'Ferrenberg & Denno_2003'   'B'
8            1        -0.2680  0.14440000  'Ferrenberg & Denno_2003'   'B'
9            1        -5.0499  0.16417588       'Finch & Jones_1989'   'B'
10           1        -3.2934  0.08491004       'Finch & Jones_1989'   'B'
11           6         0.6316  0.04086675     'Formusoh_et_al. 1992'   'B'
12         126         2.2360  0.18140440             'Forrest_1971'   'B'
13         126         1.6541  0.15030000       'Gange & Brown_1989'   'B'
14          27        -1.5496  0.09990000             'Gianoli_2000'   'B'
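
If you would rather not carry the quotes around, you can remove them either at import time or afterwards. The sketch below assumes the CSV file wraps its fields in single quotes, which would likely also explain the X. and . around the column names: read.csv only treats double quotes as quoting characters by default, so the single quotes stay in the names and are then converted to syntactically valid names. The csv_text string is a hypothetical stand-in for the real file.

```r
# Hypothetical CSV content with single-quoted fields (not the asker's file).
csv_text <- "'Effect_size','BT'\n-0.0951,'B'\n0.4797,'T'\n-1.5941,'ants'"

# Default import: ' is not a quote character, so it stays in names and values.
raw <- read.csv(text = csv_text)
names(raw)

# Option 1: tell read.csv that ' is the quote character.
clean <- read.csv(text = csv_text, quote = "'")
subset(clean, BT == "B")

# Option 2: strip the quotes from an already imported column.
raw$X.BT. <- gsub("'", "", raw$X.BT., fixed = TRUE)
subset(raw, X.BT. == "B")
```

With a real file, the same quote argument would go in the read.csv call that reads it, in place of the text argument.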