Question

I have the following data frame named planets.df:

type               | planets | diameter | rotation | rings
----------------------------------------------------------
Terrestrial planet | Mercury |    0.382 |    58.64 | FALSE
Terrestrial planet | Venus   |    0.949 |  -243.02 | FALSE
Terrestrial planet | Earth   |    1.000 |     1.00 | FALSE
Terrestrial planet | Mars    |    0.532 |     1.03 | FALSE
Gas giant          | Jupiter |   11.209 |     0.41 | TRUE
Gas giant          | Saturn  |    9.449 |     0.43 | TRUE
Gas giant          | Uranus  |    4.007 |    -0.72 | TRUE
Gas giant          | Neptune |    3.883 |     0.67 | TRUE

I want to get all the planets with rings, i.e. rings = TRUE, and the code for that is:

rings.vector <- planets.df$rings
planets.with.rings.df <- planets.df[rings.vector, ]

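Can someone tell me why this works? I did not come up with the code myself, but I would like to understand how it works. Does the [rings.vector, ] part mean rings = TRUE?

Thanks!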

Answer 1

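rings.vector is a vector of TRUE/FALSE indicators, one per row, taken from the rings column. If you want the rows where rings is TRUE, [rings.vector, ] selects exactly those rows, together with all columns.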
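To see this in action, here is a minimal sketch. It rebuilds a trimmed-down planets.df from the table in the question (only three of the five columns, which is my own simplification) and then applies the same two lines of code:

```r
# Trimmed-down copy of planets.df from the question (assumption: 3 columns only)
planets.df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
  rings    = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Logical vector with one TRUE/FALSE entry per row
rings.vector <- planets.df$rings

# Keep the rows whose entry is TRUE; the empty slot after the comma keeps all columns
planets.with.rings.df <- planets.df[rings.vector, ]

print(rings.vector)
print(planets.with.rings.df)
```

The printed result should contain only the four gas giants, because those are the positions where rings.vector is TRUE.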

Answer 2

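It works because in a statement of the form df[<condition>, ], the condition part is basically a vector of T/F values. Rows whose position corresponds to TRUE are kept, and rows whose position corresponds to FALSE are dropped.

rings.vector is already a T/F vector. You could write rings.vector == TRUE to supply the same condition.

In your case it probably does not matter, but be careful if there are NAs in your condition vector or in the column you are filtering on.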
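A small sketch of the NA caveat, using a made-up three-row data frame (toy.df and its columns are hypothetical, not from the question). An NA in the condition does not drop the row; it produces a row filled with NA instead. Wrapping the condition in which(), or testing is.na() explicitly, avoids that:

```r
# Hypothetical toy data: the middle row has a missing flag
toy.df <- data.frame(
  name = c("a", "b", "c"),
  flag = c(TRUE, NA, FALSE),
  stringsAsFactors = FALSE
)

# NA in a logical index yields an all-NA row rather than dropping the row
with.na <- toy.df[toy.df$flag, ]

# which() returns only the positions that are TRUE, so NA is ignored
no.na <- toy.df[which(toy.df$flag), ]

# Equivalent explicit form: require "not NA" and TRUE
also.no.na <- toy.df[!is.na(toy.df$flag) & toy.df$flag, ]

print(with.na)
print(no.na)
print(also.no.na)
```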

Answer 3

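If you have a data frame, you can refer to specific rows and columns in 2 different ways.

You can use df[row_numbers, column_numbers], or
you can use boolean values (TRUE / FALSE) to indicate which rows/columns you want. With rings.vector, df[rings.vector, ] finds the row numbers that match the positions of all the TRUE values in rings.vector and pulls out the corresponding rows.

In the example above nothing is specified for the columns, but you still need the comma inside the brackets to indicate that whatever comes before the comma refers to rows. Most of the time you will use booleans for the rows and specific numbers for the columns.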
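The two ways of indexing can be compared side by side. This is only a sketch on a trimmed-down copy of planets.df (three columns, my own simplification); the variable names by.number, by.logical and two.columns are made up for illustration:

```r
# Trimmed-down copy of planets.df from the question (assumption: 3 columns only)
planets.df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
  rings    = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Way 1: explicit row numbers before the comma
by.number <- planets.df[5:8, ]

# Way 2: a logical vector before the comma
by.logical <- planets.df[planets.df$rings, ]

# which() shows the row numbers that the logical vector stands for
print(which(planets.df$rings))

# Booleans for the rows, column numbers after the comma
two.columns <- planets.df[planets.df$rings, c(1, 2)]

print(by.number)
print(by.logical)
print(two.columns)
```

Both by.number and by.logical should contain the same four planets; two.columns keeps the same rows but only the first two columns.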

Answer 4

Here is a small reproducible example. I have also added some examples using data.table. If the code is wrong, please correct it.

set.seed(1)  # fix the random numbers so the example is reproducible
data <- data.frame(id = 1:100, x = rnorm(100, 100, 50))
data$flag <- data$x > 100  # the comparison already returns TRUE/FALSE
head(data)

# Rows where flag is FALSE; since FALSE == 0, comparing against 0 also works
data[data$flag == FALSE, ]
data[data$flag == 0, ]
str(data$flag)

# As it's of class logical:
class(data$flag)

# Using data.table
library("data.table")
DT <- data.table(data)

# Key the table on flag, then look rows up by key value
setkey(DT, flag)
DT[J(FALSE)]
DT[J(TRUE)]

# Aggregate (group by)
DT[, quantile(x), by = flag]

DT[, list(mean = mean(x),
          sum = sum(x),
          median = median(x)),
   by = flag]

Answer 5

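Another approach is to use subset(), which is fairly intuitive: it extracts only those rows of the data frame for which the condition (the second argument) is true.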

planets.with.rings.df <- subset(planets.df, rings == TRUE)

or simply

planets.with.rings.df <- subset(planets.df, rings)

The "== TRUE" in the first solution is redundant, because rings is already a boolean vector!
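As a quick check that both forms give the same rows, here is a sketch on a trimmed-down copy of planets.df (three columns, my own simplification). It also shows the optional select argument of subset(), which was not part of the question, and the fact that subset() treats a missing condition as FALSE (toy.df is a made-up example):

```r
# Trimmed-down copy of planets.df from the question (assumption: 3 columns only)
planets.df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
  rings    = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Both forms select the same rows
explicit <- subset(planets.df, rings == TRUE)
short    <- subset(planets.df, rings)

# select = picks columns at the same time
names.only <- subset(planets.df, rings, select = c(planets, diameter))

# Hypothetical toy data with a missing value: subset() drops the NA row
toy.df <- data.frame(name = c("a", "b", "c"),
                     flag = c(TRUE, NA, FALSE),
                     stringsAsFactors = FALSE)
toy.true <- subset(toy.df, flag)

print(explicit)
print(short)
print(names.only)
print(toy.true)
```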

Data frame with a condition
