Question

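I am trying to take a subset of a data.frame, but I don't understand the logic behind the result. I used to work with SQL, and I thought I could isolate a single value of the matrix to make it easier to work with.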

# create an example dataset with arbitrary values, then combine the columns
Country <- c('Argentina','Brazil','Chile')
Quantity <- c(1, seq(5)) 
M <- cbind(Country,Quantity)
M <- as.data.frame(M)

# the result
  Country       Quantity
1 Argentina     1
2 Brazil        1
3 Chile         2
4 Argentina     3
5 Brazil        4
6 Chile         5

# now I tried to isolate the rows for Brazil
test <- M[M$Country=="Brazil",]   

# and the result is still fine
   Country    Quantity    
2  Brazil     1    
5  Brazil     4

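When I use the command "table", which for me is the closest thing to count(*), the counts themselves are correct, but it brings back all the countries. I don't understand this result, because I had already filtered for Brazil only.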

    table(test)

               Quantity
    Country    1 2 3 4 5
    Argentina  0 0 0 0 0
    Brazil     1 0 0 1 0
    Chile      0 0 0 0 0

Thanks,

Philip

Answer 1

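Your columns are factors. A factor keeps all of its levels even after subsetting, which is why table still lists every country. You can use stringsAsFactors = FALSE to avoid the whole thing up front: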

M <- data.frame(Country, Quantity, stringsAsFactors=FALSE)
test <- M[M$Country=="Brazil",]   
table(test)
#         Quantity
# Country  1 4
#   Brazil 1 1
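
As a side note, here is a minimal sketch of why building the data with data.frame() behaves differently from the cbind() route in the question. The names via_cbind and via_df are made up for illustration. cbind() on a character vector and a numeric vector produces a character matrix, so Quantity is no longer numeric by the time it reaches as.data.frame(); whether the columns then become factors or stay character depends on the R version.

```r
Country <- c('Argentina', 'Brazil', 'Chile')
Quantity <- c(1, seq(5))

# Route from the question: cbind() first builds a character matrix
via_cbind <- as.data.frame(cbind(Country, Quantity))

# Route from this answer: data.frame() keeps each column's own type
via_df <- data.frame(Country, Quantity, stringsAsFactors = FALSE)

is.numeric(via_cbind$Quantity)  # FALSE, the numbers were coerced to text
is.numeric(via_df$Quantity)     # TRUE

# str() shows the column types of each data.frame at a glance
str(via_cbind)
str(via_df)
```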

If you do need factors, you can also use droplevels to remove the unused levels:

# M here is the original data.frame from the question, whose columns are factors
test <- M[M$Country=="Brazil",]   
test2 <- droplevels(test)
table(test2)
#         Quantity
# Country  1 4
#   Brazil 1 1
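
To see the retained levels directly, a small sketch along these lines may help. It assumes the factors are created explicitly with stringsAsFactors = TRUE, because since R 4.0.0 the default is FALSE and the question's behaviour would otherwise not reproduce.

```r
Country <- c('Argentina', 'Brazil', 'Chile')
Quantity <- c(1, seq(5))

# Force factor columns so the example behaves the same on any R version
M <- data.frame(Country, Quantity, stringsAsFactors = TRUE)
test <- M[M$Country == "Brazil", ]

# The subset has only Brazil rows, but the factor still knows all three levels
levels(test$Country)
# [1] "Argentina" "Brazil"    "Chile"

# droplevels() discards the levels that no longer occur in the data
levels(droplevels(test)$Country)
# [1] "Brazil"
```

table() counts every level of a factor, including the ones with zero occurrences, which is where the all-zero rows for Argentina and Chile come from.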
