Question

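I want to create two subsets of a data frame named "data". The original data frame contains three panels/groups (Afghanistan, Brazil and Germany), each observed over the three years 1999-2001.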

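Subset ONE should contain only the countries with a population > 500,000 in 1999. This means removing not just the specific rows where the 1999 population is <= 500,000, but the entire panel/group/country.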

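Subset TWO should contain only the panels/groups/countries whose mean population over the three years is > 500,000. I think this means first creating a new variable "data$meanpop" and then creating the subset.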

I have tried using subset and the dplyr package/functions, but I could not get it to work.

A minimal example:

a <- c(rep("Afghanistan",3),
   rep("Brazil",3),
   rep("Germany",3))
b <- c(1999:2001,1999:2001,1999:2001)
c <- c(520000,510000,530000,20,0,5,NA,7000,1800000)
# data.frame() keeps year and population numeric; cbind() would coerce every column to character
data <- data.frame(country = a, year = b, population = c)

data
      country year population
1 Afghanistan 1999     520000
2 Afghanistan 2000     510000
3 Afghanistan 2001     530000
4      Brazil 1999         20
5      Brazil 2000          0
6      Brazil 2001          5
7     Germany 1999         NA
8     Germany 2000       7000
9     Germany 2001    1800000

The resulting subset ONE should look like this:

1 Afghanistan 1999     520000
2 Afghanistan 2000     510000
3 Afghanistan 2001     530000

The resulting subset TWO should look like this (I did not create the data$average column here):

country year population   meanpop
1 Afghanistan 1999     520000 520000.00
2 Afghanistan 2000     510000 520000.00
3 Afghanistan 2001     530000 520000.00
7     Germany 1999       <NA> 903500.00
8     Germany 2000       7000 903500.00
9     Germany 2001    1800000 903500.00

Answer 1

I will answer my own question for anyone who runs into a similar problem.

Subset ONE:

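# Rows from 1999 with a population above 500,000 (which() drops the NA comparison)
newdata <- data[which(data$year == 1999 & data$population > 500000), ]
# Countries that pass the 1999 filter
keep <- newdata$country
# Keep every row of those countries; %in% also works when several countries qualify
data[data$country %in% keep, ]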

This gives you:

#       country year population
# 1 Afghanistan 1999     520000
# 2 Afghanistan 2000     510000
# 3 Afghanistan 2001     530000
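
A note on the last step: when more than one country passes the filter, comparing with `==` recycles the vector of names along the rows, whereas `%in%` tests membership. The sketch below illustrates the difference. It rebuilds the question's toy data and uses a hypothetical filter (population > 5000 in 2000, chosen only so that two countries qualify); the names `multi` and `wrong` are made up for the illustration.

```r
# Toy data from the question, built with data.frame() so the numbers stay numeric
data <- data.frame(
  country    = rep(c("Afghanistan", "Brazil", "Germany"), each = 3),
  year       = rep(1999:2001, times = 3),
  population = c(520000, 510000, 530000, 20, 0, 5, NA, 7000, 1800000),
  stringsAsFactors = FALSE
)

# Hypothetical filter, for illustration only: population > 5000 in 2000
keep <- data$country[which(data$year == 2000 & data$population > 5000)]

# %in% tests membership, so all rows of both qualifying countries are kept
multi <- data[data$country %in% keep, ]

# == recycles keep along the rows and matches only by coincidence of position
wrong <- suppressWarnings(data[data$country == keep, ])

nrow(multi)  # all six rows of Afghanistan and Germany
nrow(wrong)  # fewer rows, and not whole panels
```

With a single qualifying country, as in subset ONE above, both forms return the same rows, which is why the problem is easy to miss.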

Subset TWO:

# Mean population per country, ignoring missing values
a <- by(data$population, data$country, mean, na.rm = TRUE)
a
a > 500000  # to check countries by eye

# Look up the country mean for every row
means <- as.data.frame(a[data$country])
colnames(means) <- "popmean"
means <- round(means, 2)
data2 <- cbind(data, means)  # now we have a new variable with the panel means
newdata2 <- data2[which(data2$popmean > 500000), ]

This gives you:

#       country year population popmean
# 1 Afghanistan 1999     520000  520000
# 2 Afghanistan 2000     510000  520000
# 3 Afghanistan 2001     530000  520000
# 7     Germany 1999         NA  903500
# 8     Germany 2000       7000  903500
# 9     Germany 2001    1800000  903500
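
One possibly more compact route to the same result is base R's `ave()`, which returns the group statistic repeated on every row of its group, so no separate lookup or `cbind()` is needed. This is only a sketch under the assumption that the columns are numeric as in the toy data; the names `meanpop` and `subset_two` are chosen here for illustration.

```r
# Toy data from the question, built with data.frame() so the numbers stay numeric
data <- data.frame(
  country    = rep(c("Afghanistan", "Brazil", "Germany"), each = 3),
  year       = rep(1999:2001, times = 3),
  population = c(520000, 510000, 530000, 20, 0, 5, NA, 7000, 1800000),
  stringsAsFactors = FALSE
)

# Per-country mean, written back onto every row of that country
data$meanpop <- ave(data$population, data$country,
                    FUN = function(x) mean(x, na.rm = TRUE))

# Keep whole panels whose mean exceeds the threshold
subset_two <- data[data$meanpop > 500000, ]
subset_two
```

Because `na.rm = TRUE` is passed to `mean()`, Germany's missing 1999 value is ignored and its mean is taken over the two remaining years.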

If anyone knows a simpler way to solve this, I would still appreciate your comments so that I can improve my coding skills.

Subsetting a data frame by complete panels based on column entries
