Question

I have a list of the counties in each state that received nonattainment status during 1995-2005.

I want to know how many counties in each state had this status in each year.

My data is formatted like this:

State1 County1 Yr1 Yr2 Yr3 Yr4 ...
State1 County2 Yr1 Yr2 Yr3 Yr4 ...
State2 County1 Yr1 Yr2 ...

Each year variable holds either 1 or 0, because a county can gain or lose this status over the period.

I need to count, for each state and year, how many counties have nonattainment status (Yrx = 1), but I can't figure out how to do it.

Answer 1

I used the following example:

data <- read.table(textConnection("
state county Yr1 Yr2 Yr3 Yr4
state1 county1 1 0 0 1
state1 county2 0 0 0 0
state1 county3 0 1 0 0
state1 county4 0 0 0 0
state1 county5 0 1 0 1
state2 county6 0 0 0 0
state2 county7 0 0 1 0
state2 county8 1 0 0 1
state2 county9 0 0 0 0
state2 county10 0 1 0 0
state3 county11 1 1 1 1
state3 county12 0 0 0 0
state3 county13 0 1 1 0
state3 county14 0 0 0 1
state4 county15 0 0 0 0
state4 county16 1 0 1 0
state4 county17 0 0 0 0
state4 county18 1 1 1 1
"), header = TRUE)

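library(reshape)
# Long format: one row per state/county/year, with columns "variable" (year) and "value" (0/1)
data2 <- melt(data, id.vars = c("state", "county"))
# Sum the 0/1 values within each state for each year
cast(data2, state ~ variable, fun.aggregate = sum)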

Result:

   state Yr1 Yr2 Yr3 Yr4
1 state1   1   2   0   2
2 state2   1   1   1   1
3 state3   1   2   2   2
4 state4   2   1   2   1
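If you would rather not depend on the reshape package, the same per-state, per-year counts can be obtained in base R by summing the 0/1 year columns within each state. This is a minimal sketch using the example data above; it assumes every year column name starts with "Yr", and the names yr_cols and counts are illustrative only.

```r
# Example data from above: one row per county, one 0/1 column per year
data <- read.table(text = "
state county Yr1 Yr2 Yr3 Yr4
state1 county1 1 0 0 1
state1 county2 0 0 0 0
state1 county3 0 1 0 0
state1 county4 0 0 0 0
state1 county5 0 1 0 1
state2 county6 0 0 0 0
state2 county7 0 0 1 0
state2 county8 1 0 0 1
state2 county9 0 0 0 0
state2 county10 0 1 0 0
state3 county11 1 1 1 1
state3 county12 0 0 0 0
state3 county13 0 1 1 0
state3 county14 0 0 0 1
state4 county15 0 0 0 0
state4 county16 1 0 1 0
state4 county17 0 0 0 0
state4 county18 1 1 1 1
", header = TRUE)

# Select the year columns by name (assumes they all start with "Yr")
yr_cols <- grep("^Yr", names(data), value = TRUE)

# Sum the 0/1 flags within each state: number of nonattainment counties per year
counts <- aggregate(data[yr_cols], by = list(state = data$state), FUN = sum)
print(counts)
```

Selecting the columns with grep() means the code does not need to change if more years (Yr5, Yr6, ...) are added.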

Answer 2

Is this data organized as a data frame? If so, how are the rows defined? Suppose your data were organized like this:

State   County  Year    Attainment  
State1   County1  1       1  
State1   County1  2       0
State1   County1  3       1
State1   County1  4       1
State1   County2  1       1
State1   County2  2       1
...

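Then you could get the kind of summary you're looking for with a single line of code. Hopefully your notation means your data is organized like this: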

State   County  Yr1 Yr2 Yr3 Yr4
State1   County1 1  0   1   1
State1   County2 1  1   1   1

Use melt() from the reshape package to go from this format to the format listed above.

new.df <- melt(df, id = 1:2)

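It will name the year variable "variable" and the attainment variable "value". Now, with a clever use of the cast() function, also from the reshape package, you can get the summary you want.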

# For each state and year, count the counties with value 0 and with value 1
counties <- cast(new.df, State + variable ~ value, fun.aggregate = length)
head(counties)
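For comparison, here is a base R sketch (no reshape package) that converts the wide layout to long format with reshape() and then counts nonattainment counties per state and year with xtabs(). It uses only the two example rows shown above, so the data frame df and the names long and counts are illustrative assumptions, not your real data.

```r
# The two example rows from above, in wide format (one 0/1 column per year)
df <- data.frame(
  State  = c("State1", "State1"),
  County = c("County1", "County2"),
  Yr1 = c(1, 1), Yr2 = c(0, 1), Yr3 = c(1, 1), Yr4 = c(1, 1)
)

# Wide -> long: one row per State/County/Year with an Attainment column
long <- reshape(df, direction = "long",
                varying = c("Yr1", "Yr2", "Yr3", "Yr4"),
                v.names = "Attainment", timevar = "Year", times = 1:4,
                idvar = c("State", "County"))

# Sum the 0/1 flags by state and year: number of nonattainment counties
counts <- xtabs(Attainment ~ State + Year, data = long)
print(counts)
```

Because Attainment is 0/1, summing it with xtabs() counts only the counties with status 1, which is the per-year figure asked for in the question.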

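However, if your data is organized so that every state, county, and year is a separate column and there is only one row, I think your best next step is to reformat the data outside R so that it at least looks like my second example.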

Counting hierarchical data in R