I have a dataset that looks like this:
partyid coninc
Ind,Near Dem 25926
Not Str Democrat 33333
Not Str Democrat 41667
Strong Democrat 69444
Ind,Near Dem 60185
Ind,Near Dem 50926
Ind,Near Dem 18519
Strong Democrat 3704
Strong Democrat 25926
Strong Democrat 18519
Not Str Republican 18519
Strong Democrat 18519
Not Str Democrat 18519
What I want to do is reshape the dataset into:
partyid 0-50,000 50,000-100,000 100,000-150,000 >150,000
Strong Democrat 2344 3423 4342 54
Not Str Democrat 2643 934 ..
Ind, Near Dem 7656 343 ..
Ind, Near Rep 7655 833 ..
Not Str Republican 2443 343
Strong Republican 3444 773
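That is, the rows are ordered by the levels of the row variable (partyid), and the columns hold counts of the column variable (coninc) within each range.
dput of my data: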
structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")
Answer 0 (score: 2)
You can do this quite easily with the plyr package (as your sample data was a bit hard to read, I removed the commas and spaces in partyid):
# creating sample data
dat <- structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")
# summarising the data with plyr
library(plyr)
dat2 <- ddply(dat, .(partyid), summarise,
              zero    = sum(coninc < 50001),
              fifty   = sum(coninc > 50000 & coninc < 100001),
              hundred = sum(coninc > 100000 & coninc < 150001),
              hfifty  = sum(coninc > 150000))
This results in the following output (shown as dput):
dat2 <- structure(list(partyid = structure(1:5, .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), zero = c(6L, 3L, 2L, 2L, 1L), fifty = c(1L, 0L, 4L, 1L, 0L), hundred = c(0L, 0L, 0L, 0L, 0L), hfifty = c(0L, 0L, 0L, 0L, 0L)), .Names = c("partyid", "zero", "fifty", "hundred", "hfifty"), row.names = c(NA, -5L), class = "data.frame")
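The output above has five rows because Strong Republican has no observations in the sample. If a row of zeros is wanted for that level as well, the same comparisons can be run in base R. This is only a minimal sketch, not part of the original answer: count_bins is a hypothetical helper name, the data is rebuilt by hand from the dput in the question, and it relies on split() keeping empty factor levels.

```r
# Rebuild the sample data from the question (same values as the dput output).
party_levels <- c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                  "Ind,Near Rep", "Not Str Republican", "Strong Republican")
codes <- c(3, 2, 2, 1, 3, 3, 3, 1, 1, 1, 5, 1, 2, 1, 1, 4, 4, 3, 4, 3)
dat <- data.frame(
  partyid = factor(party_levels[codes], levels = party_levels),
  coninc  = c(25926, 33333, 41667, 69444, 60185, 50926, 18519, 3704, 25926,
              18519, 18519, 18519, 18519, 25926, 18519, 33333, 25926, 60185,
              69444, 50926)
)

# Hypothetical helper: count the incomes of one group in each range.
count_bins <- function(x) {
  c(zero    = sum(x <= 50000),
    fifty   = sum(x > 50000 & x <= 100000),
    hundred = sum(x > 100000 & x <= 150000),
    hfifty  = sum(x > 150000))
}

# split() returns one income vector per factor level, including empty ones,
# so every party ends up as a row of the transposed result.
counts <- t(sapply(split(dat$coninc, dat$partyid), count_bins))
counts
```

The result is a matrix with one row per level of partyid, in factor-level order, and one column per income range.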
Answer 1 (score: 2)
You can use cut and table in base R:
dat$cat <- cut(dat$coninc, breaks = c(0, 50000, 100000, 150000, Inf),
               labels = c("< 50K", "50K - 100K", "100K - 150K", "> 150K"))
table(dat$partyid, dat$cat)
#
#                      < 50K 50K - 100K 100K - 150K > 150K
#   Strong Democrat        6          1           0      0
#   Not Str Democrat       3          0           0      0
#   Ind,Near Dem           2          4           0      0
#   Ind,Near Rep           2          1           0      0
#   Not Str Republican     1          0           0      0
#   Strong Republican      0          0           0      0
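table() returns a contingency table rather than a data frame. If the result should look like the layout in the question, with partyid as an ordinary column, one possible follow-up is sketched below. It is an illustration only: the name wide is made up here, and the data is rebuilt by hand from the dput in the question so the block runs on its own.

```r
# Rebuild the sample data from the question (same values as the dput output).
party_levels <- c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                  "Ind,Near Rep", "Not Str Republican", "Strong Republican")
codes <- c(3, 2, 2, 1, 3, 3, 3, 1, 1, 1, 5, 1, 2, 1, 1, 4, 4, 3, 4, 3)
dat <- data.frame(
  partyid = factor(party_levels[codes], levels = party_levels),
  coninc  = c(25926, 33333, 41667, 69444, 60185, 50926, 18519, 3704, 25926,
              18519, 18519, 18519, 18519, 25926, 18519, 33333, 25926, 60185,
              69444, 50926)
)

# Bin income into right-closed intervals: (0, 50000], (50000, 100000], ...
dat$cat <- cut(dat$coninc, breaks = c(0, 50000, 100000, 150000, Inf),
               labels = c("< 50K", "50K - 100K", "100K - 150K", "> 150K"))

# Cross-tabulate, then convert the table to a wide data frame
# (one row per party, one column per income range).
tab  <- table(dat$partyid, dat$cat)
wide <- as.data.frame.matrix(tab)

# Move the row names into a regular column; check.names = FALSE keeps
# column names such as "< 50K" unchanged.
wide <- data.frame(partyid = rownames(wide), wide,
                   row.names = NULL, check.names = FALSE)
wide
```

Because partyid is a factor, the rows follow the factor-level order, and levels with no observations still appear with zero counts.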