Reformatting a dataset in R with factor levels as rows and value ranges as columns

Time: 2014-04-05 08:07:51

Tags: r dataset format levels

I have a dataset that looks like this:

partyid            coninc
Ind,Near Dem       25926
Not Str Democrat   33333
Not Str Democrat   41667
Strong Democrat    69444
Ind,Near Dem       60185
Ind,Near Dem       50926
Ind,Near Dem       18519
Strong Democrat    3704
Strong Democrat    25926
Strong Democrat    18519
Not Str Republican 18519
Strong Democrat    18519
Not Str Democrat   18519

What I want to do is reformat the dataset as:

partyid             0-50,000   50,000-100,000   100,000-150,000   >150,000
Strong Democrat     2344       3423             4342              54
Not Str Democrat    2643       934              ..
Ind, Near Dem       7656       343              ..
Ind, Near Rep       7655       833              .. 
Not Str Republican  2443       343
Strong Republican   3444       773

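That is, one row per level of the row variable (partyid), and one column per range of the column variable (coninc), with each cell holding the count of observations in that range.

Here is the dput of my data: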

structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")

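2 answers:

Answer 0: (score: 2)

You can do this quite easily with the plyr package (because your sample data was a bit hard to read, I removed the commas and spaces in partyid):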

# creating sample data
dat <- structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")

# summarise the data with plyr: count the incomes per range within each partyid
library(plyr)
dat2 <- ddply(dat, .(partyid), summarise,
              zero = sum(coninc <= 50000),
              fifty = sum(coninc > 50000 & coninc <= 100000),
              hundred = sum(coninc > 100000 & coninc <= 150000),
              hfifty = sum(coninc > 150000))

This gives the following result (shown in dput form):

dat2 <- structure(list(partyid = structure(1:5, .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), zero = c(6L, 3L, 2L, 2L, 1L), fifty = c(1L, 0L, 4L, 1L, 0L), hundred = c(0L, 0L, 0L, 0L, 0L), hfifty = c(0L, 0L, 0L, 0L, 0L)), .Names = c("partyid", "zero", "fifty", "hundred", "hfifty"), row.names = c(NA, -5L), class = "data.frame")
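
Note that the result has five rows: no observation in the sample is a Strong Republican, and ddply drops empty groups by default (see its .drop argument). If the empty level should appear with zero counts, one option is the base R sketch below. It is not part of the original answer; the names lv, count_ranges and counts are made up for illustration, and the sample data is rebuilt from the dput in the question.

```r
# Rebuild the sample data from the question's dput (same values, shorter form)
lv <- c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
        "Ind,Near Rep", "Not Str Republican", "Strong Republican")
dat <- data.frame(
  partyid = factor(lv[c(3, 2, 2, 1, 3, 3, 3, 1, 1, 1, 5, 1, 2, 1, 1, 4, 4, 3, 4, 3)],
                   levels = lv),
  coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L,
             25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L,
             25926L, 60185L, 69444L, 50926L)
)

# Count how many incomes in one group's vector fall into each range
count_ranges <- function(x) {
  c(zero    = sum(x <= 50000),
    fifty   = sum(x > 50000 & x <= 100000),
    hundred = sum(x > 100000 & x <= 150000),
    hfifty  = sum(x > 150000))
}

# split() keeps empty factor levels by default, so every party gets a row
counts <- t(sapply(split(dat$coninc, dat$partyid), count_ranges))
counts
```

The result is a plain integer matrix with the party names as row names; wrap it in as.data.frame() if a data frame is needed.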

Answer 1: (score: 2)

You can use cut and table in base R:

dat$cat <- cut(dat$coninc, breaks = c(0, 50000, 100000, 150000, Inf),
               labels = c("< 50K", "50K - 100K", "100K - 150K", "> 150K"))
table(dat$partyid, dat$cat)
#                     
#                      < 50K 50K - 100K 100K - 150K > 150K
#   Strong Democrat        6          1           0      0
#   Not Str Democrat       3          0           0      0
#   Ind,Near Dem           2          4           0      0
#   Ind,Near Rep           2          1           0      0
#   Not Str Republican     1          0           0      0
#   Strong Republican      0          0           0      0
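
If a data frame is needed rather than a table object (for example to export or merge it), the table can be converted. The following is a minimal sketch, assuming the same sample data as in the question; the names lv, tab and wide are illustrative only. as.data.frame.matrix() keeps the wide layout, whereas plain as.data.frame() on a table returns the long format with a Freq column.

```r
# Rebuild the sample data from the question's dput (same values, shorter form)
lv <- c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
        "Ind,Near Rep", "Not Str Republican", "Strong Republican")
dat <- data.frame(
  partyid = factor(lv[c(3, 2, 2, 1, 3, 3, 3, 1, 1, 1, 5, 1, 2, 1, 1, 4, 4, 3, 4, 3)],
                   levels = lv),
  coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L,
             25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L,
             25926L, 60185L, 69444L, 50926L)
)

# Bin the incomes; cut() uses right-closed intervals by default: (0, 50000], ...
dat$cat <- cut(dat$coninc, breaks = c(0, 50000, 100000, 150000, Inf),
               labels = c("< 50K", "50K - 100K", "100K - 150K", "> 150K"))

# Cross-tabulate, then convert to a wide data frame (party names become row names)
tab <- table(dat$partyid, dat$cat)
wide <- as.data.frame.matrix(tab)
wide
```

Because the intervals are right-closed, an income of exactly 50000 lands in the first bin, which matches the boundaries used in the plyr answer. An income of exactly 0 would become NA unless include.lowest = TRUE is passed to cut().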