My data frame consists mostly of categorical columns plus one numeric column. The df looks like this (simplified):
**Home_type** **Garden_type** **NaighbourhoOd** **Rent**
Vila big brooklyn 5000
Vila small bronx 7000
Condo shared Sillicon valley 2000
Appartment none brooklyn 500
Condo none bronx 1700
Appartment none Sillicon Valley 800
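For anyone who wants to reproduce the problem, the table above can be typed in as a data frame. This is only a sketch: the values (including their spelling and capitalization) are copied as shown, while storing `Rent` as integer and the text columns as character rather than factor are assumptions.

```r
# Rebuild the example data exactly as shown in the table above.
# Assumptions: Rent is integer, the other columns are plain character.
df <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000L, 7000L, 2000L, 500L, 1700L, 800L),
  stringsAsFactors = FALSE
)

# Inspect the column types
str(df)
```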
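For each categorical column, I want to show all of its distinct values together with their frequency and the sum of rent.
The result should look like this: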
**Variable** **Distinct_values** **No_of-Occurences** **SUM_RENT**
Home_type Vila 2 12000
Home_type Condo 2 3700
Home_type Appartment 2 1300
Garden_type big 1 5000
Garden_type small 1 7000
Garden_type shared 1 2000
Garden_type none 3 3000
Naighbourhood brooklyn 2 5500
Naighbourhood Bronx 2 8700
Naighbourhood Sillicon Valley 2 2800
I'm new to R and tried using melt from reshape2, but without much success. Any help would be greatly appreciated.
Answer 0: (score: 2)
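Lately I tend to choose tidyr over reshape2, though largely because its syntax is closer to dplyr. dplyr also makes this task easier, since it loads the magrittr pipe (%>%) and provides data summary tools.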
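First, we gather (from tidyr) all of the non-Rent columns into long format (run just the first two lines to see the result). Then we group_by the columns that should be aggregated together. Finally, we summarise within each group to get the metrics you want.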
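library(dplyr)
library(tidyr)

df %>%
  gather(Variable, Distinct_Values, -Rent) %>%   # wide -> long, keeping Rent as-is
  group_by(Variable, Distinct_Values) %>%
  summarise(
    `No_of-Occurences` = n(),   # number of rows in each group
    SUM_RENT = sum(Rent)        # total rent in each group
  )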
This gives:
Variable Distinct_Values `No_of-Occurences` SUM_RENT
<chr> <chr> <int> <int>
1 Garden_type big 1 5000
2 Garden_type none 3 3000
3 Garden_type shared 1 2000
4 Garden_type small 1 7000
5 Home_type Appartment 2 1300
6 Home_type Condo 2 3700
7 Home_type Vila 2 12000
8 NaighbourhoOd bronx 2 8700
9 NaighbourhoOd brooklyn 2 5500
10 NaighbourhoOd Sillicon valley 1 2000
11 NaighbourhoOd Sillicon Valley 1 800
(Note that your data has both "V" and "v" in "Sillicon Valley", which results in two separate rows.)
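In tidyr 1.0.0 and later, gather is superseded by pivot_longer, so the same steps can also be sketched as follows. This is a variant under stated assumptions, not the code from the answer above: the data frame is retyped from the question, the result column is renamed to the hypothetical `No_of_Occurences` to avoid backticks, `.groups = "drop"` needs dplyr 1.0.0 or later, and `tolower()` is one possible way to merge the two "Sillicon Valley" spellings (it also lowercases every other value).

```r
library(dplyr)
library(tidyr)

# Example data retyped from the question (assumed column types)
df <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000L, 7000L, 2000L, 500L, 1700L, 800L),
  stringsAsFactors = FALSE
)

# Step 1: reshape every column except Rent into long format
long <- df %>%
  pivot_longer(-Rent, names_to = "Variable", values_to = "Distinct_Values")

# Step 2: normalise case so "Sillicon valley" and "Sillicon Valley" match,
# then count rows and sum Rent per (Variable, Distinct_Values) pair
result <- long %>%
  mutate(Distinct_Values = tolower(Distinct_Values)) %>%
  group_by(Variable, Distinct_Values) %>%
  summarise(
    No_of_Occurences = n(),
    SUM_RENT = sum(Rent),
    .groups = "drop"
  )

print(result)
```

Printing `long` on its own shows the intermediate long format: one row per original cell, with `Rent` repeated for each of the three categorical columns.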
Answer 1: (score: 1)
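We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)) and melt it from 'wide' to 'long' format. Grouped by 'variable' and 'value' (the columns created by melt), we create two columns, 'No_of_occur' and 'SUM_RENT', as the number of rows (.N) and the sum of the 'Rent' column. Then, grouped by 'variable', 'No_of_occur' and 'SUM_RENT', we take the unique elements of the 'value' column ('Distinct_values').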
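library(data.table)
# melt to long format, add the per-group count and rent sum by reference,
# then collapse to one row per distinct value
melt(setDT(df1), id.vars = "Rent")[
  , c("No_of_occur", "SUM_RENT") := .(.N, sum(Rent)), by = .(variable, value)][
  , .(Distinct_values = unique(value)), by = .(variable, No_of_occur, SUM_RENT)]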
# variable No_of_occur SUM_RENT Distinct_values
#1: Home_type 2 12000 Vila
#2: Home_type 2 3700 Condo
#3: Home_type 2 1300 Appartment
#4: Garden_type 1 5000 big
#5: Garden_type 1 7000 small
#6: Garden_type 1 2000 shared
#7: Garden_type 3 3000 none
#8: NaighbourhoOd 2 5500 brooklyn
#9: NaighbourhoOd 2 8700 bronx
#10:NaighbourhoOd 2 2800 Sillicon Valley
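As a possible simplification, the count and the sum can be computed directly in `j` while grouping, which avoids the `:=` step followed by `unique`. The sketch below is an alternative, not the code from the answer above. It assumes `df1` holds the question's data exactly as typed, and it uses `as.data.table()` instead of `setDT()` so the original data frame is left unmodified. With the mixed-case "Sillicon valley" / "Sillicon Valley" entries from the question, those two spellings stay on separate rows.

```r
library(data.table)

# Example data retyped from the question (assumed column types)
df1 <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000L, 7000L, 2000L, 500L, 1700L, 800L),
  stringsAsFactors = FALSE
)

# Wide -> long: every column except Rent becomes a (Variable, Distinct_values) pair
long <- melt(as.data.table(df1), id.vars = "Rent",
             variable.name = "Variable", value.name = "Distinct_values")

# One row per group: number of rows (.N) and total rent
res <- long[, .(No_of_occur = .N, SUM_RENT = sum(Rent)),
            by = .(Variable, Distinct_values)]

print(res)
```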