I have a dataset like this:
term occ value
Less Than 1 year Yale 1
Less Than 1 year MIT 3
1 Year Yale 2
2 Years Yale 3
2 Years Yale 8
2 years CMU 2
3 Years Yale 5
3 years NYU 2
Greater than 3 Years NYU 5
Greater Than 3 Years CALTEC 4
No Fixed Term Yale 2
Other Bu 9
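I want a table showing the number of records per term. The table should be sorted in the order of Term.
Note: the case differences between "Years" and "years", and between "Than" and "than".
The output should look like this: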
term count
Less Than 1 year 2
1 Year 1
2 Years 3
3 Years 2
Greater than 3 Years 2
No Fixed Term 1
Other 1
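Answer 0 (score: 2)
If you need a special order, you have to specify the order of the levels in a factor. You also need to compare without regard to case. This should work: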
# reproducible data
dd<-read.table(text="term,occ,value
Less Than 1 year,Yale,1
Less Than 1 year,MIT,3
1 Year,Yale,2
2 Years,Yale,3
2 Years,Yale,8
2 years,CMU,2
3 Years,Yale,5
3 years,NYU,2
Greater than 3 Years,NYU,5
Greater Than 3 Years,CALTEC,4
No Fixed Term,Yale,2
Other,Bu,9", header=TRUE, sep=",")
# specify custom order
termorder<-c("Less Than 1 year","1 Year","2 Years","3 Years",
"Greater than 3 Years","No Fixed Term","Other")
#tabulate
tt <- table(factor(tolower(dd$term), levels=tolower(termorder), labels=termorder))
This returns a table, which behaves like a named vector of counts. If you want a data.frame, you can do:
as.data.frame(tt)
# Var1 Freq
# 1 Less Than 1 year 2
# 2 1 Year 1
# 3 2 Years 3
# 4 3 Years 2
# 5 Greater than 3 Years 2
# 6 No Fixed Term 1
# 7 Other 1
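The question asks for columns named term and count rather than Var1 and Freq. A minimal sketch of one way to get there, assuming the same terms and custom order as above; the names `term` and `res` are made up for this example:

```r
# Same terms as in the question (only the term column is needed here).
term <- c("Less Than 1 year", "Less Than 1 year", "1 Year", "2 Years",
          "2 Years", "2 years", "3 Years", "3 years",
          "Greater than 3 Years", "Greater Than 3 Years",
          "No Fixed Term", "Other")

termorder <- c("Less Than 1 year", "1 Year", "2 Years", "3 Years",
               "Greater than 3 Years", "No Fixed Term", "Other")

# Case-insensitive match against the custom order, labelled with termorder.
tt <- table(factor(tolower(term), levels = tolower(termorder), labels = termorder))

# Rename Var1/Freq to the column names used in the question.
res <- setNames(as.data.frame(tt, stringsAsFactors = FALSE), c("term", "count"))
res
```

Passing stringsAsFactors = FALSE keeps the term column as plain character; drop it if you would rather keep the ordered factor for later sorting or plotting.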
Answer 1 (score: 1)
We can use table after converting the 'term' to all lower or upper case:
as.data.frame(table(tolower(df1$term)))
If we need a custom order, then specify the levels in factor before doing the table.
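A rough sketch of that custom-order variant, assuming the rows already appear in the desired order so that the first appearance of each lowercased term can serve as the level order. The names `term`, `low` and `counts` are made up here, and the labels come out in lower case:

```r
# Terms from the question, in the order they appear in the data.
term <- c("Less Than 1 year", "Less Than 1 year", "1 Year", "2 Years",
          "2 Years", "2 years", "3 Years", "3 years",
          "Greater than 3 Years", "Greater Than 3 Years",
          "No Fixed Term", "Other")

# Normalise case so "2 Years" and "2 years" fall into the same group.
low <- tolower(term)

# Without explicit levels, table() would sort the terms;
# unique(low) keeps them in order of first appearance instead.
counts <- as.data.frame(table(factor(low, levels = unique(low))))
counts
```

If the data are not already in the order you want, spell the levels out by hand instead of relying on unique().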
Or, instead of tolower, we can also do this with sub:
v1 <- sub("Than", "than", sub("years", "Years", df1$term))
as.data.frame(table(factor(v1, levels = unique(v1))))
# Var1 Freq
#1 Less than 1 year 2
#2 1 Year 1
#3 2 Years 3
#4 3 Years 2
#5 Greater than 3 Years 2
#6 No Fixed Term 1
#7 Other 1