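I am trying to compute the Blau diversity index (Gini-Simpson) in R on my data frame. I have 6 columns, one for each person in a group, holding the values "Student", "Faculty", "Alumni" or "Not applicable". If a group has fewer than 6 members, the remaining columns contain NA.
I want to compute the Blau index across each row (the diversity of the whole group), not within each column, with na.rm = TRUE.
Does anyone know how to do this in R?
Many thanks!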
Answer 0 (score: 0)
The Gini-Simpson index is easy to compute by hand.
First, I'll generate some sample data:
# Generate sample data
set.seed(2017);
type <- c("Student", "Faculty", "Alumni");
data <- sample(type, 6 * 20, replace = TRUE);
# Replace 40 entries with NAs
set.seed(2017);
data[sample(6 * 20, 40)] <- NA;
# Reformat as 6 column dataframe
df <- as.data.frame(matrix(data, ncol = 6), stringsAsFactors = FALSE);
names(df) <- paste0("e", 1:6, "_affiliation");
head(df);
#e1_affiliation e2_affiliation e3_affiliation e4_affiliation e5_affiliation
#1 <NA> Faculty <NA> Student Student
#2 <NA> <NA> <NA> Faculty Alumni
#3 <NA> Alumni Student Faculty Faculty
#4 Student <NA> <NA> <NA> <NA>
#5 <NA> Student Alumni Alumni Student
#6 Alumni Alumni Faculty Faculty Student
# e6_affiliation
#1 Alumni
#2 Alumni
#3 <NA>
#4 Student
#5 Faculty
#6 Student
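The Gini-Simpson (= Gibbs-Martin = Blau) index of diversity is given by

$$GS = 1 - \sum_{i=1}^{N} p_i^2$$

where p_i is the proportion of group members in category i and N is the number of categories. We define a function that takes a character vector and returns the GS index: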
# Define function to calculate the Gini-Simpson index
# We ensure the same levels (present or absent) of x
# by factor(x, levels = type)
# Note that NAs will not be considered by default
get.GS.index <- function(x, type) {
x <- factor(x, levels = type);
return(1 - sum(prop.table(table(x))^2));
}
We can now apply get.GS.index to every row of the data frame:
apply(df, 1, get.GS.index, type)
#[1] 0.6250000 0.4444444 0.6250000 0.0000000 0.6400000 0.6666667 0.5000000
#[8] 0.6250000 0.6400000 0.5000000 0.4444444 0.6400000 0.3750000 0.3750000
#[15] 0.0000000 0.0000000 0.6111111 0.4444444 0.6666667 0.6400000
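As a sanity check, the first value can be reproduced by hand. The sketch below assumes the first row is exactly what head(df) prints above (two NAs, one Faculty, two Student, one Alumni); the variable names row1, p and gs are only illustrative.

```r
# Categories used in the answer
type <- c("Student", "Faculty", "Alumni")

# First row as printed by head(df) above (assumption: copied from that output)
row1 <- c(NA, "Faculty", NA, "Student", "Student", "Alumni")

# table() drops NAs by default, so proportions are over the 4 valid entries
p <- prop.table(table(factor(row1, levels = type)))

# Student = 2/4, Faculty = 1/4, Alumni = 1/4
# GS = 1 - (0.5^2 + 0.25^2 + 0.25^2) = 1 - 0.375 = 0.625
gs <- 1 - sum(p^2)
print(gs)
```

This agrees with the first element (0.6250000) of the apply() output, and it shows that the NA entries shrink the group size rather than counting as a category of their own.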
If a group contains only one type, we can modify get.GS.index to return NA instead of 0:
get.GS.index <- function(x, type) {
x <- factor(x, levels = type);
counts <- table(x);
if (sum(counts > 0) == 1) return(NA_real_) else return(1 - sum(prop.table(counts)^2));
}
apply(df, 1, get.GS.index, type);
# [1] 0.6250000 0.4444444 0.6250000 NA 0.6400000 0.6666667 0.5000000
# [8] 0.6250000 0.6400000 0.5000000 0.4444444 0.6400000 0.3750000 0.3750000
#[15] NA NA 0.6111111 0.4444444 0.6666667 0.6400000
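One edge case the sample data does not exercise is a row with no valid entries at all: the proportions then become 0/0, so the function above would return NaN rather than NA. The following is only a sketch of one way to handle that and to store the result as a column. The function name get_gs_index_safe, the toy data frame and the column name blau are hypothetical and are not part of the answer above.

```r
type <- c("Student", "Faculty", "Alumni")

# Hypothetical variant: return NA when fewer than two distinct types are
# present, which also covers rows that contain only NAs
get_gs_index_safe <- function(x, type) {
  counts <- table(factor(x, levels = type))
  if (sum(counts > 0) < 2) return(NA_real_)
  1 - sum(prop.table(counts)^2)
}

# Small illustrative data frame (not the sample data generated above)
toy <- data.frame(
  e1 = c("Student", "Student", NA_character_),
  e2 = c("Faculty", "Student", NA_character_),
  e3 = rep(NA_character_, 3),
  stringsAsFactors = FALSE
)

# Row-wise index stored as a new column
toy$blau <- apply(toy[, c("e1", "e2", "e3")], 1, get_gs_index_safe, type = type)
print(toy)
```

Here the first row (one Student, one Faculty) gets 0.5, while the single-type row and the all-NA row both get NA. Whether a one-type group should be reported as 0 or NA depends on how the index is meant to be interpreted downstream.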