Counting occurrences of values across a set of variables in R (per row), using weights

Date: 2018-03-05 22:06:25

Tags: r dataframe vector apply

I have the following data frame, df8:

df8=data.frame(V1=c(10,20,10,20),V2=c(20,30,20,30),V3=c(20,10,20,10))

Here is the number of times each value occurs in each row:

a<-apply(df8,MARGIN=1,table)

> a
[[1]]

10 20 
 1  2 

[[2]]

10 20 30 
 1  1  1 

[[3]]

10 20 
 1  2 

[[4]]

10 20 30 
 1  1  1 

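I have a vector V = (0.25, 0.25, 0.5), with one weight per column. For each row, I want the occurrences to be weighted by V rather than simply counted, i.e. for each distinct value in the row, the weights of the columns holding that value are summed. I would like to get something like this:

[[1]]

  10   20 
0.25  0.5 

[[2]]

  10   20   30 
 0.5 0.25 0.25 

[[3]]

  10   20 
0.25  0.5 

[[4]]

  10   20   30 
 0.5 0.25 0.25 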
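One way to read the request above is "for each distinct value in a row, sum the weights of the columns that hold it". The following is a minimal sketch under that assumption; `weighted_table` is a hypothetical helper name, not something from the original post. Under this reading, the value 20 in row 1 sums to 0.25 + 0.5 = 0.75, and the largest entry in each row is the same as in the tables shown.

```r
df8 <- data.frame(V1 = c(10, 20, 10, 20),
                  V2 = c(20, 30, 20, 30),
                  V3 = c(20, 10, 20, 10))
V <- c(0.25, 0.25, 0.5)  # one weight per column, as in the question

# Hypothetical helper (assumption): for one row, sum the weights of the
# columns that hold each distinct value.
weighted_table <- function(x, w = V) tapply(w, x, sum)

# lapply always returns a list, even if every row happens to have the
# same number of distinct values (apply would simplify to a matrix then).
weighted <- lapply(seq_len(nrow(df8)),
                   function(i) weighted_table(unlist(df8[i, ])))

weighted[[2]]
```

`tapply` groups the weights by the row's values, so the names of each result are the distinct values and the entries are their summed weights.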

Now, for each row, I want to pick the item with the highest a*V value:

> df8
  V1 V2 V3 max_val
1 10 20 20   20
2 20 30 10   10
3 10 20 20   20
4 20 30 10   10

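1 Answer:

Answer 0 (score: 1)

One option is to apply the table function to each row to find how many times each column's value occurs in that row. The factors defined in V are then applied column-wise, and the index of the column with the largest freq*V value is located. The row's value at that index is the desired value.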

#Multiplier for occurrence in each column
V = c(0.25,0.25,0.5)

#data frame
df8=data.frame(V1=c(10,20,10,20),V2=c(20,30,20,30),V3=c(20,10,20,10))

# This function receives all column values of one row. It looks up how
# often each column's value occurs in the row, multiplies those
# frequencies by V (column-wise), and returns the row value at the
# index where freq*V is largest.

find_max_freq_val <- function(x){
  freq_df <- as.data.frame(table(x))
  freq_vec <- mapply(function(y) freq_df[freq_df$x == y, "Freq"], x)
  # which.max returns a single index (the first one on ties), so apply()
  # always gets exactly one value per row
  x[which.max(freq_vec * V)]
}

# call the function above to add a column with the desired value
df8$new_val <- apply(df8, 1, find_max_freq_val)

df8
#  V1 V2 V3 new_val
#1 10 20 20      20
#2 20 30 10      10
#3 10 20 20      20
#4 20 30 10      10
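As a more compact alternative, the selection can also be sketched by summing the column weights per distinct value and taking the value with the largest total. This is only a sketch under the assumption that "a*V" means the summed weights per value; `pick_max` and the `max_val` column name are illustrative choices rather than part of the answer above.

```r
df8 <- data.frame(V1 = c(10, 20, 10, 20),
                  V2 = c(20, 30, 20, 30),
                  V3 = c(20, 10, 20, 10))
V <- c(0.25, 0.25, 0.5)  # one weight per column

# Hypothetical helper (assumption): total weight per distinct value in a
# row, then the value with the largest total (first one on ties).
pick_max <- function(x, w = V) {
  totals <- tapply(w, x, sum)
  as.numeric(names(totals)[which.max(totals)])
}

# Restrict to the value columns so that re-running does not feed the
# result column back into the computation.
df8$max_val <- apply(df8[, c("V1", "V2", "V3")], 1, pick_max)

df8
```

Selecting the columns by name keeps the call safe to repeat after `max_val` has been added to the data frame.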