Question

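I want to select the rows of a data frame that contain the most information. The data frame is generated automatically, so the number of columns grows over time.

The data looks like this: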

Player  V1  F1  V2  F2  V3  F3  V4  F4
111111  0   0   1   3   0   0   1   3
111111  0   0   1   3   1   3   1   3
222222  3   4   0   0   3   4   3   4
222222  3   4   3   4   3   4   3   4
33333   1   2   1   2   1   2   1   2
33333   1   2   1   2   1   2   0   0

The expected result is:

Player  V1  F1  V2  F2  V3  F3  V4  F4
111111  0   0   1   3   1   3   1   3
222222  3   4   3   4   3   4   3   4
33333   1   2   1   2   1   2   1   2

The idea is to select the rows with the most complete information. I consider 0 to be incomplete information.

Answer 1

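You mention that the data frame is generated automatically, so the column names grow over time. Is real-time grouping what you are after?

The data.table approach below should group by the Player column and select the maximum of each column within each group. It works for the representative example you gave. It is similar to the answer @arun gave here: Group by one column, select row with minimum in one column for every pair of columns in R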

library(data.table)
dt <- as.data.table(df)
# Value columns to process (every column except Player)
my_cols <- c("V1","F1","V2","F2","V3","F3","V4","F4")

# General data.table form: DT[i, j, by]. A missing i means "all rows",
# so the expression in j is evaluated once per Player group.
# mget(my_cols) returns the named columns as a list, which.max finds the
# position of each column's maximum, and Map(`[`, ...) extracts that value.
dt[, Map(`[`, mget(my_cols), lapply(mget(my_cols), which.max)), by = Player]
#    Player V1 F1 V2 F2 V3 F3 V4 F4
# 1: 111111  0  0  1  3  1  3  1  3
# 2: 222222  3  4  3  4  3  4  3  4
# 3:  33333  1  2  1  2  1  2  1  2
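
Because the question says new columns keep appearing, the hard-coded column vector would need updating by hand. The following is a minimal sketch, not part of the original answer, that derives the value columns from the column names and uses `.SD` with `.SDcols`. The data frame `df` is rebuilt from the question's example, and the name `value_cols` is only illustrative.

```r
library(data.table)

# Example data copied from the question
df <- data.frame(
  Player = c(111111, 111111, 222222, 222222, 33333, 33333),
  V1 = c(0, 0, 3, 3, 1, 1), F1 = c(0, 0, 4, 4, 2, 2),
  V2 = c(1, 1, 0, 3, 1, 1), F2 = c(3, 3, 0, 4, 2, 2),
  V3 = c(0, 1, 3, 3, 1, 1), F3 = c(0, 3, 4, 4, 2, 2),
  V4 = c(1, 1, 3, 3, 1, 0), F4 = c(3, 3, 4, 4, 2, 0)
)
dt <- as.data.table(df)

# Assumption: every column other than Player is a value column
value_cols <- setdiff(names(dt), "Player")

# Column-wise maximum within each Player group
res <- dt[, lapply(.SD, max), by = Player, .SDcols = value_cols]
print(res)
```

Note that this, like the expression above, takes the maximum of each column independently. On the example data that reproduces the expected rows, but on other data the result could combine values that come from different rows of the same player.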

Answer 2

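As @Imo and @evan058 have already pointed out, it is not clear what "most complete information" means. I assume you treat 0 as missing information, so "most complete" refers to the row with the fewest 0 entries for each player.

This code should do the job: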

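library(plyr)
# For each player, keep the row with the most non-zero entries
newData <- ldply(unique(data$Player), function(player) {
  tmp <- data[data$Player == player,]
  # tmp[,-1] drops the first column (Player); rowSums counts non-zero values per row
  tmp[which.max(rowSums(tmp[,-1] != 0)),]
})
print(newData)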
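
For readers who prefer not to depend on plyr, the same idea (per player, keep the row with the most non-zero entries) can be sketched in base R. This is an illustrative variant rather than the answer's own code. It rebuilds the question's example as `df`, and `value_cols` and `best` are names chosen here.

```r
# Example data copied from the question
df <- data.frame(
  Player = c(111111, 111111, 222222, 222222, 33333, 33333),
  V1 = c(0, 0, 3, 3, 1, 1), F1 = c(0, 0, 4, 4, 2, 2),
  V2 = c(1, 1, 0, 3, 1, 1), F2 = c(3, 3, 0, 4, 2, 2),
  V3 = c(0, 1, 3, 3, 1, 1), F3 = c(0, 3, 4, 4, 2, 2),
  V4 = c(1, 1, 3, 3, 1, 0), F4 = c(3, 3, 4, 4, 2, 0)
)

# Assumption: every column other than Player holds values to be counted
value_cols <- setdiff(names(df), "Player")

# Split by player, keeping the order in which players first appear
groups <- split(df, factor(df$Player, levels = unique(df$Player)))

# Within each group, keep the row with the most non-zero entries
best <- do.call(rbind, lapply(groups, function(g) {
  g[which.max(rowSums(g[value_cols] != 0)), , drop = FALSE]
}))
rownames(best) <- NULL
print(best)
```

Unlike a column-wise maximum, this always returns an actual row from the input. If two rows of a player tie, `which.max` keeps the first one.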

Selecting the rows with the most available information in a data frame using R
