Question

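I need to subset a dataset based on a reference column of values. For example, given the dataset:

col1 <- c(1,2,3,4)
col2 <- c(1,2,-1,4)
col3 <- c(1,2,-3,-4)
col_Reference <- c(-5,6,-7,8)
df <- cbind(col1,col2,col3,col_Reference)
df
     col1 col2 col3 col_Reference
[1,]    1    1    1            -5
[2,]    2    2    2             6
[3,]    3   -1   -3            -7
[4,]    4    4   -4             8

I want to filter the rows based on the values in col_Reference. If the reference value is greater than 0, a row should be kept only if every other value in it is also greater than 0. Conversely, if the reference value is less than 0, the row should be kept only if every other value in it is also less than 0. Allowing 0 mismatches, I want to get back:

     col1 col2 col3 col_Reference
[1,]    2    2    2             6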

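Then I also want to control how many mismatches are allowed. Allowing at most 1 mismatch, I should get back:

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
[2,]    3   -1   -3            -7
[3,]    4    4   -4             8

Allowing at most 2:

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
[2,]    3   -1   -3            -7
[3,]    4    4   -4             8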
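To see where these expected outputs come from, here is a minimal sketch that counts the sign mismatches in each row, assuming (as in the example) that the reference column is named col_Reference:

```r
# Example data from the question
col1 <- c(1, 2, 3, 4)
col2 <- c(1, 2, -1, 4)
col3 <- c(1, 2, -3, -4)
col_Reference <- c(-5, 6, -7, 8)
df <- cbind(col1, col2, col3, col_Reference)

# Everything except the reference column
vals <- df[, colnames(df) != "col_Reference", drop = FALSE]

# The reference signs have length nrow(df), so R recycles them down each
# column: every value in row i is compared with the reference sign of row i
mismatches <- rowSums(sign(vals) != sign(df[, "col_Reference"]))
mismatches
# [1] 3 0 1 1

# Rows with at most 1 mismatch
df[mismatches <= 1, , drop = FALSE]
```

Row 1 disagrees with its reference in all three columns, so it is only kept once 3 mismatches are allowed; rows 3 and 4 each have exactly one mismatch.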

I think I should use … for this, but I must admit I'm not very good at using it :(

Many thanks

Answer 1

First, with no mismatches allowed:

df[apply(df, 1, function(x) all(sign(x) == sign(tail(x, 1)))), , drop = FALSE]
#     col1 col2 col3 col_Reference
#[1,]    2    2    2             6

Allowing up to n mismatches:

n = 1
df[apply(df, 1, function(x) sum(!(sign(head(x, -1)) == sign(tail(x, 1))))) <= n, , drop = FALSE]
#     col1 col2 col3 col_Reference
#[1,]    2    2    2             6
#[2,]    3   -1   -3            -7
#[3,]    4    4   -4             8

Answer 2

This isn't the most elegant solution, but it does the job!

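#Create the test data (a matrix)
col1 <- c(1,2,3,4)
col2 <- c(1,2,-1,4)
col3 <- c(1,2,-3,-4)
col_Reference <- c(-5,6,-7,8)
df <- cbind(col1,col2,col3,col_Reference)

#Keep the rows with at most `mismatch` sign mismatches against col_Reference
#(assumes col_Reference is the last column)
fun <- function(df, mismatch = 0){
  df <- as.data.frame(df)
  # simplify = FALSE (R >= 4.1.0) makes apply() always return a list, so the
  # rbind below also works when every row (or no row) is kept
  df <- apply(df, 1, function(r){
    # count the non-reference values whose sign differs from the reference's sign
    if(sum(sign(r[1:(ncol(df)-1)]) != sign(r['col_Reference'])) <= mismatch){
      return(r)    # keep this row
    }else{
      return(NULL) # drop this row
    }
  }, simplify = FALSE)
  # stack the kept rows back into a matrix; rbind ignores the NULL entries
  df <- do.call('rbind', df)
  return(df)
}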

Now call the function!

fun(df)

     col1 col2 col3 col_Reference
[1,]    2    2    2             6

fun(df, mismatch = 1)

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
[2,]    3   -1   -3            -7
[3,]    4    4   -4             8

fun(df, mismatch = 2)

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
[2,]    3   -1   -3            -7
[3,]    4    4   -4             8
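A variation on the same idea, sketched under the assumption that you only need the kept rows back as a matrix: let apply() return a single TRUE/FALSE per row and index with that, which avoids the NULL/rbind step and behaves the same when all or none of the rows pass. The name fun2 and the ref_col argument are hypothetical additions, not part of the answer above.

```r
col1 <- c(1, 2, 3, 4)
col2 <- c(1, 2, -1, 4)
col3 <- c(1, 2, -3, -4)
col_Reference <- c(-5, 6, -7, 8)
df <- cbind(col1, col2, col3, col_Reference)

# Hypothetical helper: keep rows with at most `mismatch` sign mismatches
# against the column named `ref_col` (which need not be the last one)
fun2 <- function(df, mismatch = 0, ref_col = "col_Reference") {
  m <- as.matrix(df)
  others <- colnames(m) != ref_col          # TRUE for every non-reference column
  keep <- apply(m, 1, function(r) {
    sum(sign(r[others]) != sign(r[ref_col])) <= mismatch
  })                                        # one TRUE/FALSE per row
  m[keep, , drop = FALSE]                   # drop = FALSE keeps a matrix even for 0 or 1 rows
}

fun2(df)                 # row 2 only
fun2(df, mismatch = 3)   # all four rows
```

Selecting the reference column by name rather than by position means the helper does not depend on the column order.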

Answer 3

This should work:

# All 3 must have the same sign as the reference
df[apply(df, 1, function(x) sum(sign(x[4])*sign(x[1:3]) > 0) == 3), , drop = FALSE]
# At least 2 must have the same sign as the reference
df[apply(df, 1, function(x) sum(sign(x[4])*sign(x[1:3]) > 0) >= 2), , drop = FALSE]

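This counts how many of the values in the first 3 columns have the same sign as the value in the reference column (drop = FALSE keeps the result a matrix even when only one row matches).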
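To see what that count looks like, the same comparison can be done for all rows at once. This is only an illustration of the logic in the answer above, assuming the column layout of the example (values in columns 1 to 3, reference in column 4):

```r
col1 <- c(1, 2, 3, 4)
col2 <- c(1, 2, -1, 4)
col3 <- c(1, 2, -3, -4)
col_Reference <- c(-5, 6, -7, 8)
df <- cbind(col1, col2, col3, col_Reference)

# TRUE where a value has the same (non-zero) sign as its row's reference value;
# sign(df[, 4]) has one entry per row and is recycled down each column
agree <- sign(df[, 1:3]) * sign(df[, 4]) > 0
agree

# Number of agreeing values per row
rowSums(agree)
# [1] 0 3 2 2
```

So `== 3` keeps only row 2 and `>= 2` keeps rows 2, 3 and 4. Since sign(0) is 0, a zero in the data never counts as agreeing with the reference.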

Answer 4

This can also be done with concise code using rowSums() and sign():

mismatch = 1
df[rowSums(sign(df) * sign(df[, "col_Reference"])) >= (ncol(df) - mismatch * 2), , drop = FALSE]

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
[2,]    3   -1   -3            -7
[3,]    4    4   -4             8
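The threshold ncol(df) - mismatch * 2 relies on a piece of arithmetic: once each row's signs are multiplied by the sign of that row's reference value, every value that agrees with the reference (including the reference itself) contributes +1 and every mismatch contributes -1, so each mismatch lowers the row sum by 2. A small check of that on the example data, plus a hypothetical row showing that zeros break the identity:

```r
col1 <- c(1, 2, 3, 4)
col2 <- c(1, 2, -1, 4)
col3 <- c(1, 2, -3, -4)
col_Reference <- c(-5, 6, -7, 8)
df <- cbind(col1, col2, col3, col_Reference)

# +1 where a value agrees in sign with its row's reference, -1 where it does not
s <- sign(df) * sign(df[, "col_Reference"])
signed_sum <- rowSums(s)
signed_sum
# [1] -2  4  2  2

# Count mismatches directly among the non-reference columns
n_mismatch <- rowSums(s[, colnames(s) != "col_Reference", drop = FALSE] < 0)

# With no zeros in the data, sum = ncol(df) - 2 * mismatches holds for every row
stopifnot(all(signed_sum == ncol(df) - 2 * n_mismatch))

# A hypothetical row containing zeros: sign(0) is 0, so the two zeros
# contribute nothing and the sum equals that of a row with one mismatch
z <- c(0, 0, 2, 6)
sum(sign(z) * sign(z[4]))
# [1] 2
```

If the data can contain zeros, counting mismatches explicitly (for example with `!=` on the signs) is safer than relying on the row-sum threshold, because a row like z above passes `mismatch = 1` even though two of its values match neither sign.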

Subset a data frame based on whether values in the reference column are greater or less than 0
