Question

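I am working with RNA-seq data in which the first column holds the cell clusters and the first row holds the gene expression. For a given column, I want to compare rows 1 and 2, 3 and 4, 5 and 6, and so on, to see whether there is a two-fold difference in expression. Anything with a two-fold or higher level of expression should be kept, and anything lower should be filtered out. I want to look at the data in terms of relative fold change in gene expression.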

I tried running the code below, but I still get an error. The data looks like this:

GeneName  Cluster_1  Cluster_1  Cluster_2  Cluster_2  Cluster_3  Cluster_3
Itga9     0.019      0.004      0.028      0.020      0.053      0.045
Itga1     0.018      0.012      0.016      0.011      0.016      0.030
Npnt      0.000      0.000      0.000      0.000      0.000      0.000
Agrn      0.014      0.012      0.019      0.014      0.012      0.015
Cd36      0.028      0.107      0.035      0.037      0.030      0.074
Cd44      0.063      0.132      0.105      0.112      0.143      0.186
Chad      0.000      0.000      0.000      0.000      0.000      0.000


My_Data <- My_Data[2:7,2:7]
My_Data <- t(My_Data)
foo = function(x) {
  if (length(x) %% 2 == 1) {
    stop("Odd number of rows!")
  }
  odd = seq(1, length(x), by = 2)
  even = odd + 1
  ratio = x[odd] / x[even]
  return(any(ratio >= 2 | ratio <= 0.5))
}
FilteredDf <- Filter(foo, My_Data)

For some reason it produces the error: Error in FUN(X[[i]], ...) : Odd number of rows!

Answer 1

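Write a function that tests a single column and returns TRUE or FALSE, then use Filter to apply it to every column of the data frame and keep only the columns for which it returns TRUE.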

I'll leave it to you to adapt the function (for example, to handle input with an odd number of rows).

Applied to an example data frame my_df:

foo = function(x) {
  if (length(x) %% 2 == 1) {
    stop("Odd number of rows!")
  }
  odd = seq(1, length(x), by = 2)
  even = odd + 1
  ratio = x[odd] / x[even]
  return(any(ratio >= 2 | ratio <= 0.5))
}
Filter(foo, my_df)
#       a
# 1 0.000
# 2 0.000
# 3 4.020
# 4 2.004
# 5 1.001
# 6 0.004
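
A likely explanation for the error in the question, offered as a sketch rather than a confirmed diagnosis: t() applied to a data frame returns a matrix, and Filter() walks a matrix one cell at a time (through lapply), so foo receives vectors of length 1 and stops. Converting the transposed result back to a data frame lets Filter() work column by column again. The small matrix m below is made up for illustration and is not the poster's data.

```r
# Same test function as in this answer.
foo <- function(x) {
  if (length(x) %% 2 == 1) {
    stop("Odd number of rows!")
  }
  odd <- seq(1, length(x), by = 2)
  even <- odd + 1
  ratio <- x[odd] / x[even]
  any(ratio >= 2 | ratio <= 0.5)
}

# Hypothetical 4 x 2 matrix standing in for the transposed expression data.
# matrix() fills column by column, so g1 = 1, 4, 2, 2 and g2 = 3, 3, 5, 5.
m <- matrix(c(1, 4, 2, 2,
              3, 3, 5, 5),
            nrow = 4, dimnames = list(NULL, c("g1", "g2")))

# Filter() on the matrix: foo is handed one cell at a time and stops.
matrix_result <- tryCatch(Filter(foo, m),
                          error = function(e) conditionMessage(e))
print(matrix_result)

# Filter() on a data frame: foo is handed one whole column at a time.
df_result <- Filter(foo, as.data.frame(m))
print(df_result)
```

In the question's code this would amount to wrapping the transpose, as in as.data.frame(t(My_Data)), before calling Filter.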

Answer 2

library(dplyr)


my_df <- read.table(text ='a      b     c
         0.000  0.001 0.883
         0.000  0.001 1.471
         0.000  0.003 1.357
         10.004  0.004 1.618
         3.001  0.005 1.110
         0.004  0.006 1.048', header = TRUE)


## First, create a (duplicate) column that shifts the values of a up by one row.
## The point is that neighbouring rows can then be compared horizontally, within a single row.

my_df %>%
  mutate(lead_a = lead(a)) %>%
  select(a, lead_a, b, c) %>%
  head()


#        a lead_a     b     c
# 1  0.000  0.000 0.001 0.883
# 2  0.000  0.000 0.001 1.471
# 3  0.000 10.004 0.003 1.357
# 4 10.004  3.001 0.004 1.618
# 5  3.001  0.004 0.005 1.110
# 6  0.004     NA 0.006 1.048



# As you can see, row 1 of lead_a is the same as row 2 of a.
# Now you can compare row 1 with row 2, row 3 with row 4, and so on.
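
The answer stops at the shifted column, so here is one way the comparison could be finished. This is a sketch under assumptions, not necessarily what the answerer had in mind: it keeps only the odd rows (one row per pair), divides a by lead_a, and flags pairs with at least a two-fold difference in either direction, reusing the thresholds from Answer 1. The column names ratio_a and two_fold_a are made up for illustration.

```r
library(dplyr)

my_df <- read.table(text = 'a      b     c
         0.000  0.001 0.883
         0.000  0.001 1.471
         0.000  0.003 1.357
         10.004  0.004 1.618
         3.001  0.005 1.110
         0.004  0.006 1.048', header = TRUE)

paired <- my_df %>%
  mutate(lead_a  = lead(a),               # value of a from the next row
         ratio_a = a / lead_a) %>%        # row i divided by row i + 1
  filter(row_number() %% 2 == 1) %>%      # keep rows 1, 3, 5: one row per pair
  mutate(two_fold_a = ratio_a >= 2 | ratio_a <= 0.5) %>%  # assumed thresholds
  select(a, lead_a, ratio_a, two_fold_a)

print(paired)
```

A pair where both values are zero gives 0 / 0, which is NaN in R, so its flag comes out as NA rather than TRUE or FALSE. Such pairs need an explicit decision, for example dropping them with filter(!is.na(two_fold_a)).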

How do I filter on specific rows of a data matrix in R when my filter condition is somewhat complex?
