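I am working with RNA-seq data in which the first column holds the cell clusters and the first row holds the genes. For a given column, I want to compare rows 1 and 2, 3 and 4, 5 and 6, and so on, to see whether expression differs by at least two-fold. Anything with a two-fold or higher difference should be kept, and anything below that should be filtered out. In other words, I want to look at the data in terms of relative fold change in gene expression.
I tried running this code, but I keep getting an error: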
GeneName Cluster_1 Cluster_1 Cluster_2 Cluster_2 Cluster_3 Cluster_3
Itga9    0.019     0.004     0.028     0.020     0.053     0.045
Itga1    0.018     0.012     0.016     0.011     0.016     0.030
Npnt     0.000     0.000     0.000     0.000     0.000     0.000
Agrn     0.014     0.012     0.019     0.014     0.012     0.015
Cd36     0.028     0.107     0.035     0.037     0.030     0.074
Cd44     0.063     0.132     0.105     0.112     0.143     0.186
Chad     0.000     0.000     0.000     0.000     0.000     0.000

My_Data <- My_Data[2:7,2:7]
My_Data <- t(My_Data)
foo = function(x) {
  if (length(x) %% 2 == 1) {
    stop("Odd number of rows!")
  }
  odd = seq(1, length(x), by = 2)
  even = odd + 1
  ratio = x[odd] / x[even]
  return(any(ratio >= 2 | ratio <= 0.5))
}
FilteredDf <- Filter(foo, My_Data)
For some reason it produces this error: Error in FUN(X[[i]], ...) : Odd number of rows!
Answer 0 (score: 1)
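Write a function that tests a single column and returns TRUE or FALSE, then use Filter to apply it to every column of the data frame, keeping only the columns for which it returns TRUE.
I will leave it to you to adapt the function (for example, to decide what should happen when the input has an odd number of rows).
Using a data frame my_df: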
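foo = function(x) {
  # Rows are paired as (1, 2), (3, 4), ...; an odd count leaves one row unpaired.
  if (length(x) %% 2 == 1) {
    stop("Odd number of rows!")
  }
  odd = seq(1, length(x), by = 2)
  even = odd + 1
  ratio = x[odd] / x[even]
  # TRUE if any pair differs by at least two-fold in either direction.
  return(any(ratio >= 2 | ratio <= 0.5))
}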
Filter(foo, my_df)
# a
# 1 0.000
# 2 0.000
# 3 4.020
# 4 2.004
# 5 1.001
# 6 0.004
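As for the error in the question: t() returns a matrix, not a data frame, and Filter() walks a matrix one element at a time, so foo receives vectors of length 1 and stops. Below is a minimal sketch of one possible fix, assuming the table from the question is read with the gene names as row names. The read.table call and the names err, expr_df and kept are my own illustration, not part of the question.

```r
# Table from the question, read with gene names as row names (an assumption
# about how My_Data was loaded). Duplicate column names are made unique by R.
My_Data <- read.table(text = "GeneName Cluster_1 Cluster_1 Cluster_2 Cluster_2 Cluster_3 Cluster_3
Itga9 0.019 0.004 0.028 0.020 0.053 0.045
Itga1 0.018 0.012 0.016 0.011 0.016 0.030
Npnt 0.000 0.000 0.000 0.000 0.000 0.000
Agrn 0.014 0.012 0.019 0.014 0.012 0.015
Cd36 0.028 0.107 0.035 0.037 0.030 0.074
Cd44 0.063 0.132 0.105 0.112 0.143 0.186
Chad 0.000 0.000 0.000 0.000 0.000 0.000", header = TRUE, row.names = 1)

# Same test function as above: compares rows (1, 2), (3, 4), (5, 6).
foo <- function(x) {
  if (length(x) %% 2 == 1) {
    stop("Odd number of rows!")
  }
  odd <- seq(1, length(x), by = 2)
  even <- odd + 1
  ratio <- x[odd] / x[even]
  any(ratio >= 2 | ratio <= 0.5)
}

# Filter() on the transposed matrix visits single numbers, so foo stops.
err <- tryCatch(Filter(foo, t(My_Data)), error = function(e) conditionMessage(e))
print(err)
# "Odd number of rows!"

# Converting back to a data frame makes Filter() iterate over columns (genes).
expr_df <- as.data.frame(t(My_Data))
kept <- Filter(foo, expr_df)
print(names(kept))
# "Itga9" "Cd36" "Cd44"
```

Genes whose pairs are all 0/0 (Npnt and Chad here) give NA rather than TRUE, and Filter() drops them along with the FALSE columns.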
Answer 1 (score: 0)
require(dplyr)
my_df <- read.table(text ='a b c
0.000 0.001 0.883
0.000 0.001 1.471
0.000 0.003 1.357
10.004 0.004 1.618
3.001 0.005 1.110
0.004 0.006 1.048', header = TRUE)
## First, create a shifted copy of column a that moves every value up by one row,
# so that each row can be compared with the next one side by side.
my_df %>%
  mutate(lead_a = lead(a)) %>%
  select(a, lead_a, b, c) %>%
  head
# a lead_a b c
# 1 0.000 0.000 0.001 0.883
# 2 0.000 0.000 0.001 1.471
# 3 0.000 10.004 0.003 1.357
# 4 10.004 3.001 0.004 1.618
# 5 3.001 0.004 0.005 1.110
# 6 0.004 NA 0.006 1.048
# As you can see, row 1 of lead_a is the same as row 2 of a.
# Now you can compare row 1 to row 2, row 3 to row 4, etc.
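One way the comparison could be finished from here, as a rough sketch rather than the only approach: keep only the odd rows, so that each remaining row holds one pair (a and lead_a), and compute their ratio. The names ratio_a, pairs and keep_a are hypothetical, and treating 0/0 as "no change" through na.rm = TRUE is my assumption.

```r
library(dplyr)

my_df <- read.table(text = "a b c
0.000 0.001 0.883
0.000 0.001 1.471
0.000 0.003 1.357
10.004 0.004 1.618
3.001 0.005 1.110
0.004 0.006 1.048", header = TRUE)

pairs <- my_df %>%
  mutate(lead_a = lead(a),          # value from the next row
         ratio_a = a / lead_a) %>%  # row i divided by row i + 1
  dplyr::filter(row_number() %% 2 == 1) %>%  # keep rows 1, 3, 5: one row per pair
  select(a, lead_a, ratio_a)

# TRUE if any pair in column a differs by at least two-fold in either direction;
# NaN ratios (0 / 0) are ignored, which is an assumption about the desired behavior.
keep_a <- any(pairs$ratio_a >= 2 | pairs$ratio_a <= 0.5, na.rm = TRUE)
print(pairs)
print(keep_a)
```

The same steps would have to be repeated for b and c, which is why the Filter approach in the other answer scales more easily to many columns.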