How do I apply a function I created to the values in a data frame and replace those values with the function's results?

Posted: 2018-02-25 16:34:55

Tags: r function dataframe rstudio

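I created a function called getExpressionLevel, and the exercise asks me to use this function to replace the numbers with the labels listed below. What do I need to use to achieve this?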

The getExpressionLevel function:

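getExpressionLevel <- function(a) {
  # note: cat() only prints the label; the function itself returns NULL
  if (a < 5) {
    cat("none")
  }

  if (a >= 5 & a < 20) {
    cat("low")
  }

  if (a >= 20 & a < 60) {
    cat("medium")
  }

  if (a >= 60) {
    cat("high")
  }
}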
  • "none": expression level below 5
  • "low": expression level at or above 5 and below 20
  • "medium": expression level at or above 20 and below 60
  • "high": expression level at or above 60
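Because cat() only prints the label and the function returns NULL, its output cannot be stored back into a data frame. A minimal sketch, assuming the goal is simply to get the label back as a string: the hypothetical getExpressionLevel2 below returns the label instead of printing it, so it can be applied value by value with sapply(). The input values are illustrative only.

```r
# Hypothetical variant of getExpressionLevel that *returns* the label
# instead of printing it with cat(), so the result can be stored.
getExpressionLevel2 <- function(a) {
  if (a < 5) {
    "none"
  } else if (a < 20) {   # 5 <= a < 20
    "low"
  } else if (a < 60) {   # 20 <= a < 60
    "medium"
  } else {               # a >= 60
    "high"
  }
}

# Apply it to every element of a numeric vector (illustrative values,
# chosen to hit each threshold).
levels_out <- sapply(c(3, 5, 19.9, 20, 59, 60, 87.3), getExpressionLevel2)
print(levels_out)
```

The same sapply() call can then be run on each numeric column of the data frame and the result assigned back to that column.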

The exercise is:

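Create a data.frame containing 10 rows (one per gene) and 3 columns (one per cell line). Then calculate the mean expression of each gene in each cell line and label the expression levels accordingly using the expression_levels function.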

This is my current data.frame. Its values need to be replaced with the results of the getExpression function.


Here is the expected data.frame:

    genename      Kc167      BG3       S2
1       Clic   7.333333 48.33333 75.00000
2       Treh  24.666667 12.66667 52.33333
3        bib  31.333333 79.33333 82.00000
4      CalpC  65.000000 69.33333 63.66667
5        tud  59.666667 81.66667 16.33333
6       cort  74.333333 50.66667 28.66667
7        S2P  72.000000 39.66667 50.66667
8  Mitofilin  38.333333 29.00000 54.66667
9        Oxp  73.666667 49.33333 42.66667
10    Ada1-2  87.333333 42.00000 28.00000

2 Answers:

Answer 0 (score: 1)

Hope this helps!

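bin_breaks <- c(-Inf, 5, 20, 60, Inf)              # bin boundaries
bin_labels <- c("none", "low", "medium", "high")   # one label per bin
# right = FALSE closes each bin on the left, [a, b), so e.g. 5 is "low"
df[,-1] <- sapply(df[,-1], function(x) cut(x,
                                           breaks = bin_breaks,
                                           labels = bin_labels,
                                           right = FALSE))
df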

Output:

    genename  Kc167    BG3     S2
1       Clic    low medium   high
2       Treh medium    low medium
3        bib medium   high   high
4      CalpC   high   high   high
5        tud medium   high    low
6       cort   high medium medium
7        S2P   high medium medium
8  Mitofilin medium medium medium
9        Oxp   high medium medium
10    Ada1-2   high medium medium

Sample data:

df <- structure(list(genename = c("Clic", "Treh", "bib", "CalpC", "tud", 
"cort", "S2P", "Mitofilin", "Oxp", "Ada1-2"), Kc167 = c(7.333333, 
24.666667, 31.333333, 65, 59.666667, 74.333333, 72, 38.333333, 
73.666667, 87.333333), BG3 = c(48.33333, 12.66667, 79.33333, 
69.33333, 81.66667, 50.66667, 39.66667, 29, 49.33333, 42), S2 = c(75, 
52.33333, 82, 63.66667, 16.33333, 28.66667, 50.66667, 54.66667, 
42.66667, 28)), .Names = c("genename", "Kc167", "BG3", "S2"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))


Edit: added the appropriate right argument to the code so that the bin boundaries match the OP's requirements (courtesy of @drf).
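The right argument only matters for values that fall exactly on a break point. A small check on the boundary values 5, 20 and 60, reusing bin_breaks and bin_labels from this answer (boundary_values, closed_left and closed_right are illustrative names):

```r
bin_breaks <- c(-Inf, 5, 20, 60, Inf)
bin_labels <- c("none", "low", "medium", "high")
boundary_values <- c(5, 20, 60)

# right = FALSE: bins are [a, b), so a boundary value opens the next bin
closed_left <- as.character(cut(boundary_values, breaks = bin_breaks,
                                labels = bin_labels, right = FALSE))

# right = TRUE (the default): bins are (a, b], so a boundary value closes its bin
closed_right <- as.character(cut(boundary_values, breaks = bin_breaks,
                                 labels = bin_labels, right = TRUE))

print(closed_left)   # "low" "medium" "high"
print(closed_right)  # "none" "low" "medium"
```

Only right = FALSE matches the stated rules ("at or above 5", "at or above 20", "at or above 60").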

Answer 1 (score: 0)

A function-based approach. It always helps to know how to write and use your own functions.

## sample data
library(data.table)  # needed for data.table()
df <- data.table(genename = c('Clic','Treh','bib','CalpC'),
                 Kc167 = c(7.333,24.666,31.3333,65),
                 BG3 = c(48.33,12.66,79.33,69.33),
                 S2 = c(75.00,52.33,82.00,63.66))

## this function returns a label based on the following criteria
get_values <- function(x)
{
    if (x < 5) return('none')
    else if ((x >= 5) && (x < 20)) return('low')
    else if ((x >= 20) && (x < 60)) return('medium')
    else if (x >= 60) return('high')
}

## creating a new data table with the answers
## (start from a table, not the bare genename vector, so columns can be added)
df2 <- data.table(genename = df$genename)
df2$Kc167 <- sapply(df$Kc167, get_values)
df2$BG3 <- sapply(df$BG3, get_values)
df2$S2 <- sapply(df$S2, get_values)
df2

  genename  Kc167    BG3     S2
1:     Clic    low medium   high
2:     Treh medium    low medium
3:      bib medium   high   high
4:    CalpC   high   high   high
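Instead of one sapply() line per cell line, the same scalar function can be applied to every numeric column at once. A sketch in base R, assuming a plain data.frame instead of a data.table and reusing the four sample rows from this answer; get_level is a hypothetical name for a function with the same thresholds as get_values:

```r
# Sample data from this answer, as a plain data.frame
df <- data.frame(genename = c("Clic", "Treh", "bib", "CalpC"),
                 Kc167 = c(7.333, 24.666, 31.3333, 65),
                 BG3   = c(48.33, 12.66, 79.33, 69.33),
                 S2    = c(75.00, 52.33, 82.00, 63.66))

# Scalar labelling function (hypothetical; same thresholds as get_values)
get_level <- function(x) {
  if (x < 5) "none"
  else if (x < 20) "low"
  else if (x < 60) "medium"
  else "high"
}

# Apply it to every column except genename; vapply() guarantees exactly
# one character string per input value
df2 <- df
df2[-1] <- lapply(df[-1], function(col) vapply(col, get_level, character(1)))
print(df2)
```

Using vapply() rather than sapply() makes R raise an error if the function ever returns something other than a single string, which a plain sapply() would silently turn into a list.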