Question

Input

final_table =
  Chr     start       end   num seg.mean seg.mean.1 seg.mean.2
    1  68580000  68640000 A8430   0.7000     0.1440     0.1032
    1 115900000 116260000 B8430   0.0039     2.7202     2.7202
    1 173500000 173680000    C5  -1.7738    -0.0746    -0.2722

How can I create a new data.frame in which the values in columns 5 to 7 are set as follows?

-1 if value < -0.679

0 if -0.679 <= value <= 0.450

+1 if value > 0.450

Expected output

Chr     start       end   num seg.mean seg.mean.1 seg.mean.2
  1  68580000  68640000 A8430        1          0          0
  1 115900000 116260000 B8430        0          1          1
  1 173500000 173680000    C5       -1          0          0

Answer 1

Try this:

# read the data in
df <- read.table(header = TRUE, text = "Chr     start       end        num    seg.mean    seg.mean.1   seg.mean.2
1   68580000    68640000    A8430    0.7000      0.1440     0.1032
1   115900000   116260000   B8430    0.0039      2.7202     2.7202
1   173500000   173680000   C5      -1.7738      -0.0746    -0.2722")

# get the names of the columns you want to change (column 5 onwards)
cols <- names(df)[5:ncol(df)]
# map each value to -1, 0 or 1 according to its range
fun_cond <- function(x) {
    ifelse(x < -0.679, -1, ifelse(
        x >= -0.679 & x <= 0.450, 0, 1))
}

# use data.table to apply the function to the selected columns;
# as.data.table() makes a copy, so the original df is not overwritten
library(data.table)
new_df <- as.data.table(df)
new_df[, (cols) := lapply(.SD, fun_cond), .SDcols = cols]
new_df

Output:

   Chr     start       end   num seg.mean seg.mean.1 seg.mean.2
1:   1  68580000  68640000 A8430        1          0          0
2:   1 115900000 116260000 B8430        0          1          1
3:   1 173500000 173680000    C5       -1          0          0

The same thing without any additional packages:

cols <- names(df[5:length(df)])
fun_cond <- function(x) {
    ifelse(x < -0.679 , -1, ifelse(
        x >= -0.679 & x <= 0.450, 0, 1))
}

new_df <- df
new_df[5:length(df)] <- lapply(new_df[5:length(df)], fun_cond)
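Two variations on this base R version, offered as a sketch rather than as part of the original answer: the columns are picked by name (assuming every column to recode starts with `seg.mean`), and the nested ifelse() is replaced by the difference of two logical tests, which yields -1, 0 or 1 and keeps both -0.679 and 0.450 in the 0 band. The helper `classify_seg` is a hypothetical name.

```r
# Example data from the question
df <- data.frame(
  Chr        = c(1, 1, 1),
  start      = c(68580000, 115900000, 173500000),
  end        = c(68640000, 116260000, 173680000),
  num        = c("A8430", "B8430", "C5"),
  seg.mean   = c(0.7000, 0.0039, -1.7738),
  seg.mean.1 = c(0.1440, 2.7202, -0.0746),
  seg.mean.2 = c(0.1032, 2.7202, -0.2722)
)

# Hypothetical helper: TRUE/FALSE become 1/0, so the difference is
# +1 above 0.450, -1 below -0.679 and 0 otherwise (both bounds give 0)
classify_seg <- function(x) {
  as.numeric(x > 0.450) - as.numeric(x < -0.679)
}

# Pick the columns to recode by name; the dot is escaped in the regex
cols <- grep("^seg\\.mean", names(df), value = TRUE)

# Recode into a new data frame; df itself is left unchanged
new_df <- df
new_df[cols] <- lapply(df[cols], classify_seg)
print(new_df)
```

Like `fun_cond`, `classify_seg` passes NA through as NA, since comparisons with NA return NA.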

Answer 2

I would use the cut function and apply it to each of the last three columns. Here is a simple example:

original = data.frame(a = c(rep("A", 2), rep("B", 2)), seg.mean = c(-1, 0, 0.4, 0.5))
original$segmented = cut(original$seg.mean, c(-Inf, -0.679, 0.450, Inf), labels = c(-1, 0, 1))

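One thing to note: the new column will be a factor. If you need numeric values, convert it with as.numeric(as.character(...)); calling as.numeric directly on the factor returns the internal level codes (1, 2, 3) rather than the labels -1, 0, 1.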
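To see why the conversion matters, the sketch below reuses the four values from the example: as.numeric() on the factor returns level codes, while going through as.character(), or converting the levels once with `as.numeric(levels(f))[f]`, recovers the labels. The variable names `x` and `seg` are illustrative.

```r
# Same breaks and labels as the example above
x <- c(-1, 0, 0.4, 0.5)
seg <- cut(x, c(-Inf, -0.679, 0.450, Inf), labels = c(-1, 0, 1))

print(as.numeric(seg))                # level codes: 1 2 2 3
print(as.numeric(as.character(seg)))  # labels as numbers: -1 0 0 1
print(as.numeric(levels(seg))[seg])   # same result, converting the levels once
```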

You can also try labels = FALSE, which returns the integer bin numbers 1, 2, 3 instead of -1, 0, 1. You can fix this by subtracting 2:

original$segmented = cut(original$seg.mean, c(-Inf, -0.679, 0.450, Inf), labels = FALSE)-2
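The example above recodes a single column; applying the same cut() call to all three seg.mean columns of the question's data could look like the sketch below (the helper `cut_seg` is a hypothetical name). One caveat: cut() uses right-closed intervals by default, so a value of exactly -0.679 lands in the -1 band, whereas the question puts it in the 0 band; the upper boundary 0.450 is handled as the question asks.

```r
df <- data.frame(
  num        = c("A8430", "B8430", "C5"),
  seg.mean   = c(0.7000, 0.0039, -1.7738),
  seg.mean.1 = c(0.1440, 2.7202, -0.0746),
  seg.mean.2 = c(0.1032, 2.7202, -0.2722)
)

# Hypothetical helper: labels = FALSE gives bin numbers 1, 2, 3;
# subtracting 2 shifts them to -1, 0, 1
cut_seg <- function(x) {
  cut(x, c(-Inf, -0.679, 0.450, Inf), labels = FALSE) - 2
}

# Apply the helper to each seg.mean column of a copy of df
cols <- c("seg.mean", "seg.mean.1", "seg.mean.2")
new_df <- df
new_df[cols] <- lapply(df[cols], cut_seg)
print(new_df)
```

Passing `right = FALSE` to cut() moves -0.679 into the 0 band but pushes 0.450 into the +1 band, so with half-open intervals one of the two boundaries always differs from the stated rule.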

Answer 3

You can replace the fields in the data frame directly by subsetting:

# <= keeps 0.450 itself in the 0 band, as the question requires
df[, 5:7] <- ifelse(df[, 5:7] < -0.679, -1,
             ifelse(df[, 5:7] <= 0.450, 0,
             1))
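Because `df[, 5:7] < -0.679` evaluates to a logical matrix, ifelse() returns a numeric matrix of the same shape that can be assigned straight back into the three columns. A quick check on a small hypothetical frame `chk`, with values sitting exactly on the thresholds, shows that -0.679 and 0.450 both map to 0 when the inner comparison is `<= 0.450` (with `< 0.450`, 0.450 itself would become 1):

```r
# Hypothetical test frame with values on and around both thresholds
chk <- data.frame(
  a = c(-0.680, -0.679, 0.450),
  b = c( 0.000,  0.451, -1.000),
  c = c( 0.449,  2.000, -0.500)
)

# Nested ifelse() on a data frame comparison returns a numeric matrix,
# which is written back into the three columns
chk[, 1:3] <- ifelse(chk[, 1:3] < -0.679, -1,
              ifelse(chk[, 1:3] <= 0.450, 0, 1))
print(chk)
```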
