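I'm doing a data transformation in R and haven't yet managed three steps: among rows that share the same Tissue and Species, keep only the one with the highest Expression value; then split the data into columns by expression level; and finally aggregate them per tissue. Since I know my explanation won't win a Nobel Prize, here are the raw data, the desired result, and what I have achieved so far.
Raw data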
df <- read.table(text =
"Tissue Species Expression
1 dentritic Human moderate
2 liver Human high
3 liver Human moderate
4 liver Human moderate
5 liver Human high
6 liver Monkey high
7 liver Monkey moderate
8 liver Dog high
9 liver Dog high
10 liver Minipig moderate
11 liver Rat low
12 liver Rat cutoff
13 liver Monkey moderate
14 lung Monkey high
15 quadriceps Monkey cutoff" , header = TRUE)
The result I need is the following, where only the highest Expression value is kept whenever both Tissue and Species are repeated:
  Tissue     High_Expression     Moderate_Expression  Low_Expression  cutoff
1 dentritic                      Human
2 liver      Human, Monkey, Dog  Minipig              Rat
3 lung       Monkey
4 quadriceps                                                          Monkey
What I have so far:
library(dplyr)

# Ordered factor so that cutoff < low < moderate < high
df$Expression <- factor(df$Expression, levels = c("cutoff", "low", "moderate", "high"), ordered = TRUE)
df$Species <- as.character(df$Species)
# One column per expression level, holding the Species name or ""
df <- df %>%
  mutate(High_expressed = ifelse(Expression == "high", Species, "")) %>%
  mutate(moderate_expressed = ifelse(Expression == "moderate", Species, "")) %>%
  mutate(low_expressed = ifelse(Expression == "low", Species, "")) %>%
  mutate(below_cutoff_expressed = ifelse(Expression == "cutoff", Species, "")) %>%
  select(-c("Expression", "Species"))
# Collapse all rows of a Tissue into one comma-separated string per column
df <- aggregate(. ~ Tissue, data = df, paste, collapse = ",")
That gives:
  Tissue     High_Expression                    Moderate_Expression                       Low_Expression  cutoff
1 dentritic                                     Human
2 liver      Human,,,Human,Monkey,,Dog,Dog,,,,  ,Human,Human,,,Monkey,,,Minipig,,,Monkey  ,,,,,,,,,Rat,,  ,,,,,,,,,Rat,
3 lung       Monkey
4 quadriceps                                                                                              Monkey
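The stray commas in that output have two causes: `paste()` keeps the empty strings produced by `ifelse()`, and the lower-level rows (for example the two "liver Human moderate" rows) are never removed before aggregating, so Human lands in both the high and moderate columns. A minimal sketch that stays with the question's `mutate()` + `aggregate()` route, assuming R with dplyr installed; the helper `paste_nonempty()` and the `rank` column are hypothetical names introduced only for illustration:

```r
library(dplyr)

# Raw data from the question (stringsAsFactors = FALSE only matters for R < 4.0)
df <- read.table(text =
"Tissue Species Expression
1 dentritic Human moderate
2 liver Human high
3 liver Human moderate
4 liver Human moderate
5 liver Human high
6 liver Monkey high
7 liver Monkey moderate
8 liver Dog high
9 liver Dog high
10 liver Minipig moderate
11 liver Rat low
12 liver Rat cutoff
13 liver Monkey moderate
14 lung Monkey high
15 quadriceps Monkey cutoff", header = TRUE, stringsAsFactors = FALSE)

lv <- c("cutoff", "low", "moderate", "high")

# Hypothetical helper: join only the non-empty strings
paste_nonempty <- function(x) paste(x[x != ""], collapse = ", ")

res <- df %>%
  # Numeric rank of each level (cutoff = 1 ... high = 4)
  mutate(rank = match(Expression, lv)) %>%
  # Keep only rows at the highest level within each Tissue/Species pair
  group_by(Tissue, Species) %>%
  filter(rank == max(rank)) %>%
  ungroup() %>%
  # Drop exact repeats such as the two "liver Dog high" rows
  distinct(Tissue, Species, .keep_all = TRUE) %>%
  mutate(High_Expression     = ifelse(Expression == "high", Species, ""),
         Moderate_Expression = ifelse(Expression == "moderate", Species, ""),
         Low_Expression      = ifelse(Expression == "low", Species, ""),
         cutoff              = ifelse(Expression == "cutoff", Species, "")) %>%
  select(-Species, -Expression, -rank)

# One row per Tissue; empty strings are skipped when pasting
res <- aggregate(. ~ Tissue, data = as.data.frame(res), FUN = paste_nonempty)
res
```

Filtering before pivoting is what removes the duplicates; `paste_nonempty()` only takes care of the cosmetic commas.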
Thanks in advance.
Answer 0 (score: 0)
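Starting from the raw df, you can first arrange the data by Expression value so the highest level comes first, keep only the first (highest) row for each Tissue and Species combination, and then reshape the data into wide format.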
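library(dplyr)
library(tidyr)  # pivot_wider() comes from tidyr, not dplyr

df %>%
  # Put the highest expression level first
  arrange(match(Expression, c('high', 'moderate', 'low', 'cutoff'))) %>%
  # Keep only the first (i.e. highest) row per Tissue/Species pair
  distinct(Tissue, Species, .keep_all = TRUE) %>%
  # One column per expression level; several species are joined with ", "
  pivot_wider(names_from = Expression, values_from = Species, values_fn = toString) %>%
  arrange(Tissue)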
# Tissue high moderate low cutoff
# <chr> <chr> <chr> <chr> <chr>
#1 dentritic NA Human NA NA
#2 liver Human, Monkey, Dog Minipig Rat NA
#3 lung Monkey NA NA NA
#4 quadriceps NA NA NA Monkey
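The same three steps (order by level, keep the first row per Tissue/Species pair, spread into columns) can also be sketched in base R without dplyr or tidyr. This is an alternative, not part of the answer above: `order()` plays the role of `arrange()`, `duplicated()` the role of `distinct()`, and `tapply()` with `toString` the role of `pivot_wider()`. Empty cells come back as `NA` and are replaced by "" here to match the layout asked for in the question; the names `lv`, `tab` and `wide` are illustrative only.

```r
# Raw data from the question (stringsAsFactors = FALSE only matters for R < 4.0)
df <- read.table(text =
"Tissue Species Expression
1 dentritic Human moderate
2 liver Human high
3 liver Human moderate
4 liver Human moderate
5 liver Human high
6 liver Monkey high
7 liver Monkey moderate
8 liver Dog high
9 liver Dog high
10 liver Minipig moderate
11 liver Rat low
12 liver Rat cutoff
13 liver Monkey moderate
14 lung Monkey high
15 quadriceps Monkey cutoff", header = TRUE, stringsAsFactors = FALSE)

lv <- c("high", "moderate", "low", "cutoff")

# 1. Highest level first; order() leaves ties in their original order
df <- df[order(match(df$Expression, lv)), ]

# 2. Keep the first, i.e. highest, row of each Tissue/Species pair
df <- df[!duplicated(df[c("Tissue", "Species")]), ]

# 3. One cell per Tissue x Expression; several species are joined with ", "
tab <- tapply(df$Species,
              list(df$Tissue, factor(df$Expression, levels = lv)),
              toString)
tab[is.na(tab)] <- ""  # combinations with no species come back as NA

wide <- data.frame(Tissue = rownames(tab), tab,
                   row.names = NULL, stringsAsFactors = FALSE)
wide
```

Because `order()` is stable, species within a cell keep their original row order (Human, Monkey, Dog), just as with `arrange()`.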