I have some data from a mass spectrometer. I want to find the slope of some values based on the contents of other columns.
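I want to find the slope of 2 "Ratio" values for each of proteins A and B (in practice, this is for every unique value of df$Protein in the whole data frame).
The first value I want is the slope between the ratios from channels 127N and 128N.
The second value I want is the slope between the ratios from channels 127C and 128C.
N and C are two different experimental conditions. I think I first need some kind of grouping so that N is only paired with N and C is only paired with C. Any ideas?
I will end up with data like this: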
Protein | Channel | Ratio
A | 127N | 0.5
A | 128N | 0.7
A | 127C | 0.9
A | 128C | 0.4
B | 127N | 0.2
B | 128N | 0.5
B | 127C | 0.7
B | 128C | 0.3
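Each protein has 2 slopes, one per condition. Also, the order of the channels must always stay the same. I could rename the channels to 1, 2, 3, 4 if that would make the slope calculation simpler.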
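For reference, the table above can be written as an R data frame. This is only a minimal sketch in base R: it assumes the condition is the trailing letter of the channel name and that the leading digits give the channel order. The `ChannelNumber` column and the `groups` object are hypothetical names introduced for illustration.

```r
# Example data copied from the table above
df <- data.frame(
  Protein = rep(c("A", "B"), each = 4),
  Channel = rep(c("127N", "128N", "127C", "128C"), times = 2),
  Ratio   = c(0.5, 0.7, 0.9, 0.4, 0.2, 0.5, 0.7, 0.3),
  stringsAsFactors = FALSE
)

# Assumption: the condition is whatever follows the leading digits (N or C)
df$Condition <- sub("^[0-9]+", "", df$Channel)

# Assumption: the leading digits define the channel order within a condition
df$ChannelNumber <- as.integer(sub("[A-Za-z]+$", "", df$Channel))

# One piece per Protein/Condition pair, so N stays with N and C stays with C
groups <- split(df, list(df$Protein, df$Condition))
```

Each element of `groups` then holds the two rows that a single slope would be computed from.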
Answer 0: (score: 0)
You can use a few dplyr functions:
df <- data.frame(Protein = as.character(c("A", "A", "A", "A")),
                 Channel = as.character(c("127N", "128N", "127C", "128C")),
                 Ratio = as.numeric(c(0.5, 0.7, 0.9, 0.4)),
                 stringsAsFactors = FALSE)
library(dplyr)
# Create the condition column from the 4th character of the channel name (N or C)
df$Condition <- substr(df$Channel, 4, 4)
df %>%
  select(Protein, Condition, Ratio) %>%
  group_by(Protein, Condition) %>%
  # Difference from the previous row within each group (0 for the first row of a group)
  mutate(Slope = Ratio - lag(Ratio, default = first(Ratio))) %>%
  mutate(Slope = round(Slope, 1)) %>% # Round to one decimal place; change if necessary
  select(Protein, Condition, Slope) %>%
  { .[seq(2, nrow(.), 2), ] } # Keep every second row (assumes two adjacent rows per group)
The output looks like this:
# A tibble: 2 x 3
# Groups:   Protein, Condition [2]
  Protein Condition Slope
  <chr>   <chr>     <dbl>
1 A       N           0.2
2 A       C          -0.5
etc
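Selecting every second row depends on each group having exactly two rows that sit next to each other. A possible variant, sketched here under the same two-rows-per-group assumption, collapses each group with `summarise()` instead, so the result does not depend on row positions. The name `slopes` is hypothetical, and this is not the method used in the answer above, only an alternative way to get the same differences.

```r
library(dplyr)

df <- data.frame(
  Protein = c("A", "A", "A", "A"),
  Channel = c("127N", "128N", "127C", "128C"),
  Ratio   = c(0.5, 0.7, 0.9, 0.4),
  stringsAsFactors = FALSE
)

slopes <- df %>%
  # Assumption: the condition is the last character of the channel name
  mutate(Condition = substr(Channel, nchar(Channel), nchar(Channel))) %>%
  group_by(Protein, Condition) %>%
  # Assumption: rows within a group are already in channel order
  summarise(Slope = round(last(Ratio) - first(Ratio), 1), .groups = "drop")

print(slopes)
```

Because `summarise()` returns one row per group, no row filtering is needed afterwards. Note that it sorts the result by the grouping columns, so the row order may differ from the output shown above.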
Answer 1: (score: 0)
I'm using the tidyverse for this:
library(tidyverse)
Separate the channel and the condition:
df$Condition <- str_sub(df$Channel, -1)
df$Channel <- str_sub(df$Channel, 1, -2)
Then:
df %>%
  select(Protein, Channel, Condition, Ratio) %>%
  group_by(Protein, Condition) %>%
  arrange(Protein, desc(Condition), Channel) %>%
  # For each row, subtract the previous value
  # This will produce NA for the first row of each group
  mutate(Slope = round(Ratio - lag(Ratio), 1)) %>%
  # Remove the rows with NA
  drop_na() %>%
  # Select just the desired columns
  select(Protein, Condition, Slope)
I added an extra data point (A, 129N, 0.9) so that one group has 3 time points, to check that this works with more than 2 time points.
Result:
# A tibble: 5 x 3
# Groups:   Protein, Condition [4]
  Protein Condition Slope
  <fct>   <chr>     <dbl>
1 A       N           0.2
2 A       N           0.2
3 A       C          -0.5
4 B       N           0.3
5 B       C          -0.4
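With three or more points in a group, the successive differences above give one value per step rather than a single slope. If one slope per group is wanted, a least-squares fit is one option. The sketch below is a different technique from the answer above, not a restatement of it. It assumes the numeric part of the channel name can serve as the x value, and it uses `cov(x, y) / var(x)`, which equals the slope coefficient of `lm(Ratio ~ x)`. The names `x` and `fitted_slopes` are hypothetical.

```r
library(dplyr)

# Table from the question plus the extra point (A, 129N, 0.9)
df <- data.frame(
  Protein = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Channel = c("127N", "128N", "129N", "127C", "128C",
              "127N", "128N", "127C", "128C"),
  Ratio   = c(0.5, 0.7, 0.9, 0.9, 0.4, 0.2, 0.5, 0.7, 0.3),
  stringsAsFactors = FALSE
)

fitted_slopes <- df %>%
  mutate(
    # Assumption: the condition is the last character of the channel name
    Condition = substr(Channel, nchar(Channel), nchar(Channel)),
    # Assumption: the remaining digits are a usable x coordinate
    x = as.numeric(substr(Channel, 1, nchar(Channel) - 1))
  ) %>%
  group_by(Protein, Condition) %>%
  # Least-squares slope of Ratio against x within each group
  summarise(Slope = cov(x, Ratio) / var(x), .groups = "drop")

print(fitted_slopes)
```

For groups with exactly two points this reduces to the plain difference divided by the channel spacing, so it agrees with the differences in the result above whenever the channels are one unit apart.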