Question

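I have a dataset df1 in which the columns are various chemicals and the rows are samples, each identified by its id and holding the concentration of each chemical.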

I need to correct the chemical concentrations using a value that is unique to each chemical; these values are found in another dataset, df2.

Here is a minimal df1 dataset:

df1 <- read.table(text="id,chem1,chem2,chem3,chemA,chemB
1,0.5,1,5,4,3
2,1.5,0.5,2,3,4
3,1,1,2.5,7,1
4,2,5,3,1,7
5,3,4,2.3,0.7,2.3",
header = TRUE,
sep=",")

Here is an example df2:

df2 <- read.table(text="chem,value
chem1,1.7
chem2,2.3
chem3,4.1
chemA,5.2
chemB,2.7",
header = TRUE,
sep = ",")

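What I need to do is divide every observation in the chem1 column of df1 by the value given for chem1 in df2, and repeat this for each chemical. In the real data the chemical names are not sequential, and there are about 30 chemicals.

Previously I would have done this in Excel with INDEX/MATCH, but I want my approach to be more reproducible, which is why I am battling my way through R. I mostly use dplyr for data manipulation, so if there is a tidyverse solution out there, that would be great!

Thanks for any help.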

Answer 1

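We can subset 'df1' using the 'chem' column of 'df2', divide by the 'value' column of 'df2' (replicated so the lengths match), and update 'df1' by assigning the result back to those columns. Note that this relies on 'df2' listing every non-id column of 'df1'.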

df1[as.character(df2$chem)] <- df1[as.character(df2$chem)]/df2$value[col(df1[-1])]
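A slightly more defensive variant of the same idea, as a sketch rather than a definitive implementation: sweep() divides each selected column by its matching entry in df2$value, and a stopifnot() check guards against chemical names that are missing from df1. The names chems and corrected are made up for illustration; the data is copied from the question.

```r
# Sample data copied from the question
df1 <- read.table(text = "id,chem1,chem2,chem3,chemA,chemB
1,0.5,1,5,4,3
2,1.5,0.5,2,3,4
3,1,1,2.5,7,1
4,2,5,3,1,7
5,3,4,2.3,0.7,2.3",
header = TRUE, sep = ",")

df2 <- read.table(text = "chem,value
chem1,1.7
chem2,2.3
chem3,4.1
chemA,5.2
chemB,2.7",
header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Column names to correct, in the order they appear in df2
chems <- as.character(df2$chem)

# Fail early if df2 mentions a chemical that df1 does not have
stopifnot(all(chems %in% names(df1)))

# Divide column j of df1[chems] by df2$value[j]; 'corrected' is a hypothetical name
corrected <- df1
corrected[chems] <- sweep(df1[chems], 2, df2$value, "/")

corrected
```

Because the columns are selected by name in df2's order, this does not depend on df1 and df2 listing the chemicals in the same order, and any columns of df1 that df2 does not mention are left untouched.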

Answer 2

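Using the reshape2 package, the data frame can be converted to long format and merged with df2, as shown below. (Note that the indented example data introduces some whitespace, which this solution strips out.)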

library(reshape2)
df1 <- read.table(text="id,chem1,chem2,chem3,chemA,chemB
1,0.5,1,5,4,3
                  2,1.5,0.5,2,3,4
                  3,1,1,2.5,7,1
                  4,2,5,3,1,7
                  5,3,4,2.3,0.7,2.3",
                  header = TRUE,
                  sep=",",stringsAsFactors = FALSE)

df2 <- read.table(text="chem,value
chem1,1.7
                  chem2,2.3
                  chem3,4.1
                  chemA,5.2
                  chemB,2.7",
                  header = TRUE,
                  sep = ",",stringsAsFactors = FALSE)

df2$chem <- gsub("\\s+","",df2$chem) #example introduces whitespaces in the names
df1A <- melt(df1,id.vars=c("id"),variable.name="chem")

combined <- merge(x=df1A,y=df2,by="chem",all.x=TRUE)
combined$div <- combined$value.x/combined$value.y
head(combined)

   chem id value.x value.y       div
1 chem1  1     0.5     1.7 0.2941176
2 chem1  2     1.5     1.7 0.8823529
3 chem1  3     1.0     1.7 0.5882353
4 chem1  4     2.0     1.7 1.1764706
5 chem1  5     3.0     1.7 1.7647059
6 chem2  1     1.0     2.3 0.4347826

Or in wide format:

> dcast(combined[,c("id","chem","div")],id ~ chem,value.var="div")
  id     chem1     chem2     chem3     chemA     chemB
1  1 0.2941176 0.4347826 1.2195122 0.7692308 1.1111111
2  2 0.8823529 0.2173913 0.4878049 0.5769231 1.4814815
3  3 0.5882353 0.4347826 0.6097561 1.3461538 0.3703704
4  4 1.1764706 2.1739130 0.7317073 0.1923077 2.5925926
5  5 1.7647059 1.7391304 0.5609756 0.1346154 0.8518519
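On the whitespace cleanup in this answer: read.table() has a strip.white argument which, when sep is specified, strips leading and trailing whitespace from unquoted character fields, so the gsub() step may not be needed. A minimal sketch, assuming the same indented df2 text as above:

```r
# Same indented example text as in this answer
df2 <- read.table(text = "chem,value
chem1,1.7
                  chem2,2.3
                  chem3,4.1
                  chemA,5.2
                  chemB,2.7",
                  header = TRUE,
                  sep = ",",
                  strip.white = TRUE,        # trim padding around character fields
                  stringsAsFactors = FALSE)

# The chemical names should now carry no leading spaces
df2$chem
```

Numeric fields are stripped regardless of this setting, which is why only the chem column needed attention here.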

Answer 3

Here is a tidyverse solution.

library(dplyr)
library(tidyr)

df3 <- df1 %>%
    # convert the data from wide to long to make the next step easier
    gather(key = chem, value = value, -id) %>%
    # do your math, using 'match' to map values from df2 to rows of the long data
    # (refer to the 'chem' column directly; df3 does not exist yet at this point)
    mutate(value = value/df2$value[match(chem, df2$chem)]) %>%
    # return the data to wide format if that's how you prefer to store it
    spread(chem, value)
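The same reshape, divide, reshape idea can also be sketched with pivot_longer() and pivot_wider(), which tidyr (1.0.0 and later) provides as the successors to gather() and spread(), using left_join() in place of match(). This is one possible arrangement, not the only one; the column name conc is a hypothetical choice, and the data is copied from the question.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df1 <- read.table(text = "id,chem1,chem2,chem3,chemA,chemB
1,0.5,1,5,4,3
2,1.5,0.5,2,3,4
3,1,1,2.5,7,1
4,2,5,3,1,7
5,3,4,2.3,0.7,2.3",
header = TRUE, sep = ",")

df2 <- read.table(text = "chem,value
chem1,1.7
chem2,2.3
chem3,4.1
chemA,5.2
chemB,2.7",
header = TRUE, sep = ",", stringsAsFactors = FALSE)

df3 <- df1 %>%
  # wide to long: one row per (id, chem) pair; 'conc' is a hypothetical column name
  pivot_longer(cols = -id, names_to = "chem", values_to = "conc") %>%
  # attach each chemical's correction value from df2
  left_join(df2, by = "chem") %>%
  # divide the concentration by its correction value
  mutate(conc = conc / value) %>%
  # drop the helper column before reshaping back
  select(-value) %>%
  # long to wide: one column per chemical again
  pivot_wider(names_from = chem, values_from = conc)

df3
```

With left_join(), a chemical that has no entry in df2 ends up as NA after the division, which makes missing correction values easy to spot.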

Manipulating variable values using values from another data frame
