Operation based on a matching condition between R data frames

Time: 2018-10-18 13:08:11

Tags: r dataframe match conditional-statements

I have the following data frames:

head(RH)
  160143 161143 161144 161145 161146 162145 162146 162147 163146 163147
1   24.9   26.4   27.4   28.5   30.4   29.2   32.6   58.7   50.6   62.1
2   10.6   29.4   29.3   29.5   30.3   29.7   33.0   68.2   53.2   82.3
3   17.7   30.7   30.7   31.7   31.5   29.4   34.1   65.0   48.0   78.5
4   39.2   38.6   41.0   37.5   29.0   31.1   36.4   56.4   89.7   83.9
5   23.1   23.0   27.9   29.9   38.2   29.6   41.4   88.2   86.0   91.2
6   27.7   28.1   38.5   40.7   50.8   43.3   56.7  106.6   72.5   94.2


head(percentage)
      xy     perc
1 160143 50.22337
2 161143 29.69779
3 107167 41.98815
4 107168 66.68095
5 107169 37.67827
6 108167 29.69238

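I want to multiply each column of RH by the perc value whose xy entry matches that column's name (i.e. every value in column 160143 should be multiplied by 50.22337, every value in column 161143 by 29.69779, and so on). In this example there are no further matches, but the xy column of the full percentage data frame contains every value that occurs among the RH column names.

The result should be a data frame with the same dimensions as RH.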

3 answers:

Answer 0 (score: 1)

You can extract the scaling factors for the existing columns:

foo <- percentage$perc[match(colnames(RH), percentage$xy)]
# [1] 50.22337 29.69779       NA       NA       NA       NA       NA       NA       NA       NA

and put 1 wherever there is an NA (i.e. the unmatched columns are multiplied by 1):

t(t(RH) * ifelse(is.na(foo), 1, foo))
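
Note that t(t(RH) * ...) returns a matrix, while the question asks for a data frame. Below is a minimal base-R sketch of the same idea that keeps the data frame class. The small RH and percentage objects are stand-ins built from the first two rows of the posted data, and the names scale_by and result are illustrative, not part of the original answer.

```r
# Stand-in data: first two rows and three columns of the question's RH
RH <- data.frame(`160143` = c(24.9, 10.6),
                 `161143` = c(26.4, 29.4),
                 `161144` = c(27.4, 29.3),
                 check.names = FALSE)
percentage <- data.frame(xy   = c("160143", "161143", "107167"),
                         perc = c(50.22337, 29.69779, 41.98815))

# One scaling factor per RH column; columns without a match get NA, then 1
scale_by <- percentage$perc[match(colnames(RH), percentage$xy)]
scale_by[is.na(scale_by)] <- 1

# Multiply column j by scale_by[j]; assigning into result[] keeps the
# data frame class, the column names and the row names
result <- RH
result[] <- Map(`*`, RH, scale_by)
```

Here result has the same dimensions and column names as RH, with only the matched columns rescaled.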

Answer 1 (score: 1)

I am using data similar to what you posted:

RH = structure(list(`160143` = c(24.9, 10.6, 17.7, 39.2, 23.1, 27.7), 
                    `161143` = c(26.4, 29.4, 30.7, 38.6, 23, 28.1), 
                    `161144` = c(27.4, 29.3, 30.7, 41, 27.9, 38.5), 
                    `161145` = c(28.5, 29.5, 31.7, 37.5, 29.9, 40.7), 
                    `161146` = c(30.4, 30.3, 31.5, 29, 38.2, 50.8), 
                    `162145` = c(29.2, 29.7, 29.4, 31.1, 29.6, 43.3), 
                    `162146` = c(32.6, 33, 34.1, 36.4, 41.4, 56.7), 
                    `162147` = c(58.7, 68.2, 65, 56.4, 88.2, 106.6), 
                    `163146` = c(50.6, 53.2, 48, 89.7, 86, 72.5), 
                    `163147` = c(62.1, 82.3, 78.5, 83.9, 91.2, 94.2)), 
               class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))

percentage = structure(list(xy = c("160143", "161143", "107167", "107168", "107169", "108167"), 
                            perc = c(50.22337, 29.69779, 41.98815, 66.68095, 37.67827, 29.69238)), 
                       row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

Here is a tidyverse solution, which requires some reshaping followed by a join to bring in the corresponding values:

library(tidyverse)

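RH %>%
  mutate(id = row_number()) %>%            # remember the original row of each value
  gather(xy, value, -id) %>%               # long format: one row per (id, column name)
  inner_join(percentage, by="xy") %>%      # keep only columns that have a perc value
  mutate(value = value * perc) %>%         # apply the scaling factor
  select(-perc) %>%
  spread(xy, value) %>%                    # back to wide format
  select(-id)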

#      160143    161143
# 1 1250.5619  784.0217
# 2  532.3677  873.1150
# 3  888.9536  911.7222
# 4 1968.7561 1146.3347
# 5 1160.1598  683.0492
# 6 1391.1873  834.5079

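Note that with your full percentage dataset the final result will be a table with the same number of rows and columns as the initial RH dataset. Here there are fewer columns because only these 2 columns match the percentage dataset you posted, and inner_join drops the rest.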

Answer 2 (score: 1)

If the OP also wants to keep the rest of the original table, we can simply modify user AntoniosK's answer:

RH %>% 
  mutate(id = row_number()) %>% 
  gather(key = column_name, value, -id) %>% 
  left_join(percentage, by = c("column_name" = "xy")) %>% 
  mutate(perc = ifelse(is.na(perc), 1, perc),
         new_value = value*perc) %>%
  select(-value, -perc) %>% 
  spread(column_name, new_value) %>% 
  select(-id)

#      160143    161143 161144 161145 161146 162145 162146 162147 163146 163147
#1 1250.5619  784.0217   27.4   28.5   30.4   29.2   32.6   58.7   50.6   62.1
#2  532.3677  873.1150   29.3   29.5   30.3   29.7   33.0   68.2   53.2   82.3
#3  888.9536  911.7222   30.7   31.7   31.5   29.4   34.1   65.0   48.0   78.5
#4 1968.7561 1146.3347   41.0   37.5   29.0   31.1   36.4   56.4   89.7   83.9
#5 1160.1598  683.0492   27.9   29.9   38.2   29.6   41.4   88.2   86.0   91.2
#6 1391.1873  834.5079   38.5   40.7   50.8   43.3   56.7  106.6   72.5   94.2

(Sorry, I am a new user and cannot comment on AntoniosK's answer.)
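
The pipelines above use gather() and spread(), which tidyr has superseded since version 1.0.0 in favour of pivot_longer() and pivot_wider(). As a hedged sketch, assuming dplyr and tidyr (>= 1.0.0) are installed, the left-join variant could be written as follows. The small RH and percentage objects are stand-ins built from the posted data, coalesce(perc, 1) replaces the ifelse() step, and the result is a tibble rather than a plain data frame.

```r
library(dplyr)
library(tidyr)

# Stand-in data: first two rows and three columns of the question's RH
RH <- data.frame(`160143` = c(24.9, 10.6),
                 `161143` = c(26.4, 29.4),
                 `161144` = c(27.4, 29.3),
                 check.names = FALSE)
percentage <- data.frame(xy   = c("160143", "161143", "107167"),
                         perc = c(50.22337, 29.69779, 41.98815))

result <- RH %>%
  mutate(id = row_number()) %>%
  # long format: one row per (id, original column name)
  pivot_longer(-id, names_to = "xy", values_to = "value") %>%
  # left join keeps unmatched columns; their perc is NA
  left_join(percentage, by = "xy") %>%
  # unmatched columns are multiplied by 1
  mutate(value = value * coalesce(perc, 1)) %>%
  select(-perc) %>%
  # back to wide format, columns in order of first appearance
  pivot_wider(names_from = xy, values_from = value) %>%
  select(-id)
```

If a plain data frame is needed, as.data.frame(result) converts the tibble back.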