How to ... across an entire data.frame

Asked: 2018-02-15 04:56:07

Tags: r regex dplyr tidyr tidyverse

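I have the table below, which lists the item names of car spare parts. For each part I have the ITEM code of the version made by the car manufacturer (OEM), and also the corresponding ITEM codes of the same part made by parts manufacturers (OES).

I periodically receive an input that contains only the ITEM codes of the parts that were sold. How can I identify which parts were sold?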

> trial
# A tibble: 6 x 5
  Name         `OEM Part` `OES 1 Code`   `OES 2 Code` `OES 3 Code`
  <chr>        <chr>      <chr>          <chr>        <chr>       
1 Brakes       231049A76  1910290/230023 NA           NA          
2 Cables       2410ASD12  NA             219930       3213Q23     
3 Tyres        9412HJ12   231233         NA           NA          
4 Suspension   756634K71  782320/880716  NA           NA          
5 Ball Bearing 2IW2WD23   231224         NA           NA          
6 Clutches     9304JFW3   NA             QQW223       23RQR3      

Suppose I receive the following input values:

> item_code <- c("231049A76", "1910290", "1910290", "23RQR3")

I need the following output:

Name
Brakes
Brakes
Brakes
Clutches

Note: 1910290 and 230023 are separate parts; both are slightly modified brakes.

3 Answers:

Answer 0 (score: 6)

If you reshape the data to long form, you can use a join:

library(tidyverse)

trial <- tibble(Name = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"), 
                `OEM Part` = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"), 
                `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA), 
                `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"), 
                `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3"))

trial_long <- trial %>% 
    gather('code_type', 'code', -Name) %>%    # reshape to long form
    separate_rows(code) %>%    # separate double values
    drop_na(code)    # drop unnecessary NA rows

# join to filter and duplicate
trial_long %>% 
    right_join(tibble(code = c("231049A76", "1910290", "1910290", "23RQR3")), by = "code")
#> # A tibble: 4 x 3
#>   Name     code_type  code     
#>   <chr>    <chr>      <chr>    
#> 1 Brakes   OEM Part   231049A76
#> 2 Brakes   OES 1 Code 1910290  
#> 3 Brakes   OES 1 Code 1910290  
#> 4 Clutches OES 3 Code 23RQR3
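
As a variation on the same idea, here is a minimal sketch that uses pivot_longer (the newer tidyr replacement for gather) and joins from the input side, so the result follows the order of the sold codes. The names sold, lookup and result are placeholders chosen for this illustration, not part of the original answer, and the sketch assumes tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

trial <- tibble(Name = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"),
                `OEM Part` = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"),
                `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA),
                `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"),
                `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3"))

# hypothetical input: the codes that were sold
sold <- tibble(code = c("231049A76", "1910290", "1910290", "23RQR3"))

# one row per (Name, code); NA cells are dropped, "a/b" cells are split on "/"
lookup <- trial %>%
  pivot_longer(-Name, names_to = "code_type", values_to = "code",
               values_drop_na = TRUE) %>%
  separate_rows(code, sep = "/")

# left join from the input keeps its order and its duplicates
result <- sold %>%
  left_join(lookup, by = "code")

result$Name
#> [1] "Brakes"   "Brakes"   "Brakes"   "Clutches"
```

With a left join, a sold code that is missing from the table would still appear in the result, with NA in Name, which makes unknown codes easy to spot.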

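Answer 1 (score: 3)

A not particularly efficient approach using sapply and apply: for each value in item_code we find which row of trial contains it, then take the corresponding Name.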

sapply(item_code, function(x)   
            trial$Name[apply(trial[-1], 1,  function(y)  any(grepl(x, y)))])

# 231049A76    1910290    1910290     23RQR3 
#  "Brakes"   "Brakes"   "Brakes" "Clutches" 

If you don't need the names, set USE.NAMES = FALSE in sapply.

Answer 2 (score: 1)

Here is a base R approach, using an example similar to yours:
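
One thing to keep in mind is that grepl treats each code as a regular expression and also matches substrings, so a code that happens to be contained in a longer code would match as well. The following is a sketch of an exact-match variant, assuming "/" is the only separator used inside a cell; find_name and names_found are hypothetical names introduced for this illustration.

```r
trial <- data.frame(
  Name = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"),
  `OEM Part` = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"),
  `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA),
  `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"),
  `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3"),
  check.names = FALSE, stringsAsFactors = FALSE
)
item_code <- c("231049A76", "1910290", "1910290", "23RQR3")

# hypothetical helper: names of the rows in which `code` appears as a whole code
find_name <- function(code, df) {
  hit <- apply(df[-1], 1, function(row) {
    # split "a/b" cells into individual codes, ignoring NA cells
    parts <- unlist(strsplit(row[!is.na(row)], "/", fixed = TRUE))
    code %in% parts
  })
  df$Name[hit]
}

names_found <- sapply(item_code, find_name, df = trial, USE.NAMES = FALSE)
names_found
#> [1] "Brakes"   "Brakes"   "Brakes"   "Clutches"
```

If a code is not found in any row, find_name returns an empty vector for it, and sapply then returns a list rather than a character vector.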

## Create a dummy matrix
example <- cbind(matrix(1:4, 4, 1), matrix(letters[1:16], 4, 4))
colnames(example) <- c("names", "W", "X", "Y", "Z")
#     names W   X   Y   Z  
#[1,] "1"   "a" "e" "i" "m"
#[2,] "2"   "b" "f" "j" "n"
#[3,] "3"   "c" "g" "k" "o"
#[4,] "4"   "d" "h" "l" "p"

This table is similar to yours: the names are in the first column, and the patterns are matched against the other columns.

## The pattern of interest
pattern <- c("a","e", "f", "p")

For this pattern, we expect the following result: "1", "1", "2", "4".

## Detecting the pattern per row
matching_rows <- row(example[,-1])[example[,-1] %in% pattern]
#[1] 1 1 2 4

## Returning the rows with the pattern
example[matching_rows,1]
#[1] "1" "1" "2" "4"
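
Note that indexing with %in% returns the matches in the order they occur in the table, and each matching cell only once. If the input can be unordered or contain repeated codes, as item_code does in the question, a sketch based on match() may be closer to what is needed. The names codes, idx, rows and found are placeholders for this illustration, and the sketch assumes each cell holds a single code (no "/"-joined values) and that each code occurs in only one cell.

```r
## Same dummy matrix as above
example <- cbind(matrix(1:4, 4, 1), matrix(letters[1:16], 4, 4))
colnames(example) <- c("names", "W", "X", "Y", "Z")

## An unordered input with a repeated value
pattern <- c("p", "a", "a", "f")

codes <- example[, -1]

## match() gives, for each pattern, the position of its first occurrence
## in the matrix (counted column by column), or NA if it is absent
idx <- match(pattern, codes)

## Translate those positions into row numbers
rows <- row(codes)[idx]

## One name per input value, in input order
found <- example[rows, 1]
found
#[1] "4" "1" "1" "2"
```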