How to apply an operation across two nested columns using dplyr

Date: 2018-03-14 08:03:54

Tags: r dplyr tidyverse

I have the following data:

dat <- structure(list(motif = "JUND", celltype_specific_genes = list(
    structure(list(genes = c("BDNF", "IFI202B", "JUN"), tissue = c("P-XXX", 
    "P-XXX", "P-XXX")), .Names = c("genes", "tissue"), row.names = c(NA, 
    -3L), class = c("tbl_df", "tbl", "data.frame"))), ipa_motif_genes = list(
    structure(list(genes = c("BCL3", "BDNF", "CCND1", "CDKN2A", 
    "CYBB", "DUSP1", "HMOX1", "IFNG", "IFI202B", "JUN", "JUNB", 
    "MMP9", "NOX4", "SAT1", "SOCS1", "TBX21", "VEGFA")), .Names = "genes", row.names = c(NA, 
    -17L), class = c("tbl_df", "tbl", "data.frame")))), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -1L), .Names = c("motif", 
"celltype_specific_genes", "ipa_motif_genes"))

library(dplyr)
dat 
#> # A tibble: 1 x 3
#>   motif celltype_specific_genes ipa_motif_genes  
#>   <chr> <list>                  <list>           
#> 1 JUND  <tibble [3 × 2]>        <tibble [17 × 1]>

In reality I have many more rows.

The nested columns contain the following vectors:

celltype_specific_genes <- c("BDNF", "IFI202B", "JUN")
ipa_motif_genes <- c("BCL3", "BDNF", "CCND1", "CDKN2A", 
        "CYBB", "DUSP1", "HMOX1", "IFNG", "IFI202B", "JUN", "JUNB", 
        "MMP9", "NOX4", "SAT1", "SOCS1", "TBX21", "VEGFA")
setdiff(ipa_motif_genes, celltype_specific_genes)
 #[1] "BCL3"   "CCND1"  "CDKN2A" "CYBB"   "DUSP1"  "HMOX1"  "IFNG"   "JUNB"   "MMP9"   "NOX4"   "SAT1"   "SOCS1"  "TBX21"  "VEGFA" 

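Using a dplyr pipe, I want to add a new column containing the set difference between the nested celltype_specific_genes and ipa_motif_genes.

How can I achieve this?

Update

I have another vector that is not part of dat.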

full_genes <- c("JUN", "TRAPPC3", "SLC12A6", "IGBP1", "M6PR", "GM829", "APC", "HSD17B12", "CD59B", "OSTM1", "SLC10A6", "AKAP8", "CRP", "GHITM", "1110065P20RIK", "GM29685", "DSCAML1", "SNX15", "ZFP385C", "DNAJC25", "CORIN", "NUDT22", "MAP1A", "CHMP2A", "SDR16C5", "ADRA1D", "UPP2", "GM13242", "PLXNB2", "ABI1", "CACNB3", "MILL2", "DAPK3", "SPTA1", "ADNP", "H2AFX", "SLC22A14", "CIC", "PHACTR3", "2010107G12RIK", "KLC3", "SUSD4", "SLC25A15", "PTPRT", "RTEL1", "KCNU1", "SMIM13", "OLFR207", "SAMD4B", "SPIC")

How can I add another column holding the difference between full_genes and celltype_specific_genes?

I tried the following, but it does not work:

Diff2 = map2(celltype_specific_genes, ~ tibble(setdiff(full_genes, .x$genes)))

1 answer:

Answer 0 (score: 2)

We can use map2 to loop over the two list columns and get the elements of ipa_motif_genes that are not present in celltype_specific_genes:

library(purrr)  # provides map2
dat %>%
   mutate(Diff = map2(celltype_specific_genes, ipa_motif_genes, 
                ~ tibble(setdiff(.y$genes, .x$genes)))) 
# A tibble: 1 x 4
#   motif celltype_specific_genes ipa_motif_genes   Diff             
#  <chr> <list>                  <list>            <list>           
#1 JUND  <tibble [3 x 2]>        <tibble [17 x 1]> <tibble [14 x 1]>

For the second case, where an external vector is compared with a column in the dataset, only one list column needs to be iterated, so map (rather than map2) is sufficient.
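
A minimal sketch of the second case, comparing an external vector with the nested celltype_specific_genes column. The attempt in the question fails because map2 expects two inputs to iterate over in parallel, while only one list column is supplied there; map takes a single input. The objects dat_small and full_genes_small are shortened stand-ins introduced here for illustration, not the original dat and full_genes, and naming the result column genes is a choice made for this sketch.

```r
library(dplyr)
library(purrr)

# Illustrative stand-in for `dat`: one row with one nested tibble column
dat_small <- tibble(
  motif = "JUND",
  celltype_specific_genes = list(tibble(genes = c("BDNF", "IFI202B", "JUN")))
)

# Illustrative stand-in for the external `full_genes` vector (shortened)
full_genes_small <- c("JUN", "TRAPPC3", "SLC12A6", "IGBP1")

# Iterate over the single list column with map(); the external vector is
# looked up from the enclosing environment rather than passed as an input
res <- dat_small %>%
  mutate(Diff2 = map(celltype_specific_genes,
                     ~ tibble(genes = setdiff(full_genes_small, .x$genes))))

res$Diff2[[1]]$genes
#> [1] "TRAPPC3" "SLC12A6" "IGBP1"
```

The same mutate call could in principle be chained after the Diff column above, so that both differences end up in one pipeline.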