(R programming) Create pairs of values in column Y based on duplicate values in column X

Time: 2017-06-26 16:31:52

Tags: r function for-loop duplicates

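I am working in R.
Assume all the data are strings.
Also bear in mind that my actual dataset is huge.

Column X contains duplicates.
Whenever a value in X appears more than once, I want to create all possible distinct pairs of Y values (with X held constant), while also keeping column Z.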

 X  Y    Z  
 1  a    RED   
 1  b    BLUE  
 1  c    PINK   
 1  d    YELLOW  
 2  a    PURPLE   
 3  a    ORANGE   
 3  b    GREEN  
 4  a    BLACK  
 4  b    WHITE   
 4  c    BROWN

So the result I want to achieve is:

 X   Y1  Y2  Z1      Z2  
 1   a   b   RED     BLUE  
 1   a   c   RED     PINK  
 1   a   d   RED     YELLOW  
 1   b   c   BLUE    PINK  
 1   b   d   BLUE    YELLOW  
 1   c   d   PINK    YELLOW  
 2   a   NA  PURPLE  NA  
 3   a   b   ORANGE  GREEN  
 4   a   b   BLACK   WHITE  
 4   a   c   BLACK   BROWN  
 4   b   c   WHITE   BROWN  

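I think the question of how to code "find and create all distinct pairs of a column" in R is answered by <Expand data frame into combinations of row pairs>.
So my question is how to code the following in R: "for each X, whenever it is duplicated, group those rows together and create all possible pairs of Y and Z (for that particular X)".

I hope my question is clear!

Please help! :)

A small part of my actual dataset (there will be more columns) (X = classification; Y = host species; Z = everything else):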

parspecies          |pargenus     |hostspecies               |hostgenus
--------------------------------------------------------------------------
Blattophagus beci   |Blatophagus  |Platyzostreia castanea    |Platyzostreia
Blissoxenos esakii  |Blissoxenos  |Dimorphopterus japonicus  |Dimorphopterus
Blissoxenos esakii  |Blissoxenos  |Iphicrates spinicaput     |Iphicrates
Blissoxenos esakii  |Blissoxenos  |Macropes obnubilus        |Macropes
Caenocholax fenesi  |Caenocholax  |Camponotus atriaps        |Camponotus
Caenocholax fenesi  |Caenocholax  |Camponotus planatus       |Camponotus
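For a frame like this, with several columns to carry along, one option is to pair row indices within each group and then look up every other column by index. The following is only a sketch under assumptions: `pair_rows()` is a hypothetical helper (not from any package), the six rows are retyped from the sample above, and `parspecies` is assumed to be the grouping column.

```r
# Sample rows retyped from the question (only these six rows are known).
hosts <- data.frame(
  parspecies  = c("Blattophagus beci", "Blissoxenos esakii", "Blissoxenos esakii",
                  "Blissoxenos esakii", "Caenocholax fenesi", "Caenocholax fenesi"),
  pargenus    = c("Blatophagus", "Blissoxenos", "Blissoxenos",
                  "Blissoxenos", "Caenocholax", "Caenocholax"),
  hostspecies = c("Platyzostreia castanea", "Dimorphopterus japonicus",
                  "Iphicrates spinicaput", "Macropes obnubilus",
                  "Camponotus atriaps", "Camponotus planatus"),
  hostgenus   = c("Platyzostreia", "Dimorphopterus", "Iphicrates",
                  "Macropes", "Camponotus", "Camponotus"),
  stringsAsFactors = FALSE
)

# pair_rows() is a hypothetical helper: it pairs row indices within each
# group of `key`, so every non-key column is carried along automatically.
pair_rows <- function(data, key) {
  groups <- split(seq_len(nrow(data)), data[[key]])
  idx <- do.call(rbind, lapply(groups, function(i) {
    if (length(i) < 2) return(cbind(i, NA_integer_))  # lone row: pair it with NA
    t(combn(i, 2))                                     # all index pairs within the group
  }))
  other <- setdiff(names(data), key)
  left  <- data[idx[, 1], other, drop = FALSE]
  right <- data[idx[, 2], other, drop = FALSE]  # an NA index yields an all-NA row
  names(left)  <- paste0(other, "1")
  names(right) <- paste0(other, "2")
  out <- cbind(data[idx[, 1], key, drop = FALSE], left, right)
  rownames(out) <- NULL
  out
}

pairs <- pair_rows(hosts, "parspecies")
```

Because this pairs indices rather than merging a group with itself, it builds the choose(n, 2) rows per group directly. Whether that makes a practical difference on the full dataset has not been measured here.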

2 Answers:

Answer 0 (score: 0)

Here is one way to do this in base R:

#set up a list of matrices with the df$Y pairs (include NAs up to length 2)
combs <- tapply(df$Y,df$X,function(x) {length(x) <- max(2,length(x));return(t(combn(x,2)))})
#convert to a data.frame
df2 <- as.data.frame(do.call(rbind,combs),stringsAsFactors = FALSE)
names(df2) <- c("Y1","Y2")
#recreate values of df$X that are lost by previous steps
df2$X <- rep(as.numeric(names(combs)),times=sapply(combs,nrow))
#merge in the colours in df$Z
df2 <- merge(df2,df,by.x=c("X","Y2"),by.y=c("X","Y"),all.x=TRUE)
df2 <- merge(df2,df,by.x=c("X","Y1"),by.y=c("X","Y"),all.x=TRUE,suffixes=c("1","2"))
#get correct column order after merge
df2[,4:5] <- df2[,5:4]

df2
   X Y1   Y2     Z1     Z2
1  1  a    b    RED   BLUE
2  1  a    c    RED   PINK
3  1  a    d    RED YELLOW
4  1  b    c   BLUE   PINK
5  1  b    d   BLUE YELLOW
6  1  c    d   PINK YELLOW
7  2  a <NA> PURPLE   <NA>
8  3  a    b ORANGE  GREEN
9  4  a    b  BLACK  WHITE
10 4  a    c  BLACK  BROWN
11 4  b    c  WHITE  BROWN

Data:

df <- data.frame(X = c(1L, 1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L, 4L), 
                 Y = c("a", "b", "c", "d", "a", "a", "b", "a", "b", "c"), 
                 Z = c("RED", "BLUE", "PINK", "YELLOW", "PURPLE", "ORANGE", "GREEN", "BLACK", "WHITE", "BROWN"),
                 stringsAsFactors = FALSE)
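Two details of the code above may be worth seeing in isolation: how assigning to `length(x)` pads a one-element group with NA so that `combn` still returns a pair, and what to do when X holds strings (as the question says the real data does) so that `as.numeric(names(combs))` would not apply. This is a small sketch; `pad_pairs` and `df_chr` are hypothetical names introduced only for illustration.

```r
# Growing a vector's length fills the new slots with NA, so a lone value
# becomes c(value, NA) and combn() can still form one pair from it.
pad_pairs <- function(x) {   # hypothetical name for the answer's anonymous function
  length(x) <- max(2, length(x))
  t(combn(x, 2))
}

single <- pad_pairs("a")                # one row pairing "a" with NA
triple <- pad_pairs(c("a", "b", "c"))   # choose(3, 2) = 3 rows

# With character group labels, reuse the names directly instead of as.numeric().
df_chr <- data.frame(X = c("p", "p", "p", "q"),
                     Y = c("a", "b", "c", "a"),
                     stringsAsFactors = FALSE)
combs <- tapply(df_chr$Y, df_chr$X, pad_pairs)
x_col <- rep(names(combs), times = sapply(combs, nrow))
```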

Answer 1 (score: 0)

Consider using by to merge each X group with itself:

dfList = by(df, df$X, function(i){
  tmp <- merge(i, i, by="X", suffixes=c("1", "2"))
  if (nrow(tmp) > 1) {
    # keep each unordered pair once
    tmp <- subset(tmp, Y1 < Y2)[c("X","Y1","Y2","Z1","Z2")]
  } else {
    # one-row group: no partner, so fill the second half with NA
    tmp[c("Y2","Z2")] <- NA
  }
  return(tmp)
})

newdf <- do.call(rbind, dfList)
rownames(newdf) <- NULL

newdf
#     X Y1   Y2     Z1     Z2
# 1   1  a    b    RED   BLUE
# 2   1  a    c    RED   PINK
# 3   1  a    d    RED YELLOW
# 4   1  b    c   BLUE   PINK
# 5   1  b    d   BLUE YELLOW
# 6   1  c    d   PINK YELLOW
# 7   2  a <NA> PURPLE   <NA>
# 8   3  a    b ORANGE  GREEN
# 9   4  a    b  BLACK  WHITE
# 10  4  a    c  BLACK  BROWN
# 11  4  b    c  WHITE  BROWN

Were it not for the special PURPLE case (a group with only one row), a one-liner would do:

dfList = by(df, df$X, function(i){
  subset(merge(i, i, by="X", suffixes=c("1", "2")), Y1 < Y2)[c("X","Y1","Y2","Z1","Z2")]
})
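To see why the one-row group needs special handling, it can help to compare the number of rows each X group should produce with what the `Y1 < Y2` filter alone returns. The sketch below is an illustration, not the answer's code: for brevity it merges the whole sample frame with itself by X instead of calling `by`, and `n`, `expected`, and `observed` are names introduced here.

```r
df <- data.frame(X = c(1L, 1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L, 4L),
                 Y = c("a", "b", "c", "d", "a", "a", "b", "a", "b", "c"),
                 stringsAsFactors = FALSE)

# Rows wanted per X group: choose(n, 2) pairs, or one NA-padded row when n == 1.
n <- vapply(split(df$Y, df$X), length, integer(1))
expected <- pmax(1, choose(unname(n), 2))

# Self-merge filtered to Y1 < Y2; a one-row group has no such pair and vanishes.
pairs <- subset(merge(df, df, by = "X", suffixes = c("1", "2")), Y1 < Y2)
observed <- as.vector(table(factor(pairs$X, levels = names(n))))

# Groups where the filter alone falls short of the wanted row count.
missing_groups <- names(n)[observed < expected]
```

On the sample data, `missing_groups` contains only group 2 (the PURPLE row), which is the case the longer version handles with its `else` branch.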
