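In R, how do I copy rows from one data frame to another when the destination data frame has 2 extra columns?

Asked: 2017-04-13 10:19:05

Tags: r dataframe rbind

I have a tab-delimited text file with 12 columns that I load into my program. I then create another data frame with the same structure as the loaded one and add 2 more columns to it.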

excelfile = read.delim(ExcelPath)
matchedPictures<- excelfile[0,]
matchedPictures$beforeName <- character()
matchedPictures$afterName <- character()

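Now I have a function in which I do the following:

  1. Based on a condition, I get the row number pictureMatchNum of the row that needs to be copied from excelfile to matchedPictures.
  2. I then need to copy that row from excelfile to matchedPictures. So far I have tried a couple of different ways.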

    a.

    rowNumber = nrow(matchedPictures) + 1
    matchedPictures[rowNumber,1:12] <<- excelfile[pictureMatchNum,1:12]

    b.

    matchedPictures[rowNumber,1:12] <<- rbind(matchedPictures, excelfile[pictureWordMatches,1:12], make.row.names = FALSE)
    
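    2a does not seem to work, because it copies the row indices from excelfile and uses them as row names in matchedPictures. That is why I decided to try rbind.

    2b does not seem to work, because rbind requires the columns to match, and matchedPictures has 2 extra columns.

    EDIT START - including a reproducible example.

    Here is some reproducible code (with fewer columns and fake data)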

    library(stringr)  # provides str_detect(), regex(), and the example vectors words and fruit
    excelfile <- data.frame(x = letters, y = words[length(letters)], z= fruit[length(letters)] )
    matchedPictures <- excelfile[0,]
    matchedPictures$beforeName <- character()
    matchedPictures$afterName <- character()
    
    pictureMatchNum1 = match(1, str_detect("A", regex(excelfile$x, ignore_case = TRUE)))
    rowNumber1 = nrow(matchedPictures) + 1
    
    pictureMatchNum2 = match(1, str_detect("D", regex(excelfile$x, ignore_case = TRUE)))
    rowNumber2 = nrow(matchedPictures) + 1
    

    The two options I tried are

    2a.

    matchedPictures[rowNumber1,1:3] <<- excelfile[pictureMatchNum1,1:3]
    matchedPictures[rowNumber1,"beforeName"] <<- "xxx"
    matchedPictures[rowNumber1,"afterName"] <<- "yyy"
    
    matchedPictures[rowNumber2,1:3] <<- excelfile[pictureMatchNum2,1:3]
    matchedPictures[rowNumber2,"beforeName"] <<- "uuu"
    matchedPictures[rowNumber2,"afterName"] <<- "www"
    

    OR

    2b.

    matchedPictures[rowNumber1,1:3] <<- rbind(matchedPictures, excelfile[pictureMatchNum1,1:3], make.row.names = FALSE)
    matchedPictures[rowNumber1,"beforeName"] <<- "xxx"
    matchedPictures[rowNumber1,"afterName"] <<- "yyy"
    
    matchedPictures[rowNumber2,1:3] <<- rbind(matchedPictures, excelfile[pictureMatchNum2,1:3], make.row.names = FALSE)
    matchedPictures[rowNumber2,"beforeName"] <<- "uuu"
    matchedPictures[rowNumber2,"afterName"] <<- "www"
    

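    EDIT END

    Also, I have seen suggestions in many places that, instead of starting from an empty data frame, one should keep vectors, append the data to the vectors, and then combine them into a data frame. Does that advice still hold when I have this many columns and would need 14 separate vectors, each copied individually?

    What can I do to get this working?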

2 Answers:

Answer 0: (score: 0)

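You could

  • first determine the row indices of excelfile that meet the condition
  • extract those rows
  • then generate the data to fill your columns beforeName and afterName
  • then append these columns to the new data frame

Example: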

library(stringr)  # str_detect(), regex(), and the example vectors words and fruit
excelfile <- data.frame(x = letters, y = words[length(letters)],
                        z = fruit[length(letters)])
## Vector of patterns:
patternVec <- c("A", "D", "M")
## Look for appropriate rows in file 'excelfile':
indexVec <- vapply(patternVec,
                   function(myPattern) which(str_detect(myPattern,
                       regex(excelfile$x, ignore_case = TRUE))),
                   integer(1))
## Extract these rows:
matchedPictures <- excelfile[indexVec,]
## Somehow generate the data for columns 'beforeName' and 'afterName':
## I do not know how this information is generated so I just insert 
## some dummy code here:
beforeNameVec <- c("xxx", "uuu", "mmm")
afterNameVec <- c("yyy", "www", "nnn")
## Then assign these variables:
matchedPictures$beforeName <- beforeNameVec
matchedPictures$afterName <- afterNameVec

matchedPictures
# x   y           z beforeName afterName
# a air dragonfruit        xxx       yyy
# d air dragonfruit        uuu       www
# m air dragonfruit        mmm       nnn
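
If the rows really do have to be added one at a time, as in the question, one possible approach is to widen the source row with the two extra columns before binding it, so that both sides of rbind have the same columns. The following is only a minimal base R sketch under stated assumptions: the toy excelfile, the helper name appendMatch, and the values passed to it are hypothetical and not from the original post.

```r
# Toy stand-in for the uploaded file (hypothetical data, two columns instead of twelve)
excelfile <- data.frame(x = c("a", "b", "c", "d"),
                        y = c(10, 20, 30, 40),
                        stringsAsFactors = FALSE)

# Empty destination: same columns as excelfile plus the two extra ones
matchedPictures <- excelfile[0, ]
matchedPictures$beforeName <- character()
matchedPictures$afterName <- character()

# Hypothetical helper: add the extra columns to the source row first,
# then bind, then drop the row names inherited from excelfile
appendMatch <- function(dest, src, rowNum, before, after) {
  newRow <- data.frame(src[rowNum, , drop = FALSE],
                       beforeName = before, afterName = after,
                       stringsAsFactors = FALSE)
  out <- rbind(dest, newRow)
  rownames(out) <- NULL
  out
}

matchedPictures <- appendMatch(matchedPictures, excelfile, 4, "uuu", "www")
matchedPictures <- appendMatch(matchedPictures, excelfile, 1, "xxx", "yyy")
```

Because the helper returns the enlarged data frame instead of modifying a global with <<-, the caller decides where the result is stored. Resetting the row names explicitly avoids carrying over the indices from excelfile.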

Answer 1: (score: 0)

Using dplyr can make this simpler:

library(dplyr)
library(stringr)

excelfile <- data.frame(x = letters, y = words[length(letters)], z= fruit[length(letters)],
stringsAsFactors = FALSE ) #add stringsAsFactors to have character columns

pictureMatch <- excelfile %>%
  #create a match column
  mutate(match = ifelse(str_detect(x,"a") | str_detect(x,'d'),1,0)) %>% 
  #filter to only the rows that match your condition
  filter(match ==1)

pictureMatch <- pictureMatch[['x']] #convert to a vector

matchedPictures <- excelfile %>%
  filter(x %in% pictureMatch) %>% #grab the rows that match your condition
  mutate(beforeName = c('xxx','uuu'), #add your names
     afterName = c('yyy','www'))
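
On the question's side point about collecting data first and combining it once at the end: this does not have to mean 14 separate vectors. One option, sketched here in base R under stated assumptions, is to keep a list of one-row data frames and bind them in a single call. The toy excelfile and the matches list are hypothetical placeholders for however the row numbers and names are actually produced.

```r
# Toy stand-in for the uploaded file (hypothetical data)
excelfile <- data.frame(x = c("a", "b", "c", "d"),
                        y = c(10, 20, 30, 40),
                        stringsAsFactors = FALSE)

# Hypothetical matches: the source row number plus the two extra values per hit
matches <- list(
  list(rowNum = 1, before = "xxx", after = "yyy"),
  list(rowNum = 4, before = "uuu", after = "www")
)

# One widened single-row data frame per match; no per-column vectors needed
pieces <- lapply(matches, function(m) {
  data.frame(excelfile[m$rowNum, , drop = FALSE],
             beforeName = m$before, afterName = m$after,
             stringsAsFactors = FALSE)
})

# Combine once at the end and reset the row names inherited from excelfile
matchedPictures <- do.call(rbind, pieces)
rownames(matchedPictures) <- NULL
```

Binding once at the end avoids repeatedly copying a growing data frame, which is the usual reason this advice is given.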