如何将多个字符列组合成R数据帧中的单个列

时间:2014-01-08 18:08:40

标签: string r

我正在使用Census数据,我需要将四个字符列合并为一个列。

示例:

LOGRECNO STATE COUNTY  TRACT BLOCK
    60    01    001  021100  1053
    61    01    001  021100  1054
    62    01    001  021100  1055
    63    01    001  021100  1056
    64    01    001  021100  1057
    65    01    001  021100  1058

我想创建一个新列,将STATE,COUNTY,TRACT和BLOCK的字符串一起添加到单个字符串中。例如:

LOGRECNO STATE COUNTY  TRACT BLOCK  BLOCKID
    60    01    001  021100  1053   01001021101053
    61    01    001  021100  1054   01001021101054
    62    01    001  021100  1055   01001021101055
    63    01    001  021100  1056   01001021101056
    64    01    001  021100  1057   01001021101057
    65    01    001  021100  1058   01001021101058

我试过了:

AL_Blocks$BLOCK_ID<- paste(c(AL_Blocks$STATE, AL_Blocks$County, AL_Blocks$TRACT,    AL_Blocks$BLOCK), collapse = "")

但是这会将所有四列的所有行合并为一个字符串。

6 个答案:

答案 0 :(得分:13)

您可以使用do.callpaste0。尝试:

AL_Blocks$BLOCK_ID <- do.call(paste0, AL_Block[c("STATE", "COUNTY", "TRACT", "BLOCK")])

示例输出:

do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
do.call(paste0, AL_Blocks[2:5])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"

你也可以使用“tidyr”中的unite,如下所示:

library(tidyr)
library(dplyr)
AL_Blocks %>% 
  unite(BLOCK_ID, STATE, COUNTY, TRACT, BLOCK, sep = "", remove = FALSE)
#   LOGRECNO        BLOCK_ID STATE COUNTY  TRACT BLOCK
# 1       60 010010211001053    01    001 021100  1053
# 2       61 010010211001054    01    001 021100  1054
# 3       62 010010211001055    01    001 021100  1055
# 4       63 010010211001056    01    001 021100  1056
# 5       64 010010211001057    01    001 021100  1057
# 6       65 010010211001058    01    001 021100  1058

其中“AL_Blocks”提供为:

AL_Blocks <- structure(list(LOGRECNO = c("60", "61", "62", "63", "64", "65"), 
    STATE = c("01", "01", "01", "01", "01", "01"), COUNTY = c("001", "001", 
    "001", "001", "001", "001"), TRACT = c("021100", "021100", "021100", 
    "021100", "021100", "021100"), BLOCK = c("1053", "1054", "1055", "1056",
    "1057", "1058")), .Names = c("LOGRECNO", "STATE", "COUNTY", "TRACT", 
    "BLOCK"), class = "data.frame", row.names = c(NA, -6L))

答案 1 :(得分:9)

试试这个:

AL_Blocks$BLOCK_ID<- with(AL_Blocks, paste0(STATE, COUNTY, TRACT, BLOCK))

县里有一个错字......它本应该是县。此外,您不需要折叠参数。

我希望有所帮助。

答案 2 :(得分:3)

或试试这个

DF$BLOCKID <-
  paste(DF$LOGRECNO, DF$STATE, DF$COUNTY, 
        DF$TRACT, DF$BLOCK, sep = "")

(这是为稍后进入此讨论的人设置数据框的方法)

DF <- 
  data.frame(LOGRECNO = c(60, 61, 62, 63, 64, 65),
             STATE = c(1, 1, 1, 1, 1, 1),
             COUNTY = c(1, 1, 1, 1, 1, 1), 
             TRACT = c(21100, 21100, 21100, 21100, 21100, 21100), 
             BLOCK = c(1053, 1054, 1055, 1056, 1057, 1058))

答案 3 :(得分:3)

你也可以尝试这个

AL_Blocks <- transform(All_Blocks, BLOCKID = paste(STATE,COUNTY,
                       TRACT, BLOCK, sep = "")

答案 4 :(得分:2)

您可以使用tidyverse包:

DF %>% unite(new_var, STATE, COUNTY, TRACT, BLOCK)

答案 5 :(得分:0)

您可以同时 WRITE READ 带有任何指定的“字符串分隔符”(不一定是字符分隔符)的文本文件。这在数据几乎具有所有终端符号的情况下非常有用,因此,不能将1个符号用作分隔符。以下是 read write 函数的示例:

写出特殊分隔符文本:

writeSepText <- function(df, fileName, separator) {
    con <- file(fileName)
    data <- apply(df, 1, paste, collapse = separator)
    # data
    data <- writeLines(data, con)
    close(con)
    return
}

测试写出用字符串“ bra_break_ket”分隔的文本文件

writeSepText(df=as.data.frame(Titanic), fileName="/Users/user/break_sep.txt", separator="<break>")

读取带有特殊分隔符字符串的文本文件

readSepText <- function(fileName, separator) {
    data <- readLines(con <- file(fileName))
    close(con)
    records <- sapply(data, strsplit, split=separator)
    dataFrame <- data.frame(t(sapply(records,c)))
    rownames(dataFrame) <- 1: nrow(dataFrame)
    return(as.data.frame(dataFrame,stringsAsFactors = FALSE))
}

分隔的文本文件中的测试读取
df <- readSepText(fileName="/Users/user/break_sep.txt", separator="<break>"); df