我正在使用Census数据,我需要将四个字符列合并为一个列。
示例:
LOGRECNO STATE COUNTY TRACT BLOCK
60 01 001 021100 1053
61 01 001 021100 1054
62 01 001 021100 1055
63 01 001 021100 1056
64 01 001 021100 1057
65 01 001 021100 1058
我想创建一个新列,将STATE,COUNTY,TRACT和BLOCK的字符串一起添加到单个字符串中。例如:
LOGRECNO STATE COUNTY TRACT BLOCK BLOCKID
60 01 001 021100 1053 01001021101053
61 01 001 021100 1054 01001021101054
62 01 001 021100 1055 01001021101055
63 01 001 021100 1056 01001021101056
64 01 001 021100 1057 01001021101057
65 01 001 021100 1058 01001021101058
我试过了:
AL_Blocks$BLOCK_ID<- paste(c(AL_Blocks$STATE, AL_Blocks$County, AL_Blocks$TRACT, AL_Blocks$BLOCK), collapse = "")
但是这会将所有四列的所有行合并为一个字符串。
答案 0 :(得分:13)
您可以使用do.call
和paste0
。尝试:
AL_Blocks$BLOCK_ID <- do.call(paste0, AL_Block[c("STATE", "COUNTY", "TRACT", "BLOCK")])
示例输出:
do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
do.call(paste0, AL_Blocks[2:5])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
你也可以使用“tidyr”中的unite
,如下所示:
library(tidyr)
library(dplyr)
AL_Blocks %>%
unite(BLOCK_ID, STATE, COUNTY, TRACT, BLOCK, sep = "", remove = FALSE)
# LOGRECNO BLOCK_ID STATE COUNTY TRACT BLOCK
# 1 60 010010211001053 01 001 021100 1053
# 2 61 010010211001054 01 001 021100 1054
# 3 62 010010211001055 01 001 021100 1055
# 4 63 010010211001056 01 001 021100 1056
# 5 64 010010211001057 01 001 021100 1057
# 6 65 010010211001058 01 001 021100 1058
其中“AL_Blocks”提供为:
AL_Blocks <- structure(list(LOGRECNO = c("60", "61", "62", "63", "64", "65"),
STATE = c("01", "01", "01", "01", "01", "01"), COUNTY = c("001", "001",
"001", "001", "001", "001"), TRACT = c("021100", "021100", "021100",
"021100", "021100", "021100"), BLOCK = c("1053", "1054", "1055", "1056",
"1057", "1058")), .Names = c("LOGRECNO", "STATE", "COUNTY", "TRACT",
"BLOCK"), class = "data.frame", row.names = c(NA, -6L))
答案 1 :(得分:9)
试试这个:
AL_Blocks$BLOCK_ID<- with(AL_Blocks, paste0(STATE, COUNTY, TRACT, BLOCK))
县里有一个错字......它本应该是县。此外,您不需要折叠参数。
我希望有所帮助。
答案 2 :(得分:3)
或试试这个
DF$BLOCKID <-
paste(DF$LOGRECNO, DF$STATE, DF$COUNTY,
DF$TRACT, DF$BLOCK, sep = "")
(这是为稍后进入此讨论的人设置数据框的方法)
DF <-
data.frame(LOGRECNO = c(60, 61, 62, 63, 64, 65),
STATE = c(1, 1, 1, 1, 1, 1),
COUNTY = c(1, 1, 1, 1, 1, 1),
TRACT = c(21100, 21100, 21100, 21100, 21100, 21100),
BLOCK = c(1053, 1054, 1055, 1056, 1057, 1058))
答案 3 :(得分:3)
你也可以尝试这个
AL_Blocks <- transform(All_Blocks, BLOCKID = paste(STATE,COUNTY,
TRACT, BLOCK, sep = "")
答案 4 :(得分:2)
您可以使用tidyverse
包:
DF %>% unite(new_var, STATE, COUNTY, TRACT, BLOCK)
答案 5 :(得分:0)
您可以同时 WRITE 和 READ 带有任何指定的“字符串分隔符”(不一定是字符分隔符)的文本文件。这在数据几乎具有所有终端符号的情况下非常有用,因此,不能将1个符号用作分隔符。以下是 read 和 write 函数的示例:
writeSepText <- function(df, fileName, separator) {
con <- file(fileName)
data <- apply(df, 1, paste, collapse = separator)
# data
data <- writeLines(data, con)
close(con)
return
}
writeSepText(df=as.data.frame(Titanic), fileName="/Users/user/break_sep.txt", separator="<break>")
readSepText <- function(fileName, separator) {
data <- readLines(con <- file(fileName))
close(con)
records <- sapply(data, strsplit, split=separator)
dataFrame <- data.frame(t(sapply(records,c)))
rownames(dataFrame) <- 1: nrow(dataFrame)
return(as.data.frame(dataFrame,stringsAsFactors = FALSE))
}
df <- readSepText(fileName="/Users/user/break_sep.txt", separator="<break>"); df