I have a data frame and want to insert spaces at specific positions. Here is a sample of the data:
0MHOCAN000006026421HOCAN000000392457HOCAN000005311227
0FHOUSA000002272874HOUSA000002272874HOUSA000050206641
0MHOUSA000002272874HOUSA000002076121HOUSA000014569699
This is what I want (a space before every letter H):
0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227
0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641
0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699
Answer 0 (score: 5)
You can use gsub with a fixed-string replacement:
x <- c("0MHOCAN000006026421HOCAN000000392457HOCAN000005311227",
"0FHOUSA000002272874HOUSA000002272874HOUSA000050206641",
"0MHOUSA000002272874HOUSA000002076121HOUSA000014569699")
gsub("H", " H", x, fixed=TRUE)
See the R demo.
Output:
[1] "0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227"
[2] "0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641"
[3] "0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699"
If your data frame is df and the column is named col1, you can use:
df$col1 = gsub("H", " H", df$col1, fixed=TRUE)
Answer 1 (score: 2)
We can read the data as fixed-width fields.
With the base function read.fwf:
x1 <- read.fwf("temp.txt",
widths = c(2, 17, 17, 17),
col.names = paste0("myColName",1:4),
stringsAsFactors = FALSE)
# check output
str(x1)
# 'data.frame': 3 obs. of 4 variables:
# $ myColName1: chr "0M" "0F" "0M"
# $ myColName2: chr "HOCAN000006026421" "HOUSA000002272874" "HOUSA000002272874"
# $ myColName3: chr "HOCAN000000392457" "HOUSA000002272874" "HOUSA000002076121"
# $ myColName4: chr "HOCAN000005311227" "HOUSA000050206641" "HOUSA000014569699"
x1
# myColName1 myColName2 myColName3 myColName4
# 1 0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227
# 2 0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641
# 3 0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699
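The read.fwf call above reads from a file named temp.txt, which the answer does not show being created. As a minimal sketch, assuming the three sample strings from the question are the whole file, the input can be written to a temporary file first. The names x and tmp are illustrative and not part of the original answer.

```r
# Sample records from the question (assumed to be the full input).
x <- c("0MHOCAN000006026421HOCAN000000392457HOCAN000005311227",
       "0FHOUSA000002272874HOUSA000002272874HOUSA000050206641",
       "0MHOUSA000002272874HOUSA000002076121HOUSA000014569699")

# Write them to a temporary file, one record per line.
tmp <- tempfile(fileext = ".txt")
writeLines(x, tmp)

# Read the file back as fixed-width fields: a 2-character prefix
# followed by three 17-character IDs.
x1 <- read.fwf(tmp,
               widths = c(2, 17, 17, 17),
               col.names = paste0("myColName", 1:4),
               stringsAsFactors = FALSE)

# Remove the temporary file once the data is in memory.
unlink(tmp)
```

Using tempfile avoids leaving a stray temp.txt in the working directory; replace tmp with a real path when the data already lives in a file.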
With read_fwf from the readr package:
library(readr)
x2 <- read_fwf("temp.txt",
fwf_widths(c(2, 17, 17, 17),
col_names = paste0("myColName",1:4)))
# check output
str(x2)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 4 variables:
# $ myColName1: chr "0M" "0F" "0M"
# $ myColName2: chr "HOCAN000006026421" "HOUSA000002272874" "HOUSA000002272874"
# $ myColName3: chr "HOCAN000000392457" "HOUSA000002272874" "HOUSA000002076121"
# $ myColName4: chr "HOCAN000005311227" "HOUSA000050206641" "HOUSA000014569699"
# - attr(*, "spec")=List of 2
# ..$ cols :List of 4
# .. ..$ myColName1: list()
# .. .. ..- attr(*, "class")= chr "collector_character" "collector"
# .. ..$ myColName2: list()
# .. .. ..- attr(*, "class")= chr "collector_character" "collector"
# .. ..$ myColName3: list()
# .. .. ..- attr(*, "class")= chr "collector_character" "collector"
# .. ..$ myColName4: list()
# .. .. ..- attr(*, "class")= chr "collector_character" "collector"
# ..$ default: list()
# .. ..- attr(*, "class")= chr "collector_guess" "collector"
# ..- attr(*, "class")= chr "col_spec"
x2
# # A tibble: 3 × 4
# myColName1 myColName2 myColName3 myColName4
# <chr> <chr> <chr> <chr>
# 1 0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227
# 2 0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641
# 3 0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699
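Because they split each record by position rather than by character, these solutions still work even if an ID does not start with the letter H or contains more than one H.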
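The question starts from strings already held in R rather than in a file, so the same position-based split can also be sketched with base substring. This is a minimal sketch, not taken from either answer: the names widths, starts, ends, parts and spaced are illustrative, and it assumes every record follows the 2 + 17 + 17 + 17 layout used above.

```r
# Sample records from the question.
x <- c("0MHOCAN000006026421HOCAN000000392457HOCAN000005311227",
       "0FHOUSA000002272874HOUSA000002272874HOUSA000050206641",
       "0MHOUSA000002272874HOUSA000002076121HOUSA000014569699")

# Field widths (assumed layout: 2-character prefix, three 17-character IDs).
widths <- c(2, 17, 17, 17)
ends   <- cumsum(widths)       # last character position of each field
starts <- ends - widths + 1    # first character position of each field

# Cut every record into its four fields by position.
parts <- lapply(x, substring, first = starts, last = ends)

# Re-join the fields of each record with a single space.
spaced <- vapply(parts, paste, character(1), collapse = " ")
spaced
```

For a data frame column, the same steps could be applied to df$col1 in place of x. Unlike the gsub approach, this does not depend on where the letter H occurs, but it does require every record to have the same field widths.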