Create new columns from a long string split into 300 substrings?

Time: 2017-01-17 00:34:22

Tags: r string

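I have a column of 1200-character strings. In each string, every group of four characters is one number in hexadecimal, i.e. each row holds 300 numbers packed as hex digits into a 1200-character string. I need to convert each number to decimal and put it into its own column (300 new columns), named 1-300. Here is what I have so far: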

  Data.frame:
                      BigString
                 [1]  0043003E803C0041004A...(etc...)

Here is what I have done so far:

decimal.fours <- function(x) {
    strtoi(substring(BigString[x], seq(1,1197,4), seq(4,1200,4)), 16L)
}
decimal.fours(1)
[1] 283   291   239   177 ...

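But now I'm stuck. How do I output these individual numbers (and the remaining 296) into new columns? I have 50 rows/strings in total. It would be great to do them all at once, i.e. 300 new columns holding the split substrings of all 50 strings.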

3 Answers:

Answer 0: (score: 1)

You can use read.fwf, which reads a file with fixed-width columns:

# an example vector of big strings
BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")

n = 5                  # n is the number of columns in your result (300 for your real case)
as.data.frame(
      lapply(read.fwf(file = textConnection(BigString), 
                      widths = rep(4, n), 
                      colClasses = "character"), 
             strtoi, base = 16))

#  V1 V2    V3 V4 V5
#1 67 62 32828 65 74
#2 67 62 32828 65 74
#3 67 62 32828 65 74

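If you want to keep your decimal.fours function, you can modify it to take the string itself as its argument and call lapply to convert the big strings into a list of integer vectors, which can then be turned into a data.frame with the do.call(rbind, ...) pattern.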
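The modified function itself is not shown above, so the following is only a minimal sketch of what that approach could look like. It assumes decimal.fours is rewritten to accept a single string rather than a row index, and it derives the split positions from the string length; the variable name result is a placeholder.

```r
# Example input: three 20-character strings, i.e. 5 groups of 4 hex digits each
BigString <- c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")

# Assumed rewrite of decimal.fours: takes one string, cuts it into
# 4-character pieces and converts each piece from hex to an integer
decimal.fours <- function(x) {
  starts <- seq(1, nchar(x), by = 4)
  strtoi(substring(x, starts, starts + 3), base = 16L)
}

# lapply gives one integer vector per string; do.call(rbind, ...) stacks
# them row-wise into a matrix, which is then converted to a data.frame
result <- as.data.frame(do.call(rbind, lapply(BigString, decimal.fours)))

# Name the columns 1, 2, ..., as the question asks
names(result) <- seq_len(ncol(result))
result
```

With the real data the same code should produce 300 columns per row, provided every string is exactly 1200 characters long.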

Answer 1: (score: 1)

Try this, using base R:

BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")
df = data.frame(BigString, stringsAsFactors = FALSE)

n = 5                             # number of 4-character groups per string (300 for the real data)
starts = seq(1, 4 * n, by = 4)    # start position of each group

t(sapply(df$BigString,
         function(x) strtoi(substring(x, starts, starts + 3), base = 16),
         USE.NAMES = FALSE))
#     [,1] [,2]  [,3] [,4] [,5]
#[1,]   67   62 32828   65   74
#[2,]   67   62 32828   65   74
#[3,]   67   62 32828   65   74

# you can set the column names at the end using `paste0("new_col", 1:300)`
# n = 5 is used here only because the example strings are 20 characters long
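The last step, naming the columns and attaching them to the original data frame, is only hinted at in the comments. One possible way to do it is sketched below; num_mat and result are placeholder names, and the example again uses 5 groups instead of 300.

```r
BigString <- c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")
df <- data.frame(BigString, stringsAsFactors = FALSE)

n <- 5                             # 300 for the real data
starts <- seq(1, 4 * n, by = 4)    # start position of each 4-character group

# One row per string, one column per 4-digit hex number
num_mat <- t(sapply(df$BigString,
                    function(x) strtoi(substring(x, starts, starts + 3), base = 16L),
                    USE.NAMES = FALSE))

# Name the new columns, then bind them onto the original data frame
colnames(num_mat) <- paste0("new_col", seq_len(n))
result <- cbind(df, num_mat)
result
```

If plain numeric names are preferred, as.character(seq_len(n)) can be used for the column names instead, although such columns then have to be accessed with backticks or result[["1"]].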

Answer 2: (score: 1)

The obligatory tidyverse example:

library(tidyverse)

Set up some data:

set.seed(1492)

bet <- c(0:9, LETTERS[1:6]) # alphabet for hex digit sequences
i <- 8                      # number of rows
n <- 10                     # number of 4-hex-digit sequences

df <- tibble(
   some_other_col=LETTERS[1:i],
   big_str=map_chr(1:i, ~sample(bet, 4*n, replace=TRUE) %>% paste0(collapse=""))
)

df
## # A tibble: 8 × 2
##   some_other_col                                  big_str
##            <chr>                                    <chr>
## 1              A 432100D86CAA388C15AEA6291E985F2FD3FB6104
## 2              B BC2673D112925EBBB3FD175837AF7176C39B4888
## 3              C B4E99FDAABA47515EADA786715E811EE0502ABE8
## 4              D 64E622D7037D35DE6ADC40D0380E1DC12D753CBC
## 5              E CF7CDD7BBC610443A8D8FCFD896CA9730673B181
## 6              F ED86AEE8A7B65F843200B823CFBD17E9F3CA4EEF
## 7              G 2B9BCB73941228C501F937DA8E6EF033B5DD31F6
## 8              H 40823BBBFDF9B14839B7A95B6E317EBA9B016ED5

Do the manipulation:

read_fwf(paste0(df$big_str, collapse="\n"),
         fwf_widths(rep(4, n)),
         col_types=paste0(rep("c", n), collapse="")) %>%
  mutate_all(strtoi, base=16) %>%
  bind_cols(df) %>%
  select(some_other_col, everything(), -big_str)
## # A tibble: 8 × 11
##   some_other_col    X1    X2    X3    X4    X5    X6    X7    X8    X9
##            <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1              A 17185   216 27818 14476  5550 42537  7832 24367 54267
## 2              B 48166 29649  4754 24251 46077  5976 14255 29046 50075
## 3              C 46313 40922 43940 29973 60122 30823  5608  4590  1282
## 4              D 25830  8919   893 13790 27356 16592 14350  7617 11637
## 5              E 53116 56699 48225  1091 43224 64765 35180 43379  1651
## 6              F 60806 44776 42934 24452 12800 47139 53181  6121 62410
## 7              G 11163 52083 37906 10437   505 14298 36462 61491 46557
## 8              H 16514 15291 65017 45384 14775 43355 28209 32442 39681
## # ... with 1 more variables: X10 <int>