How to put a vector of strings of varying length into a data frame

Date: 2019-06-30 15:51:34

Tags: r dataframe

I have a long vector that I want to put into a data frame in R. Here is an example:

vector<-c("1","John Doe","15%","2","Janet Doe","13%","3","Jack William Doe","10%") 

I want output that looks like this:

    Position      Names       Percentage
1        1         John Doe        15%
2        2        Janet Doe        13%
3        3 Jack William Doe        10%

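I know the solution will involve data.frame(), and probably strsplit() as well, but the names vary in length (some have more words than others), which trips me up when splitting.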

4 answers:

Answer 0 (score: 5)

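One option is to convert the vector to a matrix by specifying the number of columns (ncol), convert that to a data.frame, and then change the column types with type.convert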

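# fill a 3-column matrix row by row (one record per row), then make it a data frame
out <- as.data.frame(matrix(vector, ncol = 3, byrow = TRUE,
                            dimnames = list(NULL, c("Position", "Names", "Percentage"))),
                     stringsAsFactors = FALSE)
# convert each column to its natural type (Position becomes integer)
out[] <- lapply(out, type.convert, as.is = TRUE)
out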
#  Position            Names Percentage
#1        1         John Doe        15%
#2        2        Janet Doe        13%
#3        3 Jack William Doe        10%

As @nicola pointed out in the comments, recent versions of R also have a type.convert method for data.frame (checked in R 3.6.0), so the last line can be changed to

out <- type.convert(out, as.is = TRUE)
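The matrix reshape relies on the vector being a flat sequence of complete records, three fields each, always in position/name/percentage order. A minimal sketch of wrapping this in a reusable helper under that assumption; `records_to_df` and its `fields` argument are hypothetical names chosen for illustration, not part of base R:

```r
# Hypothetical helper: reshape a flat character vector of fixed-width records
# into a data frame with one row per record.
records_to_df <- function(x, fields) {
  k <- length(fields)
  # matrix() would only warn and recycle values if the length were not a
  # multiple of the record width, so stop with an error instead
  if (length(x) %% k != 0) {
    stop("length(x) is not a multiple of the number of fields")
  }
  m <- matrix(x, ncol = k, byrow = TRUE, dimnames = list(NULL, fields))
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  # convert each column to the simplest type that fits ("1" -> integer)
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

vector <- c("1", "John Doe", "15%", "2", "Janet Doe", "13%",
            "3", "Jack William Doe", "10%")
res <- records_to_df(vector, c("Position", "Names", "Percentage"))
res
```

The explicit length check is the main design choice: a truncated or padded vector fails immediately instead of producing shifted rows.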

Another option is to collapse the vector into a single string and then use read.csv/read.table

# toString() joins the elements with ", "; the regex then turns every third comma into a newline
read.csv(text = gsub("(([^,]+,){2}[^,]+),", "\\1\n", toString(vector)),
         header = FALSE, stringsAsFactors = FALSE,
         col.names = c("Position", "Names", "Percentage"), strip.white = TRUE)
#  Position            Names Percentage
#1        1         John Doe        15%
#2        2        Janet Doe        13%
#3        3 Jack William Doe        10%

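This way the column types are inferred from the values as the data is read, rather than in a separate conversion step afterwards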
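To confirm that read.csv has already done the type conversion, one can inspect the column classes of its result. A small check reusing the read.csv call from this answer; storing the result in a variable named `df` is only for illustration:

```r
vector <- c("1", "John Doe", "15%", "2", "Janet Doe", "13%",
            "3", "Jack William Doe", "10%")
df <- read.csv(text = gsub("(([^,]+,){2}[^,]+),", "\\1\n", toString(vector)),
               header = FALSE, stringsAsFactors = FALSE,
               col.names = c("Position", "Names", "Percentage"),
               strip.white = TRUE)
# Position should come back as integer, the other two columns as character
sapply(df, class)
```

Note that this approach assumes no field contains a comma of its own (a name such as "Doe, John" would be split into two fields), since commas are what toString() and read.csv use as separators.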

Answer 1 (score: 4)

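A tidyverse option: first split the vector into groups by a repeating 1:3 index, then map parse_guess over the groups to build a data frame, and finally add the desired names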

library(tidyverse)

vector %>% 
  split(rep_len(1:3, length(.))) %>% 
  map_df(parse_guess) %>% 
  setNames(c('Position', 'Name', 'Percentage'))

# # A tibble: 3 x 3
#   Position Name             Percentage
#      <int> <chr>            <chr>     
# 1        1 John Doe         15%       
# 2        2 Janet Doe        13%       
# 3        3 Jack William Doe 10%       
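For readers without the tidyverse, the same split-by-position idea can be sketched in base R. Here `type.convert()` is swapped in for readr's `parse_guess()`, so type guessing may differ from readr in edge cases:

```r
vector <- c("1", "John Doe", "15%", "2", "Janet Doe", "13%",
            "3", "Jack William Doe", "10%")
# group elements by their position within each record: 1, 2, 3, 1, 2, 3, ...
cols <- split(vector, rep_len(1:3, length(vector)))
# convert each group; as.is = TRUE keeps text as character instead of factor
cols <- lapply(cols, type.convert, as.is = TRUE)
out <- setNames(as.data.frame(cols, stringsAsFactors = FALSE),
                c("Position", "Name", "Percentage"))
out
```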

Answer 2 (score: 1)

You could also try:

data.frame(
  Position = vector[seq(1, length(vector), 3)],
  Names = vector[seq(2, length(vector), 3)],
  Percentage = vector[seq(3, length(vector), 3)],
  stringsAsFactors = FALSE  # keep text as character (needed before R 4.0)
)

Or use a function to avoid the repetition:

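# pick every 3rd element of x, starting at position n
foo <- function(x, n) x[seq(n, length(x), 3)]

data.frame(
  Position = foo(vector, 1),
  Names = foo(vector, 2),
  Percentage = foo(vector, 3),
  stringsAsFactors = FALSE  # keep text as character (needed before R 4.0)
)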
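The stride-3 indexing can also be expressed with a recycled logical index: R recycles a logical subscript that is shorter than the vector, so `c(TRUE, FALSE, FALSE)` selects elements 1, 4, 7, and so on. A sketch of that variant as an alternative to `seq()`; like the versions above, it assumes the vector holds only complete three-field records:

```r
vector <- c("1", "John Doe", "15%", "2", "Janet Doe", "13%",
            "3", "Jack William Doe", "10%")
df <- data.frame(
  Position   = vector[c(TRUE, FALSE, FALSE)],  # elements 1, 4, 7
  Names      = vector[c(FALSE, TRUE, FALSE)],  # elements 2, 5, 8
  Percentage = vector[c(FALSE, FALSE, TRUE)],  # elements 3, 6, 9
  stringsAsFactors = FALSE
)
df
```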

Answer 3 (score: 0)

You can use grep to get the indices of each kind of value, and then use those indices to subset the vector for each data frame column:

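data.frame(Position = vector[grep("^\\d+$", vector)],   # whole numbers only
           Names = vector[grep("[^0-9%]", vector)],       # anything with other characters
           Percentage = vector[grep("%", vector)],        # values containing %
           stringsAsFactors = FALSE
)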

#### OUTPUT ####

  Position            Names Percentage
1        1         John Doe        15%
2        2        Janet Doe        13%
3        3 Jack William Doe        10%
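The grep approach classifies each element by its content rather than its position, so it only works while the three kinds of value stay distinguishable. A more defensive sketch, assuming positions are whole numbers and percentages are digits followed by `%`; the anchored patterns and the count check are additions for illustration, not part of the answer above:

```r
vector <- c("1", "John Doe", "15%", "2", "Janet Doe", "13%",
            "3", "Jack William Doe", "10%")
is_pos  <- grepl("^[0-9]+$", vector)    # whole numbers only, e.g. "1"
is_pct  <- grepl("^[0-9]+%$", vector)   # digits followed by %, e.g. "15%"
is_name <- !is_pos & !is_pct            # everything else is treated as a name
# every record must contribute exactly one value of each kind
stopifnot(sum(is_pos) == sum(is_pct), sum(is_pos) == sum(is_name))
df <- data.frame(Position   = as.integer(vector[is_pos]),
                 Names      = vector[is_name],
                 Percentage = vector[is_pct],
                 stringsAsFactors = FALSE)
df
```

A value such as "15.5%" would not match the percentage pattern and would be counted as a name, so the count check stops with an error instead of silently misaligning rows.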