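Converting a vector to a data frame in R using string subsetting

Time: 2017-09-18 19:44:03

Tags: r data-manipulation

I don't think I'll find a similar version of this question, since it feels like a fairly unique problem, but if I'm wrong, please point me in the right direction. I'm working with the following vector, which needs to be converted to a data frame: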

myvec = structure(c(1.03, 2.3, -1.2, -0.09, -0.31, -0.51, 3.4, 3, 0.07, 
0.02, 1.05, -0.02, 2.03), .Names = c("Intercept", "DEF-1017", 
"DEF-1025", "DEF-103", "DEF-1043", "DEF-1046", "DEF-1048", "DEF-1076", 
"OFF-1017", "OFF-1025", "OFF-103", "OFF-1046", "OFF-1076"))

head(myvec)
Intercept  DEF-1017  DEF-1025   DEF-103  DEF-1043  DEF-1046 
 1.03      2.30     -1.20     -0.09     -0.31     -0.51 

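The vector is supposed to hold offensive (OFF) and defensive (DEF) coefficients for 7 different users (users 1017, 1025, 103, 1043, 1046, 1048, 1076), but the offensive coefficient is missing for two of them (1043 and 1048). I need to convert it into a data frame with 4 columns (defense ID, offense ID, defense coefficient, offense coefficient). More specifically, I want to obtain the following data frame, with the missing values filled in as NA: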

mydf = structure(list(DEFID = c("DEF-1017", "DEF-1025", "DEF-103", "DEF-1043", 
"DEF-1046", "DEF-1048", "DEF-1076"), OFFID = c("OFF-1017", "OFF-1025", 
"OFF-103", NA, "OFF-1046", NA, "OFF-1076"), DEFVAL = c(2.3, -1.2, 
-0.09, -0.31, -0.51, 3.4, 3), OFFVAL = c(0.07, 0.02, 1.05, NA, 
-0.02, NA, 2.03)), .Names = c("DEFID", "OFFID", "DEFVAL", "OFFVAL"
), row.names = c(NA, -7L), class = "data.frame")

mydf
     DEFID    OFFID DEFVAL OFFVAL
1 DEF-1017 OFF-1017   2.30   0.07
2 DEF-1025 OFF-1025  -1.20   0.02
3  DEF-103  OFF-103  -0.09   1.05
4 DEF-1043     <NA>  -0.31     NA
5 DEF-1046 OFF-1046  -0.51  -0.02
6 DEF-1048     <NA>   3.40     NA
7 DEF-1076 OFF-1076   3.00   2.03

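The Intercept value is dropped (it does not appear in the table), and everything else is formatted as shown. Any help is greatly appreciated, thanks!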

2 answers:

Answer 0: (score: 0)

I use the tidyr package for tasks like this:

First, convert the vector to a data frame:

library(tidyverse)
# tibble() replaces data_frame(), which is deprecated in current tibble releases
df <- tibble(names = names(myvec),
             values = myvec)

Next, filter out the intercept and reshape with tidyr:

df %>% filter(names !="Intercept") %>% 
  extract(names, into=c("coeff", "user"), "([[:alnum:]]+)-([[:alnum:]]+)") %>% 
  spread(coeff, values) 
# A tibble: 7 x 3
   user   DEF   OFF
* <chr> <dbl> <dbl>
1  1017  2.30  0.07
2  1025 -1.20  0.02
3   103 -0.09  1.05
4  1043 -0.31    NA
5  1046 -0.51 -0.02
6  1048  3.40    NA
7  1076  3.00  2.03
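
To see what the regular expression passed to extract() is doing, here is a minimal base R sketch that applies the same pattern with sub(). The vector nm is a hypothetical three-element sample of the names, used only for illustration; it is not part of the original answer.

```r
# Hypothetical sample of the coefficient names (illustration only)
nm <- c("DEF-1017", "DEF-103", "OFF-1076")

# Same pattern as in the extract() call: two alphanumeric runs separated by "-"
pattern <- "([[:alnum:]]+)-([[:alnum:]]+)"

# The first capture group is the coefficient type, the second is the user id
coeff <- sub(pattern, "\\1", nm)
user  <- sub(pattern, "\\2", nm)

# "Intercept" contains no hyphen, so the pattern does not match it
intercept_matches <- grepl(pattern, "Intercept")
```

Because "Intercept" does not match the pattern, it makes sense to filter it out before splitting the names.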

If you want the names and so on to match the listing above exactly, a little more processing is needed:

df %>% filter(names !="Intercept") %>% 
  extract(names, into=c("coeff", "user"), "([[:alnum:]]+)-([[:alnum:]]+)") %>% 
  spread(coeff, values) %>% 
  mutate(DEFID = paste("DEF", user, sep="-"),
         OFFID = paste("OFF", user, sep="-")) %>%
  rename(DEFVAL=DEF,
         OFFVAL=OFF) %>% 
  select(DEFID, OFFID, DEFVAL, OFFVAL)
# A tibble: 7 x 4
     DEFID    OFFID DEFVAL OFFVAL
     <chr>    <chr>  <dbl>  <dbl>
1 DEF-1017 OFF-1017   2.30   0.07
2 DEF-1025 OFF-1025  -1.20   0.02
3  DEF-103  OFF-103  -0.09   1.05
4 DEF-1043 OFF-1043  -0.31     NA
5 DEF-1046 OFF-1046  -0.51  -0.02
6 DEF-1048 OFF-1048   3.40     NA
7 DEF-1076 OFF-1076   3.00   2.03
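
Note that this output fills in OFFID for users 1043 and 1048, whereas the data frame requested in the question has NA there. A minimal base R sketch of one way to blank those entries follows; the data frame out is rebuilt by hand from the printed output so the example is self-contained, and the name out is an assumption for illustration.

```r
# Result printed above, rebuilt by hand (the name `out` is hypothetical)
out <- data.frame(
  DEFID  = c("DEF-1017", "DEF-1025", "DEF-103", "DEF-1043",
             "DEF-1046", "DEF-1048", "DEF-1076"),
  OFFID  = c("OFF-1017", "OFF-1025", "OFF-103", "OFF-1043",
             "OFF-1046", "OFF-1048", "OFF-1076"),
  DEFVAL = c(2.3, -1.2, -0.09, -0.31, -0.51, 3.4, 3),
  OFFVAL = c(0.07, 0.02, 1.05, NA, -0.02, NA, 2.03),
  stringsAsFactors = FALSE
)

# Set OFFID to NA wherever there is no offensive coefficient
out$OFFID[is.na(out$OFFVAL)] <- NA
```

Inside the pipeline itself, the same condition could presumably be applied in the mutate() step instead.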

Answer 1: (score: 0)

This does exactly what you want. I used split, substr, and merge, and I think it is one of the simplest ways to get the output you want.
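
The code for this answer is not shown here. As a rough sketch only, here is one way substr, split, and merge could be combined in base R to get the requested layout. It is an assumption about the general approach, not the answerer's actual code, and the intermediate names (coefs, long, parts, def, off, res) are made up for illustration. It also assumes every name other than "Intercept" has a three-letter prefix followed by a hyphen.

```r
# Input vector from the question
myvec <- structure(c(1.03, 2.3, -1.2, -0.09, -0.31, -0.51, 3.4, 3, 0.07,
0.02, 1.05, -0.02, 2.03), .Names = c("Intercept", "DEF-1017",
"DEF-1025", "DEF-103", "DEF-1043", "DEF-1046", "DEF-1048", "DEF-1076",
"OFF-1017", "OFF-1025", "OFF-103", "OFF-1046", "OFF-1076"))

# Drop the intercept, then cut each name into its prefix and its user id
coefs <- myvec[names(myvec) != "Intercept"]
long <- data.frame(
  id    = names(coefs),
  type  = substr(names(coefs), 1, 3),                    # "DEF" or "OFF"
  user  = substr(names(coefs), 5, nchar(names(coefs))),  # text after the hyphen
  value = unname(coefs),
  stringsAsFactors = FALSE
)

# One data frame per coefficient type, with the target column names
parts <- split(long[c("user", "id", "value")], long$type)
def <- setNames(parts$DEF, c("user", "DEFID", "DEFVAL"))
off <- setNames(parts$OFF, c("user", "OFFID", "OFFVAL"))

# all.x = TRUE keeps users without an OFF coefficient; their OFF columns become NA
res <- merge(def, off, by = "user", all.x = TRUE)
res <- res[c("DEFID", "OFFID", "DEFVAL", "OFFVAL")]
```

The left join is what gives NA in both OFFID and OFFVAL for users 1043 and 1048.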