Reshaping an atypical data format from long to wide

Time: 2018-10-10 06:45:25

Tags: r dplyr reshape2

My data:

    # A tibble: 6 x 4
  X__1            X__6                                                     X__7     X__8        
  <chr>           <chr>                                                    <chr>    <chr>       
1 Emp #:          xxyy                                                    Departm~ Corporate S~
2 Reason of Resi~ I think below are areas of improvement within my team C~ NA       NA          
3 Emp #:          xyyy                                                    Departm~ Corporate S~
4 Reason of Resi~ better oppurtunity                                       NA       NA          

I want to reshape the data into the following format:

Emp #     Reason                                                 Department
10282     I think below are areas of improvement within my team  Corporate
10308     better oppurtunity                                     Corporate

Data to reproduce:

structure(list(X__1 = c("Emp #:", "Reason of Resignation:", "Emp #:", 
"Reason of Resignation:", "Emp #:", "Reason of Resignation:", 
"Emp #:", "Reason of Resignation:", "Emp #:", "Reason of Resignation:"
), X__6 = c("10282", "I think below are areas of improvement within my team CS / SME or my be cross the organization on my level (L1-L2). Lack of career growth specially in my department i.e. CS HOD/RSM/TLs/KAMs are on same position from last 5 years. Many people are here on same position from last 10-12 years. lack in focus on low level staff (L1 / L2) in terms of capacity building and career growth i.e. not a single training for my team on it. No rotation plans (for capacity building) for CS i.e. not a single team member rotated since I joined. Better opportunity in terms of career and financials outside ", 
"10308", "better oppurtunity", "11230", "Moving on another organization for career persuade", 
"13370", "Get a new job outside the company.", "14694", "Health Issues"
), X__7 = c("Department:", NA, "Department:", NA, "Department:", 
NA, "Department:", NA, "Department:", NA), X__8 = c("Corporate Solutions", 
NA, "Corporate Solutions", NA, "Region Central A", NA, "Region North", 
NA, "Finance Operations", NA)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

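More details:

"Emp #:" in X__1 should become the first column header, and that column's values should come from X__6, and so on for the other fields.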

2 Answers:

Answer 0 (score: 1)

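I added a new column called rid that groups the rows into pairs, then filtered out the columns needed from each row type and joined them back together by rid with left_join().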

library(dplyr)

# Number the rows in pairs (1, 1, 2, 2, ...); assumes an even number of rows
df <- mutate(df, rid = rep(seq_len(nrow(df) / 2), each = 2))

left_join(
  df %>%
    filter(X__1 == "Emp #:") %>%
    select(rid, X__6) %>%
    rename("Emp #" = "X__6"),
  df %>%
    filter(X__1 == "Reason of Resignation:") %>%
    select(rid, X__6) %>%
    rename("Reason" = "X__6"),
  by = "rid") %>%
  left_join(df %>%
              filter(X__7 == "Department:") %>%
              select(rid, X__8) %>%
              rename("Department" = "X__8"),
            by = "rid") %>%
  select(-rid)

#  `Emp #` Reason                                                    Department     
#   <chr>   <chr>                                                     <chr>          
# 1 10282   I think below are areas of improvement within my team CS~ Corporate Solu~
# 2 10308   better oppurtunity                                        Corporate Solu~
# 3 11230   Moving on another organization for career persuade        Region Central~
# 4 13370   Get a new job outside the company.                        Region North   
# 5 14694   Health Issues                                             Finance Operat~
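
The pairing step can be checked in isolation. Below is a minimal sketch in base R, assuming the rows strictly alternate between an "Emp #:" row and a "Reason of Resignation:" row. The data frame `toy` and its values are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data with the same alternating layout as the question.
toy <- data.frame(
  X__1 = c("Emp #:", "Reason of Resignation:", "Emp #:", "Reason of Resignation:"),
  X__6 = c("1", "reason a", "2", "reason b"),
  stringsAsFactors = FALSE
)

# One id per pair of rows: 1, 1, 2, 2
toy$rid <- rep(seq_len(nrow(toy) / 2), each = 2)

# Split by row type, then match the two halves on rid.
emp    <- toy[toy$X__1 == "Emp #:", c("rid", "X__6")]
reason <- toy[toy$X__1 == "Reason of Resignation:", c("rid", "X__6")]
wide   <- merge(emp, reason, by = "rid", suffixes = c("_emp", "_reason"))
```

Here `merge()` stands in for `left_join()` only to keep the sketch free of package dependencies. If a pair were incomplete, the id assignment would drift, so this relies on the strict two-rows-per-employee layout.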

Answer 1 (score: 0)

Provided your format is strictly as shown, another idea (possibly overkill) would be:

d1 <- df[c(TRUE, FALSE),]
d2 <- df[c(FALSE, TRUE),]

setNames(data.frame(d1[2], d1[4], d2[2]), c(d1[1,1], d1[1,3], d2[1,1]))

which gives,

   Emp #:         Department:                                                       Reason of Resignation:
1  10282 Corporate Solutions I think below are areas of improvement within my team CS / SMEs outside JAZZ
2  10308 Corporate Solutions                                                           better oppurtunity
3  11230    Region Central A                           Moving on another organization for career persuade
4  13370        Region North                                           Get a new job outside the company.
5  14694  Finance Operations                                                                Health Issues
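
The row selection above depends on R recycling a short logical index across the whole object: c(TRUE, FALSE) keeps every odd position and c(FALSE, TRUE) every even one. A small sketch on a made-up character vector (`rows` is hypothetical and only stands in for the rows of df):

```r
# Hypothetical six-element vector standing in for the alternating rows.
rows <- c("emp 1", "reason 1", "emp 2", "reason 2", "emp 3", "reason 3")

# The length-2 logical index is recycled along the vector.
odd  <- rows[c(TRUE, FALSE)]   # positions 1, 3, 5
even <- rows[c(FALSE, TRUE)]   # positions 2, 4, 6

# Equivalent selection with explicit positions, no recycling involved.
odd_by_position <- rows[seq(1, length(rows), by = 2)]
```

One caveat, stated as an assumption rather than something confirmed in this thread: newer versions of the tibble package may reject a logical row index whose length is neither 1 nor the number of rows. If that happens, converting with as.data.frame(df) first, or using the explicit seq() form, should avoid the issue.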