How to dynamically create a new list column from existing column values in R

Time: 2017-10-05 18:05:53

Tags: r

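I am new to the R programming language and need to analyse MongoDB data with R. Could you please help me implement this?

Note: I want to create a new list column, "values", by grouping on the Process and custname columns. Please refer to the data frames below.

My data frame: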

    project      Process   custname   column1   column2
    analytics    view      jackson    ZZ        2
    analytics    Read      jackson    KK        3
    analytics    Read      jackson    FF        4
    analytics    Read      jackson    KK        8
    analytics    Read      ander      MM        9
    analytics    Write     jackson    UU        5
    analytics    Write     jackson    UU        6

Expected output data frame:

    Domain       Process   custname   Fields    values
    analytics    view      jackson    column1   list(colfield ="ZZ", "Totalcount"="1")
    analytics    view      jackson    column2   list(colfield ="2", "Totalcount"="1")
    analytics    Read      jackson    column1   list(colfield ="KK", "Totalcount"="2",list(colfield ="FF","Totalcount"="1"))
    analytics    Read      jackson    column2   list(colfield ="3", "Totalcount"="1",list(colfield ="4", "Totalcount"="1"), list(colfield ="8", "Totalcount"="1"))
    analytics    Read      ander      column1   list(colfield ="MM", "Totalcount"="1")
    analytics    Read      ander      column2   list(colfield ="9", "Totalcount"="1")
    analytics    Write     jackson    column1   list(colfield ="UU", "Totalcount"="2")
    analytics    Write     jackson    column2   list(colfield ="5", "Totalcount"="1",list(colfield ="6", "Totalcount"="1"))

dput() output of my data frame:

structure(list(project = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L), .Label = "analytics", class = "factor"), Process = structure(c(2L, 
1L, 1L, 1L, 1L, 3L, 3L), .Label = c("Read", "view", "Write"), class = "factor"), 
    custname = structure(c(2L, 2L, 2L, 2L, 1L, 2L, 2L), .Label = c("ander", 
    "jackson"), class = "factor"), column1 = structure(c(5L, 
    2L, 1L, 2L, 3L, 4L, 4L), .Label = c("FF", "KK", "MM", "UU", 
    "ZZ"), class = "factor"), column2 = structure(c(1L, 2L, 3L, 
    6L, 7L, 4L, 5L), .Label = c("2", "3", "4", "5", "6", "8", 
    "9"), class = "factor")), .Names = c("project", "Process", 
"custname", "column1", "column2"), row.names = c(NA, -7L), class = "data.frame")

1 answer:

Answer 0 (score: 0)

Let's try this:

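library(dplyr)
library(tidyr)

# df is the data frame from the question: assign the dput() output to df first
final_df <- df %>% 
  # reshape to long format: one row per (Fields, colfield) pair
  gather(Fields, colfield, -project, -Process, -custname) %>%
  # count how often each value occurs within its Process/custname/Fields group
  group_by(Process, custname, Fields, colfield) %>%
  mutate(Totalcount = n()) %>%
  distinct() %>% 
  # pair each value with its count
  rowwise() %>%
  mutate(Value_temp = list(cbind(colfield, Totalcount))) %>%
  # collect all pairs of a group into a single list cell
  group_by(Process, custname, Fields) %>%
  mutate(Value = list(cbind(Value_temp))) %>%
  select(project, Process, custname, Fields, Value) %>%
  # keep one row per group
  distinct()

final_df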
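
The same idea can also be sketched with count() and summarise(), which avoids the intermediate Value_temp column and yields exactly one row per Process/custname/Fields group. This is only a sketch under a few assumptions that are not in the original post: the data frame is rebuilt by hand from the question's table with character columns instead of factors, the result and column names (result, values) are illustrative, and it relies on tidyr >= 1.0.0 (pivot_longer) and dplyr >= 1.0.0 (the .groups argument).

```r
library(dplyr)
library(tidyr)

# Input rebuilt by hand from the question's table.
# Assumption: plain character columns instead of the factors in the dput().
df <- data.frame(
  project  = "analytics",
  Process  = c("view", "Read", "Read", "Read", "Read", "Write", "Write"),
  custname = c("jackson", "jackson", "jackson", "jackson", "ander", "jackson", "jackson"),
  column1  = c("ZZ", "KK", "FF", "KK", "MM", "UU", "UU"),
  column2  = c("2", "3", "4", "8", "9", "5", "6"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # long format: one row per original cell of column1/column2
  pivot_longer(c(column1, column2), names_to = "Fields", values_to = "colfield") %>%
  # number of occurrences of each value within its group
  count(project, Process, custname, Fields, colfield, name = "Totalcount") %>%
  group_by(project, Process, custname, Fields) %>%
  # one row per group; each cell of `values` is a list of
  # list(colfield = ..., Totalcount = ...) pairs
  summarise(
    values = list(unname(Map(
      function(v, n) list(colfield = v, Totalcount = n),
      colfield, Totalcount
    ))),
    .groups = "drop"
  )

result
```

A single cell can be inspected with str(result$values[[1]]). Each group holds a flat list of value/count pairs rather than the nested form shown in the question, so some reshaping may still be needed if the exact nesting matters.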