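I'm new to the R programming language and need to do MongoDB data analysis in R. Could you help me implement this?
Note: I want to create a new list column, "values", grouped by the Process and custname columns. Please refer to the data frames below.
My data frame: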
project    Process  custname  column1  column2
analytics  view     jackson   ZZ       2
analytics  Read     jackson   KK       3
analytics  Read     jackson   FF       4
analytics  Read     jackson   KK       8
analytics  Read     ander     MM       9
analytics  Write    jackson   UU       5
analytics  Write    jackson   UU       6
Desired output data frame:
Domain     Process  custname  Fields   values
analytics  view     jackson   column1  list(colfield = "ZZ", "Totalcount" = "1")
analytics  view     jackson   column2  list(colfield = "2", "Totalcount" = "1")
analytics  Read     jackson   column1  list(colfield = "KK", "Totalcount" = "2", list(colfield = "FF", "Totalcount" = "1"))
analytics  Read     jackson   column2  list(colfield = "3", "Totalcount" = "1", list(colfield = "4", "Totalcount" = "1"), list(colfield = "8", "Totalcount" = "1"))
analytics  Read     ander     column1  list(colfield = "MM", "Totalcount" = "1")
analytics  Read     ander     column2  list(colfield = "9", "Totalcount" = "1")
analytics  Write    jackson   column1  list(colfield = "UU", "Totalcount" = "2")
analytics  Write    jackson   column2  list(colfield = "5", "Totalcount" = "1", list(colfield = "6", "Totalcount" = "1"))
dput output of my data frame:
structure(list(project = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = "analytics", class = "factor"), Process = structure(c(2L,
1L, 1L, 1L, 1L, 3L, 3L), .Label = c("Read", "view", "Write"), class = "factor"),
custname = structure(c(2L, 2L, 2L, 2L, 1L, 2L, 2L), .Label = c("ander",
"jackson"), class = "factor"), column1 = structure(c(5L,
2L, 1L, 2L, 3L, 4L, 4L), .Label = c("FF", "KK", "MM", "UU",
"ZZ"), class = "factor"), column2 = structure(c(1L, 2L, 3L,
6L, 7L, 4L, 5L), .Label = c("2", "3", "4", "5", "6", "8",
"9"), class = "factor")), .Names = c("project", "Process",
"custname", "column1", "column2"), row.names = c(NA, -7L), class = "data.frame")
Answer 0 (score: 0)
Let's try this:
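library(dplyr)
library(tidyr)

# df <- structure(...)  # assign the dput() output from the question to df
final_df <- df %>%
  # Reshape to long form: one row per original row and column
  gather(Fields, colfield, -project, -Process, -custname) %>%
  # Count how often each value occurs within Process / custname / Fields
  group_by(Process, custname, Fields, colfield) %>%
  mutate(Totalcount = n()) %>%
  distinct() %>%
  # Pair each value with its count, one row at a time
  rowwise() %>%
  mutate(Value_temp = list(cbind(colfield, Totalcount))) %>%
  # Collect the pairs of each Process / custname / Fields group into one list cell
  group_by(Process, custname, Fields) %>%
  mutate(Value = list(cbind(Value_temp))) %>%
  select(project, Process, custname, Fields, Value) %>%
  distinct()

final_df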
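
If the matrix-in-a-list cells above are awkward to work with, here is an alternative sketch of the same idea (not the approach above). It assumes dplyr >= 1.0 and tidyr >= 1.0, rebuilds the sample data by hand with character columns instead of factors, and stores each group's value/count pairs as a small nested data frame in a list column named values. The names counts and values are my own choices for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt by hand (assumption: plain
# character/numeric columns rather than the factors in the dput output).
df <- data.frame(
  project  = "analytics",
  Process  = c("view", "Read", "Read", "Read", "Read", "Write", "Write"),
  custname = c("jackson", "jackson", "jackson", "jackson", "ander", "jackson", "jackson"),
  column1  = c("ZZ", "KK", "FF", "KK", "MM", "UU", "UU"),
  column2  = c(2, 3, 4, 8, 9, 5, 6),
  stringsAsFactors = FALSE
)

# Long form: one row per distinct value within each group, with its count.
counts <- df %>%
  mutate(across(c(column1, column2), as.character)) %>%
  pivot_longer(c(column1, column2), names_to = "Fields", values_to = "colfield") %>%
  count(project, Process, custname, Fields, colfield, name = "Totalcount")

# One row per project / Process / custname / Fields; the value/count pairs
# of each group are nested into the list column "values".
final_df <- counts %>%
  nest(values = c(colfield, Totalcount))

final_df
```

With the sample data this should give one row per Process / custname / Fields combination, and final_df$values[[i]] holds the colfield and Totalcount pairs for row i. Whether this nested shape is what your MongoDB insert expects depends on the driver you use, so treat it as a starting point.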