How to collapse duplicate IDs in one column into a single row and display the result

Time: 2019-10-24 19:41:08

Tags: r google-analytics

I pulled clickstream data from Google Analytics into R. It looks like this:

UserID  SessionID  TimeStamp  PagePath             PageViews
1       1.1        12:01      google.com           1
1       1.1        12:03      google.com/products  1
1       1.1        12:06      google.com/info      1
1       1.1        12:08      google.com/purchase  1
2       2.1        09:07      google.com           1
2       2.1        09:13      google.com/info      1

The UserID repeats throughout the dataset, because a single user can have multiple page paths, timestamps, and even SessionIDs.

What I want is a package, or some other way, to get this into a data frame that I can use with the Clickstream package in R, so that the result looks like this:

UserID  PagePathBrokenOut
1       google.com,products,info
2       google.com,info

Which function or package in R can do this? I can't do it by hand, because there are literally thousands of UserIDs and paths.

I have explored a few functions but haven't had much luck. There must be a way to collapse the columns UserID, SessionID, TimeStamp, PagePath, PageViews into a single row per user and then show the page paths.

I have also tried c(), but nothing has worked so far.

Again, a way to collapse the repeated UserIDs in a single column into one row each, with the individual paths alongside, would be great.

I tried using data.frame and dplyr, but that did not work.

2 answers:

Answer 0 (score: 0)

Two solutions:

dplyr

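library(dplyr)
dat %>%
  # Page is the host (text before the first slash); Path is the rest,
  # with any leading or trailing slash stripped
  mutate(Page = gsub("/.*", "", PagePath),
         Path = trimws(gsub("^/|/$", "", gsub("^[^/]*", "", PagePath)))) %>%
  group_by(UserID, Page) %>%
  # One row per user and host: the host followed by the non-empty path pieces
  summarize(PagePathBrokenOut = paste(c(Page[1], Filter(nzchar, Path)), collapse = ",")) %>%
  ungroup()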
# # A tibble: 2 x 3
#   UserID Page       PagePathBrokenOut                
#    <int> <chr>      <chr>                            
# 1      1 google.com google.com,products,info,purchase
# 2      2 google.com google.com,info                  

data.table

(Note: I use the magrittr package here only to break the calls up into a pipeline. It is not required.)

library(data.table)
library(magrittr)
datDT <- as.data.table(dat)
datDT %>%
  # Add Page (host) and Path (rest of the URL, slashes trimmed) by reference
  .[, c("Page", "Path") := .(gsub("/.*", "", PagePath),
                             trimws(gsub("^/|/$", "", gsub("^[^/]*", "", PagePath))))] %>%
  # Collapse to one row per user and host
  .[, .(PagePathBrokenOut = paste(c(Page[1], Filter(nzchar, Path)), collapse = ",")),
    by = c("UserID", "Page")]
#    UserID       Page                 PagePathBrokenOut
# 1:      1 google.com google.com,products,info,purchase
# 2:      2 google.com                   google.com,info
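
To make the nested gsub() calls easier to follow, here is a small step-by-step sketch on three of the sample paths. It assumes the intent of the last pattern is to strip a leading or trailing slash, written here as "^/|/$". The variable names page, rest, and path are only for illustration.

```r
# Sample paths taken from the data in this thread
paths <- c("google.com", "google.com/products", "google.com/info")

# Step 1: the host is everything before the first slash
page <- gsub("/.*", "", paths)

# Step 2: drop the host, leaving "" or something like "/products"
rest <- gsub("^[^/]*", "", paths)

# Step 3: strip a leading or trailing slash (assumed intent of the pattern)
path <- gsub("^/|/$", "", rest)

print(page)
print(path)

# Keep only the non-empty pieces, as Filter(nzchar, Path) does in the solutions
print(Filter(nzchar, path))
```

The empty string produced for the bare host is why both solutions filter with nzchar before pasting.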

Data:


dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
UserID    SessionID    TimeStamp   PagePath             PageViews
1           1.1          12:01      google.com             1
1           1.1          12:03      google.com/products    1
1           1.1          12:06      google.com/info        1
1           1.1          12:08      google.com/purchase    1 
2           2.1          09:07      google.com             1
2           2.1          09:13      google.com/info        1 ")
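
If the full page paths are enough (without splitting off the host), base R alone may be sufficient. The following is a minimal sketch rather than part of the solutions above. It assumes the rows are already in time order within each user, and the names paths_by_user, collapsed, and PagePaths are made up for illustration.

```r
# A cut-down copy of the sample data (only the columns used here)
dat <- data.frame(
  UserID = c(1, 1, 1, 1, 2, 2),
  PagePath = c("google.com", "google.com/products", "google.com/info",
               "google.com/purchase", "google.com", "google.com/info"),
  stringsAsFactors = FALSE
)

# One character vector of paths per user, in row order
paths_by_user <- split(dat$PagePath, dat$UserID)

# Collapse each vector into a single comma-separated string
collapsed <- data.frame(
  UserID = names(paths_by_user),
  PagePaths = vapply(paths_by_user, paste, character(1), collapse = ","),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(collapsed)
```

The intermediate paths_by_user list keeps each user's paths as separate elements, which can be handy if a later step needs the individual pages rather than one pasted string.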

Answer 1 (score: 0)

Here is another approach that may be useful in some cases. It assumes df holds the same data as in the question:


library(dplyr)
library(tidyr)  # provides spread()

# df is assumed to hold the question's data (UserID, PagePath, PageViews, ...)

# Keep only the required columns and spread to one column per PagePath
df2 <- df %>%
  select(UserID, PagePath, PageViews) %>%
  spread(PagePath, PageViews)

# Temporary vector to hold the UserIDs, which are then removed from df2
UserIDTemp <- df2$UserID
df2$UserID <- NULL

# Replace each page-view count with its URL (the column name); pages a user did not visit stay NA
w <- which(!is.na(df2), arr.ind = TRUE)
df2[w] <- names(df2)[w[, "col"]]

# Paste/concatenate all paths in each row into a single string
df_args <- c(df2, sep = ", ")
pastedPaths <- do.call(paste, df_args)

# Create a data frame with the UserIDs and paths
PagePaths <- data.frame(UserIDTemp, pastedPaths)

data.frame(UserIDTemp,pastedPaths)
#   UserIDTemp                                                           pastedPaths
# 1          1 google.com, google.com/info, google.com/products, google.com/purchase
# 2          2                                   google.com, google.com/info, NA, NA
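
The NA entries in the second row come from pasting cells that are empty after spreading. One possible way to avoid them, sketched here on a small stand-in for df2 (the column names a, b, and c are placeholders, not the real column names), is to drop missing values row by row before pasting:

```r
# Stand-in for df2 after UserID is removed and the cells hold URLs
df2 <- data.frame(
  a = c("google.com", "google.com"),
  b = c("google.com/info", "google.com/info"),
  c = c("google.com/products", NA),
  stringsAsFactors = FALSE
)

# For each row, paste only the values that are not NA
pastedPaths <- apply(df2, 1, function(row) paste(row[!is.na(row)], collapse = ", "))
print(pastedPaths)
```

Note that with either variant the paths follow the column order of the spread data, not the order in which the pages were visited.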