I have clickstream data that I pulled from Google Analytics into R, with the following columns:

UserID, SessionID, TimeStamp, PagePath, PageViews

UserID  SessionID  TimeStamp  PagePath             PageViews
1       1.1        12:01      google.com           1
1       1.1        12:03      google.com/products  1
1       1.1        12:06      google.com/info      1
1       1.1        12:08      google.com/purchase  1
2       2.1        09:07      google.com           1
2       2.1        09:13      google.com/info      1

The UserID is repeated throughout the dataset, because a single UserID can have multiple page paths, timestamps, and even SessionIDs.

What I would like is a package or some other way to put this into a data frame that can be used with the clickstream package in R, so that the result looks like this:

UserID  PagePathBrokenOut
1       google.com,products,info
2       google.com,info

Which function or package in R can accomplish this? I cannot build the result by hand, because there are literally thousands of UserIDs and paths.

I have explored a few functions, but have not had much luck. There must be a way to condense the rows for each UserID into a single row and then show the page paths. I have tried c( and data.frame, among other things, but nothing has worked so far.

Again, any way to collapse the repeated UserIDs into one row per user, with the individual paths listed in that row, would be great.
dplyr
Answer 0 (score: 0)
Two solutions:
dplyr
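library(dplyr)
dat %>%
  # Page is everything before the first "/"; Path is whatever follows it
  mutate(Page = gsub("/.*", "", PagePath),
         Path = trimws(gsub("^/|/$", "", gsub("^[^/]*", "", PagePath)))) %>%
  group_by(UserID, Page) %>%
  # Collapse each user's non-empty paths into one comma-separated string
  summarize(PagePathBrokenOut = paste(c(Page[1], Filter(nzchar, Path)), collapse = ",")) %>%
  ungroup()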
# # A tibble: 2 x 3
# UserID Page PagePathBrokenOut
# <int> <chr> <chr>
# 1 1 google.com google.com,products,info,purchase
# 2 2 google.com google.com,info
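To see what the nested gsub calls in the dplyr solution do, here is a small base R sketch that runs them one step at a time. The sample paths are made up for illustration, and the sketch assumes the inner pattern is meant to be "^/|/$" (strip one leading and one trailing slash).

```r
# Hypothetical sample paths, including one with a trailing slash
paths <- c("google.com", "google.com/products", "google.com/info/")

# Step 1: the page is everything before the first "/"
page <- gsub("/.*", "", paths)

# Step 2: drop the page, leaving the remainder with its slashes
rest <- gsub("^[^/]*", "", paths)

# Step 3: strip one leading and one trailing "/" (assumed pattern "^/|/$")
path <- trimws(gsub("^/|/$", "", rest))

print(data.frame(paths, page, path))
```

Rows whose path comes out empty (the bare domain) are the ones that Filter(nzchar, Path) drops before pasting.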
data.table
(Note: I use the magrittr package only to break the chain of calls into a pipeline; it is not required.)
library(data.table)
library(magrittr)
datDT <- as.data.table(dat)
datDT %>%
  # Add Page and Path by reference, then collapse the paths per user and page
  .[, c("Page", "Path") := .(gsub("/.*", "", PagePath),
                             trimws(gsub("^/|/$", "", gsub("^[^/]*", "", PagePath)))), ] %>%
  .[, .(PagePathBrokenOut = paste(c(Page[1], Filter(nzchar, Path)), collapse = ",")),
    by = c("UserID", "Page")]
# UserID Page PagePathBrokenOut
# 1: 1 google.com google.com,products,info,purchase
# 2: 2 google.com google.com,info
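Since magrittr is optional here, the same steps can be chained with data.table's own [ ][ ] syntax. The following is a minimal sketch under two assumptions: the sample data is rebuilt inline with only the two columns that are used, and the slash-stripping pattern is taken to be "^/|/$".

```r
library(data.table)

# Sample data from the question, reduced to the two columns that are used
datDT <- data.table(
  UserID = c(1L, 1L, 1L, 1L, 2L, 2L),
  PagePath = c("google.com", "google.com/products", "google.com/info",
               "google.com/purchase", "google.com", "google.com/info")
)

# The first [ ] adds Page and Path by reference;
# the second [ ] collapses the non-empty paths per user and page
res <- datDT[
  , c("Page", "Path") := list(
      gsub("/.*", "", PagePath),
      trimws(gsub("^/|/$", "", gsub("^[^/]*", "", PagePath))))
][
  , list(PagePathBrokenOut = paste(c(Page[1], Path[nzchar(Path)]), collapse = ",")),
  by = list(UserID, Page)
]
print(res)
```

Note that := modifies datDT in place, so Page and Path remain as columns of datDT afterwards.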
Data:
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
UserID SessionID TimeStamp PagePath PageViews
1 1.1 12:01 google.com 1
1 1.1 12:03 google.com/products 1
1 1.1 12:06 google.com/info 1
1 1.1 12:08 google.com/purchase 1
2 2.1 09:07 google.com 1
2 2.1 09:13 google.com/info 1 ")
Answer 1 (score: 0)
Here is another approach that may be useful in some cases:
library(dplyr)
library(tidyr)
# df is assumed to hold the question's data (the other answer calls it dat)
# Remove the columns that are not required and spread to one column per path
df2 <- df %>%
  select(UserID, PagePath, PageViews) %>%
  spread(PagePath, PageViews)
# Temporary vector to store the UserIDs, which are then removed from df2
UserIDTemp <- df2$UserID
df2$UserID <- NULL
# Populate the data frame with URLs instead of page views; unvisited paths stay NA
w <- which(!is.na(df2), arr.ind = TRUE)
df2[w] <- names(df2)[w[, "col"]]
# Paste/concatenate all paths into a single string per user
df_args <- c(df2, sep = ", ")
pastedPaths <- do.call(paste, df_args)
# Create a data frame with the UserIDs and their paths, then print it
PagePaths <- data.frame(UserIDTemp, pastedPaths)
PagePaths
# UserIDTemp pastedPaths
# 1 google.com, google.com/info, google.com/products, google.com/purchase
# 2 google.com, google.com/info, NA, NA
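The NA entries in the output above come from spreading to one column per path: any path a user never visited stays NA and gets pasted into the string. If that is unwanted, one possible alternative (not part of the answer above, just a sketch) is base R's aggregate, which pastes only the paths each user actually has. The data frame df below is rebuilt from the question's sample, keeping only the two columns needed.

```r
# Sample data from the question, reduced to UserID and PagePath
df <- data.frame(
  UserID = c(1, 1, 1, 1, 2, 2),
  PagePath = c("google.com", "google.com/products", "google.com/info",
               "google.com/purchase", "google.com", "google.com/info"),
  stringsAsFactors = FALSE
)

# One row per UserID; paths are pasted in the order they appear in df
PagePaths <- aggregate(PagePath ~ UserID, data = df,
                       FUN = function(p) paste(p, collapse = ", "))
print(PagePaths)
```

Unlike the spread-based version, this keeps the paths in their original row order rather than alphabetical column order, and it also tolerates a user visiting the same path more than once.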