I have a data frame with a column of tibbles. Here is part of my data:
date time uuid data
2018-06-23 18:25:24 0b27ea5fad61c99d <tibble>
2018-06-23 18:25:38 0b27ea5fad61c99d <tibble>
2018-06-23 18:26:01 0b27ea5fad61c99d <tibble>
2018-06-23 18:26:23 0b27ea5fad61c99d <tibble>
2018-06-23 18:26:37 0b27ea5fad61c99d <tibble>
2018-06-23 18:27:00 0b27ea5fad61c99d <tibble>
2018-06-23 18:27:22 0b27ea5fad61c99d <tibble>
2018-06-23 18:27:39 0b27ea5fad61c99d <tibble>
2018-06-23 18:28:06 0b27ea5fad61c99d <tibble>
2018-06-23 18:28:30 0b27ea5fad61c99d <tibble>
The data column consists of tibbles with one column of characters, for example:
contacts
5646
65748
115
498456
35135
Here is my function:
jaccard <- function(vector1, vector2) {
  return(length(intersect(vector1, vector2)) /
           length(union(vector1, vector2)))
}
My goal is to compute the Jaccard index between every two consecutive tibbles in the data column.
I tried calling jaccard inside mutate, but for some reason it does not seem to work.
I know I'm close; please advise.
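As a quick check of what the function computes, here is a minimal sketch that applies the question's jaccard to two consecutive contact sets. The two vectors are illustrative values in the style of the contacts column, not rows taken from the real data.

```r
# Same definition as in the question: |intersection| / |union|
jaccard <- function(vector1, vector2) {
  return(length(intersect(vector1, vector2)) /
           length(union(vector1, vector2)))
}

# Two consecutive contact sets (illustrative values)
prev_contacts <- c(5646, 65748)
curr_contacts <- c(5646, 65748, 115)

# 2 shared contacts out of 3 distinct contacts
j <- jaccard(curr_contacts, prev_contacts)
print(j)  # [1] 0.6666667
```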
Answer 0 (score: 1)
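The reason is that the jaccard function is not written to handle vector arguments. As you know, a function used inside mutate receives whole vectors of data (in the OP's example, a vector of 10 tibbles). Since jaccard is not written to take a vector of tibbles as its arguments, it computes a single value over the entire column instead of one value per row, so the result is not what you expect.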
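To see the problem concretely, the sketch below calls the unvectorized jaccard on two whole lists, which is what jaccard(data, lag(data, 1)) amounts to inside mutate. The toy list data and its shifted copy prev are illustrative assumptions standing in for the real column and its lag.

```r
jaccard <- function(vector1, vector2) {
  length(intersect(vector1, vector2)) / length(union(vector1, vector2))
}

# Toy stand-in for the data column: a list of contact vectors
data <- list(c(5646, 65748, 115), c(5646, 65748), 5646)
# "Previous row" for each element; NULL where there is no predecessor
prev <- c(list(NULL), data[-length(data)])

# Called on the whole lists, jaccard treats each list element as one set
# member, so it returns a single number for the entire column
whole <- jaccard(data, prev)
length(whole)  # 1, which mutate would then recycle to every row
```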
The simplest fix is to vectorize the jaccard function so that it can handle vector arguments. One way to do that is to wrap it with Vectorize:
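# Jaccard index of two sets: |intersection| / |union|
jaccard <- function(vector1, vector2) {
  return(length(intersect(vector1, vector2)) /
           length(union(vector1, vector2)))
}
# Vectorised version of jaccard: applies it element by element
jaccardV <- Vectorize(jaccard)
library(dplyr)
# Compare each row's data with the previous row's (lag shifts the column down by one)
df %>%
  mutate(j = jaccardV(data, lag(data, 1)))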
# date time uuid data j
# 1 2018-06-23 18:25:24 0b27ea5fad61c99d 5646, 65748, 115, 498456, 35135 0.0000000
# 2 2018-06-23 18:25:38 0b27ea5fad61c99d 5646, 65748 0.4000000
# 3 2018-06-23 18:26:01 0b27ea5fad61c99d 5646, 65748, 115 0.6666667
# 4 2018-06-23 18:26:23 0b27ea5fad61c99d 5646 0.3333333
# 5 2018-06-23 18:26:37 0b27ea5fad61c99d 5646, 65748 0.5000000
# 6 2018-06-23 18:27:00 0b27ea5fad61c99d 5646, 65748, 115, 498456, 35135 0.4000000
# 7 2018-06-23 18:27:22 0b27ea5fad61c99d 5646, 65748 0.4000000
# 8 2018-06-23 18:27:39 0b27ea5fad61c99d 5646, 65748, 115 0.6666667
# 9 2018-06-23 18:28:06 0b27ea5fad61c99d 5646 0.3333333
# 10 2018-06-23 18:28:30 0b27ea5fad61c99d 5646, 65748 0.5000000
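Vectorize works by wrapping the function in a call to mapply, so the same pairwise computation can also be written directly in base R without dplyr. A minimal sketch, assuming data is a plain list of numeric vectors (as the data section below builds it) and using NULL as the "previous" value for the first row:

```r
jaccard <- function(vector1, vector2) {
  length(intersect(vector1, vector2)) / length(union(vector1, vector2))
}

# The first five contact sets from the example data, as plain numeric vectors
data <- list(c(5646, 65748, 115, 498456, 35135), c(5646, 65748),
             c(5646, 65748, 115), 5646, c(5646, 65748))
# Shift down by one; the first row has no predecessor, so use NULL
prev <- c(list(NULL), data[-length(data)])

# mapply calls jaccard once per pair (data[[i]], prev[[i]]),
# which is what Vectorize(jaccard) does under the hood
j <- mapply(jaccard, data, prev)
print(j)  # matches rows 1-5 of the j column above
```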
Data:
df <- read.table(text="
date time uuid
2018-06-23 18:25:24 0b27ea5fad61c99d
2018-06-23 18:25:38 0b27ea5fad61c99d
2018-06-23 18:26:01 0b27ea5fad61c99d
2018-06-23 18:26:23 0b27ea5fad61c99d
2018-06-23 18:26:37 0b27ea5fad61c99d
2018-06-23 18:27:00 0b27ea5fad61c99d
2018-06-23 18:27:22 0b27ea5fad61c99d
2018-06-23 18:27:39 0b27ea5fad61c99d
2018-06-23 18:28:06 0b27ea5fad61c99d
2018-06-23 18:28:30 0b27ea5fad61c99d",
header = TRUE, stringsAsFactors = FALSE)
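library(tibble)
t1 <- tibble(contacts = c(5646,65748,115,498456,35135))
t2 <- tibble(contacts = c(5646,65748))
t3 <- tibble(contacts = c(5646,65748,115))
t4 <- tibble(contacts = c(5646))
t5 <- tibble(contacts = c(5646,65748))
# c() on tibbles concatenates their columns into a list of 5 numeric vectors;
# assigning it to df$data recycles that list to fill all 10 rows
df$data <- c(t1,t2,t3,t4,t5)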
df
# date time uuid data
# 1 2018-06-23 18:25:24 0b27ea5fad61c99d 5646, 65748, 115, 498456, 35135
# 2 2018-06-23 18:25:38 0b27ea5fad61c99d 5646, 65748
# 3 2018-06-23 18:26:01 0b27ea5fad61c99d 5646, 65748, 115
# 4 2018-06-23 18:26:23 0b27ea5fad61c99d 5646
# 5 2018-06-23 18:26:37 0b27ea5fad61c99d 5646, 65748
# 6 2018-06-23 18:27:00 0b27ea5fad61c99d 5646, 65748, 115, 498456, 35135
# 7 2018-06-23 18:27:22 0b27ea5fad61c99d 5646, 65748
# 8 2018-06-23 18:27:39 0b27ea5fad61c99d 5646, 65748, 115
# 9 2018-06-23 18:28:06 0b27ea5fad61c99d 5646
# 10 2018-06-23 18:28:30 0b27ea5fad61c99d 5646, 65748
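Note that c(t1, ..., t5) above flattens each tibble into its contacts vector, so data ends up holding plain vectors. If the column instead keeps one-column tibbles, as in the question, it is safer to extract the contacts column from each tibble first so that intersect and union see the contact values themselves. A hedged sketch, assuming the column is named contacts and using base data.frame objects as stand-ins for tibbles so it runs without extra packages; the helper jaccard_col is hypothetical.

```r
jaccard <- function(vector1, vector2) {
  length(intersect(vector1, vector2)) / length(union(vector1, vector2))
}

# Hypothetical helper: compare the `col` column of two one-column tables.
# For the first row `prev` is NULL, and NULL[[col]] is NULL, so the score is 0.
jaccard_col <- function(curr, prev, col = "contacts") {
  jaccard(curr[[col]], prev[[col]])
}

# Base data.frame objects stand in for tibbles here (assumption)
tbls <- list(
  data.frame(contacts = c(5646, 65748, 115, 498456, 35135)),
  data.frame(contacts = c(5646, 65748)),
  data.frame(contacts = c(5646, 65748, 115))
)
# Previous table for each row; NULL for the first row
prev <- c(list(NULL), tbls[-length(tbls)])

j <- mapply(jaccard_col, tbls, prev)
print(j)
```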