Suppose I have a data frame called "edges", for example:
x0 y0 x1 y1
1 2.464286 2.464286 2.583333 1.750000
2 0.700000 3.787500 2.464286 2.464286
3 2.464286 2.464286 3.500000 3.500000
4 3.500000 3.500000 4.300000 3.900000
5 2.250000 4.750000 3.500000 3.500000
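Each row of the data frame is an edge from the point (x0, y0) to the point (x1, y1). For example, my first edge goes from (2.464286, 2.464286) to (2.583333, 1.750000).
From this data frame I can easily extract another data frame, call it "vertices", in which each point appears only once: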
x y
1 2.464286 2.464286
2 0.700000 3.787500
3 3.500000 3.500000
4 2.250000 4.750000
5 2.583333 1.750000
6 4.300000 3.900000
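The question does not show how "vertices" was extracted, so the following is only a minimal sketch of one way to do it in base R, assuming that coordinates which should coincide are stored as exactly equal values. The name `endpoints` is introduced here for illustration.

```r
# Sample edge list from the question (one row per segment).
edges <- data.frame(
  x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
  y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
  x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
  y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)
)

# Stack left and right endpoints into one two-column table of points.
endpoints <- rbind(
  data.frame(x = edges$x0, y = edges$y0),
  data.frame(x = edges$x1, y = edges$y1)
)

# Keep the first occurrence of each (x, y) pair and renumber the rows.
vertices <- unique(endpoints)
rownames(vertices) <- NULL
vertices
```

With this sample data the six points come out in the same order as the table above, because left endpoints are stacked before right endpoints.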
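How can I label each point in "vertices" with the row numbers of "edges" in which it appears, regardless of whether it is the left or the right endpoint? That is, I would like to get something like this: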
x y occurrences
1 2.464286 2.464286 1,2,3
2 0.700000 3.787500 2
3 3.500000 3.500000 3,4,5
4 2.250000 4.750000 5
5 2.583333 1.750000 1
6 4.300000 3.900000 4
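I tried using %in%, but it only compares element by element, so two points that merely share an x coordinate or a y coordinate can be treated as the same point.
Also, this is a routine I have to run many times in a simulation, so I am hoping for something better than a solution based on for loops and if statements.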
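The exact %in% call is not shown, so this is only a guess at how the false matches described above can arise: if each coordinate is tested against its own column, a point can pass both tests without matching any single row. The point `p` below is hypothetical and chosen to trigger the problem.

```r
vertices <- data.frame(x = c(2.464286, 0.7), y = c(2.464286, 3.7875))

# Hypothetical point: its x comes from vertex 2 and its y from vertex 1,
# so it is not one of the vertices.
p <- c(x = 0.7, y = 2.464286)

# Coordinate-wise test: each coordinate is found somewhere in its column.
coordwise <- (p["x"] %in% vertices$x) & (p["y"] %in% vertices$y)

# Pair-wise test: both coordinates must match within the same row.
pairwise <- any(vertices$x == p["x"] & vertices$y == p["y"])

coordwise  # TRUE, a false positive
pairwise   # FALSE
```

Any approach that compares whole (x, y) pairs, such as a join on both columns, avoids this.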
Answer 0 (score: 2)
Here is a solution using dplyr. There may be a way to clean it up, but this should get you most of the way there.
library(dplyr)
edgedf <- read.table(header = TRUE,stringsAsFactors = FALSE, text = "
x0 y0 x1 y1
2.464286 2.464286 2.583333 1.750000
0.700000 3.787500 2.464286 2.464286
2.464286 2.464286 3.500000 3.500000
3.500000 3.500000 4.300000 3.900000
2.250000 4.750000 3.500000 3.500000")
vertdf <- read.table(header = TRUE,stringsAsFactors = FALSE, text = "
x y
2.464286 2.464286
0.700000 3.787500
3.500000 3.500000
2.250000 4.750000
2.583333 1.750000
4.300000 3.900000")
# Add row numbers
tmp_edgedf <- edgedf %>% mutate(id = 1:n())
# Stack the x0,y0 and x1,y1 coords as x,y then join
# with vertices "vertdf". Grouping by x,y and summarise
# concatenating the row numbers as occurrences.
rbind(tmp_edgedf %>%
select(id, x0, y0) %>%
rename(x = x0, y = y0),
tmp_edgedf %>%
select(id, x1, y1) %>%
rename(x = x1, y = y1)) %>%
right_join(vertdf, by = c("x", "y")) %>%
group_by(x, y) %>%
summarise(occurrences = paste(sort(id), collapse = ",")) %>%
data.frame() # Remove rounding by tibble object.
Result
## x y occurrences
## 1 0.700000 3.787500 2
## 2 2.250000 4.750000 5
## 3 2.464286 2.464286 1,2,3
## 4 2.583333 1.750000 1
## 5 3.500000 3.500000 3,4,5
## 6 4.300000 3.900000 4
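Edit
Below is a variation that is perhaps simpler. The first inner_join joins the vertices to (x0, y0), the second joins them to (x1, y1). A row number is added (temporarily) to the edgedf data frame to keep track of which edge each match came from. The row number could instead be added to edgedf once, before the joins, which would avoid adding it twice.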
rbind(
inner_join(vertdf,
edgedf %>% transmute(id = 1:n(), x0, y0),
by = c(x = "x0", y = "y0")),
inner_join(vertdf,
edgedf %>% transmute(id = 1:n(), x1, y1),
by = c(x = "x1", y = "y1"))
) %>%
group_by(x,y) %>%
summarise(occurrences = paste(sort(id), collapse = ",")) %>%
data.frame()
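Since the question asks for speed, here is a sketch of the same stack, group, and collapse idea in base R, not part of the original answer. It assumes matching coordinates are exactly equal, because points are compared through a text key built with paste. The helper `key` and the names `endpoint_keys`, `edge_ids`, and `ids_by_key` are made up for this illustration.

```r
edges <- data.frame(
  x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
  y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
  x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
  y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)
)
vertices <- data.frame(
  x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3),
  y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)
)

# Build one text key per point (assumption: equal points print identically).
key <- function(x, y) paste(x, y, sep = ",")

# One key per endpoint, paired with the edge row number it came from.
endpoint_keys <- c(key(edges$x0, edges$y0), key(edges$x1, edges$y1))
edge_ids <- rep(seq_len(nrow(edges)), times = 2)

# Group edge row numbers by key, then look up the group of each vertex.
ids_by_key <- split(edge_ids, endpoint_keys)
vertices$occurrences <- vapply(
  ids_by_key[key(vertices$x, vertices$y)],
  function(ids) paste(sort(unique(ids)), collapse = ","),
  character(1),
  USE.NAMES = FALSE
)
vertices
```

The grouping is done once over all endpoints rather than once per vertex, and the original row order of the vertices is kept.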
Answer 1 (score: 1)
Hope this helps!
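library(dplyr)
# Note: in this answer `edges` holds the points (x, y) and `vertices` holds
# the segments (x0, y0, x1, y1); see the sample data below.
# Paste each segment row into one string, then search it for the "x,y" pair.
# fixed = TRUE keeps "." from acting as a regex wildcard. This is still
# substring matching, so "3.5,3.5" would also match inside "13.5,3.5".
seg_strings <- apply(vertices, 1, paste, collapse = ",")
edges %>%
  rowwise() %>%
  mutate(occurrences = paste(rownames(vertices)[grepl(paste(x, y, sep = ","), seg_strings, fixed = TRUE)],
                             collapse = ",")) %>%
  data.frame()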
The output is:
x y occurrences
1 2.464286 2.464286 1,2,3
2 0.700000 3.787500 2
3 3.500000 3.500000 3,4,5
4 2.250000 4.750000 5
5 2.583333 1.750000 1
6 4.300000 3.900000 4
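Sample data (note that the names are swapped relative to the question: here `edges` holds the points and `vertices` holds the segments):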
edges <- structure(list(x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3),
y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
vertices <- structure(list(x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25), y0 = c(2.464286,
3.7875, 2.464286, 3.5, 4.75), x1 = c(2.583333, 2.464286, 3.5,
4.3, 3.5), y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
Answer 2 (score: 0)
Here is a one-line approach that does not require dplyr:
vertices[, 'occurrences'] <- apply(vertices, 1, function(V)
paste(which(apply(edges, 1, function (E, V)
isTRUE(all.equal(V, E[1:2], check.attributes=FALSE)) ||
isTRUE(all.equal(V, E[3:4], check.attributes=FALSE)), V=V)),
collapse=',')
)
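The code takes each row of vertices in turn, then looks for a match in every row of edges, checking each endpoint of that row in turn. isTRUE is needed to reduce the comparison result to a plain "match or not", because all.equal returns a description of the difference rather than FALSE when the values differ. which converts the vector of TRUEs and FALSEs into the integer positions of the matching rows, and paste turns that series of integers into a comma-separated string.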
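To make the roles of isTRUE, which, and paste concrete, here is a small sketch using one vertex and two endpoints taken from the sample data. The variable names (`v`, `e_match`, `e_other`, `hits`) are only for illustration.

```r
v <- c(x = 3.5, y = 3.5)
e_match <- c(x1 = 3.5, y1 = 3.5)
e_other <- c(x1 = 4.3, y1 = 3.9)

# all.equal() returns TRUE when the values agree (within a small tolerance)...
res_match <- all.equal(v, e_match, check.attributes = FALSE)
# ...but a character description of the difference otherwise, never FALSE.
res_other <- all.equal(v, e_other, check.attributes = FALSE)

isTRUE(res_match)        # TRUE
isTRUE(res_other)        # FALSE
is.character(res_other)  # TRUE

# which() gives the positions of the TRUE entries; paste() joins them.
hits <- c(FALSE, FALSE, TRUE, TRUE, TRUE)
paste(which(hits), collapse = ",")  # "3,4,5"
```

Because all.equal compares with a tolerance, this approach also matches coordinates that differ only by floating-point noise, which exact joins or string keys would treat as different points.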
vertices<- structure(list(
x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3),
y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6")
)
edges <- structure(list(
x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5")
)
> vertices
x y occurrences
1 2.464286 2.464286 1,2,3
2 0.700000 3.787500 2
3 3.500000 3.500000 3,4,5
4 2.250000 4.750000 5
5 2.583333 1.750000 1
6 4.300000 3.900000 4