R - Get the row numbers of coordinate pairs in a data frame of coordinate pairs

Asked: 2018-04-27 15:51:43

Tags: r dataframe

Suppose I have a data frame of coordinate pairs called "edges", for example:

  x0       y0       x1       y1
1 2.464286 2.464286 2.583333 1.750000
2 0.700000 3.787500 2.464286 2.464286
3 2.464286 2.464286 3.500000 3.500000
4 3.500000 3.500000 4.300000 3.900000
5 2.250000 4.750000 3.500000 3.500000

Each row of the data frame is an edge from point (x0, y0) to point (x1, y1). For example, my first edge goes from (2.464286, 2.464286) to (2.583333, 1.750000).

From this data frame I can easily extract another data frame, call it "vertices", in which each point appears only once:

  x        y
1 2.464286 2.464286
2 0.700000 3.787500
3 3.500000 3.500000
4 2.250000 4.750000
5 2.583333 1.750000
6 4.300000 3.900000
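The question does not show how "vertices" was built. One plausible way, sketched here in base R under the assumption that the edge list is stored in a data frame named edges with columns x0, y0, x1, y1, is to stack both endpoints and drop duplicates:

```r
# Edge list from the question (assumed variable name: edges)
edges <- data.frame(
  x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
  y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
  x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
  y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)
)

# Stack start points on top of end points as plain (x, y) pairs
pts <- data.frame(x = c(edges$x0, edges$x1),
                  y = c(edges$y0, edges$y1))

# Keep the first occurrence of each (x, y) pair; unique() on a data
# frame compares whole rows, not individual columns
vertices <- unique(pts)
rownames(vertices) <- NULL  # renumber rows 1..n
```

With this data the six points come out in the same order as in the table above.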

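How can I label each point in "vertices" with the row numbers of "edges" in which it appears, regardless of whether it is the left or the right endpoint? That is, I would like to get something like this: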

  x        y            occurrences
1 2.464286 2.464286     1,2,3
2 0.700000 3.787500     2
3 3.500000 3.500000     3,4,5
4 2.250000 4.750000     5
5 2.583333 1.750000     1
6 4.300000 3.900000     4

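I tried using %in%, but it only compares element by element, so two points that merely share an x coordinate or a y coordinate can be treated as the same point.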
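A minimal sketch of that pitfall. The test point p is made up for illustration and is not one of the vertices: its x value occurs in x0 and its y value occurs in y0, but never in the same row.

```r
edges <- data.frame(
  x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
  y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
  x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
  y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)
)

# Hypothetical point: x taken from rows 1 and 3, y taken from row 4
p <- c(x = 2.464286, y = 3.5)

# Column-by-column membership test: each coordinate is found somewhere,
# so the point looks like a match even though no edge starts there
naive <- p[["x"]] %in% edges$x0 && p[["y"]] %in% edges$y0

# Row-wise test: both coordinates must match in the same row
pairwise <- any(edges$x0 == p[["x"]] & edges$y0 == p[["y"]])

naive     # TRUE  (false positive)
pairwise  # FALSE
```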

Also, this is a step I have to perform many times in my simulation, so I am hoping for something better than a for-loop/if-based solution.

3 answers:

Answer 0: (score: 2)

Here is a solution using dplyr. There may be a way to clean it up, but this should get you most of the way there.

library(dplyr)

edgedf <- read.table(header = TRUE,stringsAsFactors = FALSE, text = "
x0       y0       x1       y1
2.464286 2.464286 2.583333 1.750000
0.700000 3.787500 2.464286 2.464286
2.464286 2.464286 3.500000 3.500000
3.500000 3.500000 4.300000 3.900000
2.250000 4.750000 3.500000 3.500000")


vertdf <- read.table(header = TRUE,stringsAsFactors = FALSE, text = "
x        y
2.464286 2.464286
0.700000 3.787500
3.500000 3.500000
2.250000 4.750000
2.583333 1.750000
4.300000 3.900000")

# Add row numbers
tmp_edgedf <- edgedf %>% mutate(id = 1:n())
# Stack the x0,y0 and x1,y1 coords as x,y then join
# with vertices "vertdf". Grouping by x,y and summarise
# concatenating the row numbers as occurrences.
rbind(tmp_edgedf %>%
        select(id, x0, y0) %>%
        rename(x = x0, y = y0),
      tmp_edgedf %>%
        select(id, x1, y1) %>%
        rename(x = x1, y = y1)) %>%
  right_join(vertdf, by = c("x", "y")) %>%
  group_by(x, y) %>%
  summarise(occurrences = paste(sort(id), collapse = ",")) %>%
  data.frame() # Convert to a plain data.frame so values print unrounded (tibbles round on display).

Result

##          x        y occurrences
## 1 0.700000 3.787500           2
## 2 2.250000 4.750000           5
## 3 2.464286 2.464286       1,2,3
## 4 2.583333 1.750000           1
## 5 3.500000 3.500000       3,4,5
## 6 4.300000 3.900000           4

Edit

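Below is a variant, perhaps a simpler solution. The first inner_join joins the vertices to (x0, y0), the second to (x1, y1). A row number is added (temporarily) to the edgedf data in each join so that the originating row can be tracked. The row number could instead be added to edgedf once before joining, which would eliminate the duplicated step.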

rbind(
    inner_join(vertdf, 
               edgedf %>% transmute(id = 1:n(), x0, y0),
               by = c(x = "x0", y = "y0")),
    inner_join(vertdf,
               edgedf %>% transmute(id = 1:n(), x1, y1),
               by = c(x = "x1", y = "y1"))
  ) %>%
  group_by(x,y) %>%
  summarise(occurrences = paste(sort(id), collapse = ",")) %>%
  data.frame()
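The last remark above (adding the row number once, before joining) could look roughly like the following. This is a sketch rather than the answerer's code: the name edgedf_id is made up here, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

edgedf <- data.frame(
  x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
  y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
  x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
  y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)
)
vertdf <- data.frame(
  x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3),
  y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)
)

# Add the row number once (edgedf_id is a hypothetical helper name)
edgedf_id <- edgedf %>% mutate(id = row_number())

res <- bind_rows(
  # vertices matched against start points
  inner_join(vertdf, select(edgedf_id, id, x = x0, y = y0), by = c("x", "y")),
  # vertices matched against end points
  inner_join(vertdf, select(edgedf_id, id, x = x1, y = y1), by = c("x", "y"))
) %>%
  group_by(x, y) %>%
  summarise(occurrences = paste(sort(id), collapse = ","), .groups = "drop") %>%
  as.data.frame()

res
```

As in the result shown earlier, group_by orders the rows by x and y, so the output does not follow the original order of vertdf.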

Answer 1: (score: 1)

Hope this helps!

library(dplyr)

edges %>%
  rowwise() %>%
  mutate(occurrences = paste(rownames(vertices)[unlist(lapply(apply(vertices, 1, paste, collapse=","), 
                                  function(i) grepl(paste(x, y, sep=','), i)))], collapse = ",")) %>%
  data.frame()

Output:

         x        y occurrences
1 2.464286 2.464286       1,2,3
2 0.700000 3.787500           2
3 3.500000 3.500000       3,4,5
4 2.250000 4.750000           5
5 2.583333 1.750000           1
6 4.300000 3.900000           4

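Sample data (note that the names are swapped relative to the question: here edges holds the unique points and vertices holds the edge list):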

edges <- structure(list(x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3), 
    y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

vertices <- structure(list(x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25), y0 = c(2.464286, 
3.7875, 2.464286, 3.5, 4.75), x1 = c(2.583333, 2.464286, 3.5, 
4.3, 3.5), y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
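The grepl call above treats the pasted coordinates as a regular expression and matches substrings, so "." matches any character and a key could in principle match inside a longer number. A possible variation on the same string-key idea, using exact key lookup in base R, is sketched below. It is not the answerer's code; it uses the question's naming (edges is the edge list, vertices the unique points), and it assumes that equal coordinates are converted to identical strings by paste.

```r
edges <- data.frame(
  x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
  y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
  x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
  y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)
)
vertices <- data.frame(
  x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3),
  y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)
)

# One "x,y" string key per endpoint: start points first, then end points
keys <- c(paste(edges$x0, edges$y0, sep = ","),
          paste(edges$x1, edges$y1, sep = ","))
# Edge row number belonging to each key
ids <- rep(seq_len(nrow(edges)), times = 2)

# Group the row numbers by key, then look up each vertex by exact key
rows_by_key <- split(ids, keys)
vkey <- paste(vertices$x, vertices$y, sep = ",")
vertices$occurrences <- vapply(
  rows_by_key[vkey],
  function(r) paste(sort(unique(r)), collapse = ","),
  character(1),
  USE.NAMES = FALSE
)

vertices
```

Because the edge list is scanned once rather than once per vertex, this may also suit the repeated use in a simulation that the question mentions.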

Answer 2: (score: 0)

Here is a one-line approach that does not need dplyr:

vertices[, 'occurrences'] <- apply(vertices, 1, function(V) 
  paste(which(apply(edges, 1, function (E, V) 
    isTRUE(all.equal(V, E[1:2], check.attributes=FALSE)) || 
    isTRUE(all.equal(V, E[3:4], check.attributes=FALSE)), V=V)),
  collapse=',')
)

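The code takes each row of vertices in turn and checks it for a match against each row of edges, testing each half of that row (start point, then end point) in turn. isTRUE is needed to reduce the result of the comparison to a simple "match or no match"; which converts the vector of TRUEs and FALSEs into the integers of the matching rows, and paste turns that series of integers into a comma-separated string.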
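To make those steps concrete, here is a sketch that walks through them for a single vertex, (3.5, 3.5). The intermediate names cmp, hits, rows and label are made up for illustration.

```r
edges <- data.frame(
  x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
  y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
  x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
  y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)
)
V <- c(3.5, 3.5)  # one vertex

# all.equal() returns TRUE on a match, but a character description of
# the difference otherwise, which is why it is wrapped in isTRUE()
cmp <- all.equal(V, c(2.25, 4.75))
is.character(cmp)  # TRUE

# One logical per edge row: does V equal the start or the end point?
hits <- apply(edges, 1, function(E)
  isTRUE(all.equal(V, E[1:2], check.attributes = FALSE)) ||
  isTRUE(all.equal(V, E[3:4], check.attributes = FALSE)))

rows  <- which(hits)                  # indices of the TRUE entries
label <- paste(rows, collapse = ",")  # "3,4,5"
```

Note that all.equal compares with a small numeric tolerance rather than exact equality, so coordinates that differ only by floating-point noise still count as the same point.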

Sample data

vertices<- structure(list(
    x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3), 
    y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)),
    class = "data.frame", 
    row.names = c("1", "2", "3", "4", "5", "6")
)

edges <- structure(list(
   x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
    y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
    x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
    y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)),
    class = "data.frame", 
    row.names = c("1", "2", "3", "4", "5")
)

Output:

> vertices

         x        y occurrences
1 2.464286 2.464286       1,2,3
2 0.700000 3.787500           2
3 3.500000 3.500000       3,4,5
4 2.250000 4.750000           5
5 2.583333 1.750000           1
6 4.300000 3.900000           4