Calculate pairwise Euclidean distances for observations in different groups in R?

Time: 2019-04-15 10:56:03

Tags: r

I have a data frame with three variables (V1 to V3) whose observations are divided into three groups:

  V1   V2   V3 group
0.59 0.78 0.91     1
0.72 0.91 0.73     2
1.31 1.21 0.90     3
4.32 1.53 3.20     2
....

I want to compute the Euclidean distances between the observations. Computing the pairwise distances between all observations is easy:

dist(df[, c("V1", "V2", "V3")])

However, I am also interested in computing the pairwise distances (a) only between observations in the same group and (b) only between observations that do not belong to the same group (e.g., between each observation in group 1 and all observations in groups 2 and 3).

For (a), I can do this:

dist(df[df$group == 1, c("V1", "V2", "V3")])
# ... and likewise for groups 2 and 3

and combine the results; but I am not sure how to accomplish (b). What is the best way to do this?

Thanks!

2 answers:

Answer 0: (score: 0)

How about applying a function, similar to the one in your example, over a matrix of the group combinations:

library(dplyr)

## define the data frame
df = as.data.frame(cbind(c(.59, .72, 1.31, 4.32),
           c(.78, .91, 1.21, 1.52),
           c(.91, .73, .9, 3.2),
           c(1,2,3,2)), stringsAsFactors = FALSE)

names(df) = c("V1", "V2", "V3", "group")

## generate a matrix with the unique combinations of groups
combinations = combn(x = unique(df$group), m = 2)

## apply a function over the matrix of group combinations to determine
## the distances between the observations of each pair of groups
## (note: dist() on the filtered rows also returns the distances between
## observations that belong to the same group)
distlist = lapply(seq_len(ncol(combinations)), function(i){

  tmpdist = df %>% filter(group %in% combinations[,i]) %>%
    select(-group) %>%
    dist()

  return(cbind(combinations[1,i], combinations[2,i], as.vector(tmpdist)))

})

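## combine the list into a data frame
## (rbind() on its own returns a matrix, for which names() would not set column names)
dists = as.data.frame(do.call(rbind, distlist))

names(dists) = c("group1", "group2", "dist")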
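Because `dist()` on the rows of two groups also returns the pairs that fall inside a single group, a variant that keeps only the cross-group pairs may be closer to what (b) asks for. The following is a minimal base R sketch under stated assumptions: `cross_dist` is a hypothetical helper written for this illustration (it is not part of any package), and the data frame contains only the four rows shown in the question.

```r
# Example data: the four rows shown in the question
df <- data.frame(
  V1 = c(0.59, 0.72, 1.31, 4.32),
  V2 = c(0.78, 0.91, 1.21, 1.53),
  V3 = c(0.91, 0.73, 0.90, 3.20),
  group = c(1, 2, 3, 2)
)
vars <- c("V1", "V2", "V3")

# Hypothetical helper: Euclidean distance between every row of `a`
# and every row of `b`; returns an nrow(a) x nrow(b) matrix
cross_dist <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  # for each row y of b, subtract y from every row of a and take the row norms
  out <- apply(b, 1, function(y) sqrt(rowSums(sweep(a, 2, y)^2)))
  matrix(out, nrow = nrow(a), ncol = nrow(b))
}

# All unordered pairs of distinct groups
pairs <- combn(sort(unique(df$group)), 2)

# One row per cross-group pair of observations
between <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(i) {
  g1 <- pairs[1, i]
  g2 <- pairs[2, i]
  d <- cross_dist(df[df$group == g1, vars, drop = FALSE],
                  df[df$group == g2, vars, drop = FALSE])
  data.frame(group1 = g1, group2 = g2, dist = as.vector(d))
}))
```

With this layout, `between` holds one distance per pair of observations from two different groups, so within-group pairs never enter the result.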

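Answer 1: (score: 0)

Here is an approach that separates the distance computation from the extraction of the values that meet a given condition.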

library(dplyr)  # provides %>% and select(); df is the data frame from the question

##  distance as a matrix
d_m <- df %>% 
  select(-group) %>% 
  dist() %>% 
  as.matrix()

##  combination of groups (one column per pair of observations)
cb_g <- combn(df$group, m = 2)
##  combination of indices (same column order as cb_g)
cb_i <- combn(seq_along(df$group), m = 2) 

##  extract the values that fit to given conditions
corr_same_grp <- apply(cb_g, 2, function(x) x[1] == x[2]) %>%  # same groups
  { cb_i[, ., drop = FALSE] } %>%      # get indices
  apply(2, function(x) d_m[x[2], x[1]])

corr_diff_grp <- apply(cb_g, 2, function(x) x[1] != x[2]) %>%  # different groups 
  { cb_i[, ., drop = FALSE] } %>%      # get indices
  apply(2, function(x) d_m[x[2], x[1]])
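The same split can probably be expressed without enumerating the index pairs, by masking the distance matrix with a logical "same group" matrix. This is a base R sketch rather than the answerer's method; the names `same_grp`, `upper`, `dist_same_grp` and `dist_diff_grp` are introduced here for illustration, and the data are the four rows shown in the question.

```r
# Example data: the four rows shown in the question
df <- data.frame(
  V1 = c(0.59, 0.72, 1.31, 4.32),
  V2 = c(0.78, 0.91, 1.21, 1.53),
  V3 = c(0.91, 0.73, 0.90, 3.20),
  group = c(1, 2, 3, 2)
)

# Full pairwise distance matrix over the three variables
d_m <- as.matrix(dist(df[, c("V1", "V2", "V3")]))

# TRUE where the row and column observations share a group
same_grp <- outer(df$group, df$group, "==")

# Keep each unordered pair once and drop the zero diagonal
upper <- upper.tri(d_m)

dist_same_grp <- d_m[same_grp & upper]   # (a) within-group distances
dist_diff_grp <- d_m[!same_grp & upper]  # (b) between-group distances
```

The two vectors together cover every unordered pair of observations exactly once, so their lengths should add up to `choose(nrow(df), 2)`.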