How do I assign the same unique ID to matching observations between two data frames in R?

Time: 2015-05-13 21:33:15

Tags: r matching uniqueidentifier


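I have a practical problem: I have two (or more) data frames and want to assign the same unique ID to every observation that matches across the datasets, for example two data frames df1 and df2 that share some of their rows.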

Any help on how to get to df2.2 would be much appreciated. Thanks.

3 Answers:

Answer 0 (score: 5)

A simple way to solve this is to build a hash:

library(dplyr)
library(digest)

# Hash the pasted values of each row; identical rows give identical md5 ids
df1 %>%
  rowwise() %>%
  do( data.frame(., id=digest( paste(.$a1,.$b1,.$c1), algo="md5"),
                   stringsAsFactors=FALSE)) %>% ungroup()

# Same for df2, using its own column names
df2 %>%
  rowwise() %>%
  do( data.frame(., id=digest( paste(.$a2,.$b2,.$c2), algo="md5"),
               stringsAsFactors=FALSE)) %>% ungroup()

This produces the following for df1:

   a1 b1     c1                               id
1   1  1  white b86fbb78b27f7db2ee50af2d68cce452
2   1  5    red 68d47f544832989834517630e4a2764c
3   1  3  black 724e37192140cb2009cf3d982f2be1e4
4   1  2  white f731b8b38255b8c312543283f8e1c634
5   2  3    red 2d50b86902056a51faad04d2c566faf2
6   2  4  white 9396667cd51d1e1b61b0b22a7767d3d9
7   2  5  black 9ba1f3e04c61c006d3c5382fcad098e6
8   2  1 silver 38dcd29d200c8b33cd38ac78ef9dd751
9   1  5    red 68d47f544832989834517630e4a2764c
10  1  2  green 7d9b1aadfd79de142b234b83d7867b9b

and the following for df2:

   a2 b2     c2                               id
1   2  3  black d285febc8ab08e99b11609b98f077e66
2   2  1   blue bfa0405276406ac4bc596daf957dfa11
3   1  3  black 724e37192140cb2009cf3d982f2be1e4
4   1  2  white f731b8b38255b8c312543283f8e1c634
5   2  1 silver 38dcd29d200c8b33cd38ac78ef9dd751
6   2  3  green 67eefe9ee2d82486ded30a268289296b
7   2  4  green d773f58cf144eab15ef459e326494a2f
8   2  5    red 0724318a9f59d3960edfe4e90f9c4eff
9   2  3   blue 6883420cc137ba45b773f642176e9ce6
10  2  5  white 5dea9e63b5fbfb31fb81260cb5a5f41c
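
To check which rows should end up with the same id, here is a minimal base-R sketch. It assumes df1 and df2 look exactly like the two printouts above (the values are retyped from them), and it compares the plain pasted keys instead of their md5 hashes, so no packages are needed. Identical keys always hash to identical ids, so the shared keys are the rows whose hashes coincide.

```r
# df1 and df2 retyped from the printed tables (assumed to match the asker's data)
df1 <- data.frame(
  a1 = c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1),
  b1 = c(1, 5, 3, 2, 3, 4, 5, 1, 5, 2),
  c1 = c("white", "red", "black", "white", "red",
         "white", "black", "silver", "red", "green"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  a2 = c(2, 2, 1, 1, 2, 2, 2, 2, 2, 2),
  b2 = c(3, 1, 3, 2, 1, 3, 4, 5, 3, 5),
  c2 = c("black", "blue", "black", "white", "silver",
         "green", "green", "red", "blue", "white"),
  stringsAsFactors = FALSE
)

# One key per row, built the same way as the input to digest()
key1 <- paste(df1$a1, df1$b1, df1$c1)
key2 <- paste(df2$a2, df2$b2, df2$c2)

# Keys that occur in both data frames, in order of appearance in df1
shared <- intersect(key1, key2)
print(shared)
# [1] "1 3 black"  "1 2 white"  "2 1 silver"
```

These three keys correspond to the three ids that appear in both tables (724e..., f731... and 38dc...).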

Answer 1 (score: 0)

You can do what you want by writing a function that generates unique IDs and then applying it to the combination of df1 and df2.

# Inspiration: http://stackoverflow.com/questions/24119599/how-to-assign-a-unique-id-number-to-each-group-of-identical-values-in-a-column
unique.id <- function(x) as.numeric(factor(x))

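# Paste each row's values into one string; identical rows give identical strings
(df1.info <- do.call(paste, df1))
#  [1] "1 1 white"  "1 5 red"    "1 3 black"  "1 2 white"  "2 3 red"
#  [6] "2 4 white"  "2 5 black"  "2 1 silver" "1 5 red"    "1 2 green"
df2.info <- do.call(paste, df2)
# Number the distinct strings across both data frames (factor levels are sorted)
ids <- unique.id(c(df1.info, df2.info))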
df1$id <- head(ids, nrow(df1))
df1
#    a1 b1     c1 id
# 1   1  1  white  1
# 2   1  5    red  5
# 3   1  3  black  4
# 4   1  2  white  3
# 5   2  3    red 11
# 6   2  4  white 13
# 7   2  5  black 14
# 8   2  1 silver  7
# 9   1  5    red  5
# 10  1  2  green  2
df2$id <- tail(ids, nrow(df2))
df2
#    a2 b2     c2 id
# 1   2  3  black  8
# 2   2  1   blue  6
# 3   1  3  black  4
# 4   1  2  white  3
# 5   2  1 silver  7
# 6   2  3  green 10
# 7   2  4  green 12
# 8   2  5    red 15
# 9   2  3   blue  9
# 10  2  5  white 16
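
A possible variation on the approach above, sketched in base R: `match()` against the unique keys numbers the combinations in order of first appearance instead of in sorted order, and a tab separator makes it less likely that two different rows paste to the same string. The helper name `row_key` is hypothetical, and the data are only the first four rows of each printout, used for illustration.

```r
# First four rows of each data frame, retyped from the printouts (illustration only)
df1 <- data.frame(a1 = c(1, 1, 1, 1), b1 = c(1, 5, 3, 2),
                  c1 = c("white", "red", "black", "white"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(a2 = c(2, 2, 1, 1), b2 = c(3, 1, 3, 2),
                  c2 = c("black", "blue", "black", "white"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: one key per row, columns joined by a tab
# (assumes no tab characters occur in the data)
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\t"))

keys <- c(row_key(df1), row_key(df2))
ids  <- match(keys, unique(keys))  # 1, 2, 3, ... in order of first appearance

df1$id <- head(ids, nrow(df1))
df2$id <- tail(ids, nrow(df2))

print(df1$id)  # 1 2 3 4
print(df2$id)  # 5 6 3 4
```

Rows 3 and 4 of df2 reuse ids 3 and 4 because the same combinations already occur in df1.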

Answer 2 (score: 0)

Assuming your columns are exactly the same, the easiest way is probably:

df.all <- rbind(df1, df2)

(You may need to rename the columns so that they match.)

Now apply the same data.table trick to the whole dataset. Then split the dataset again:

df1 <- df.all[1:nrow(df1),]
df2 <- df.all[- (1:nrow(df1)),]

Note: I'm not saying the data.table trick is the ideal way to generate numbers for unique combinations! But you have already written it.
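
The rename, combine, label and split steps can be sketched end to end in base R. This is only an illustration under stated assumptions: the common column names `a`, `b`, `c` are arbitrary, the ID step uses `factor()` as a stand-in for whatever ID code you already have, and the data are a few rows retyped from the printouts in the other answers.

```r
# A few rows retyped from the printouts in the other answers (illustration only)
df1 <- data.frame(a1 = c(1, 1, 2), b1 = c(3, 2, 3),
                  c1 = c("black", "white", "red"), stringsAsFactors = FALSE)
df2 <- data.frame(a2 = c(2, 1, 1), b2 = c(3, 3, 2),
                  c2 = c("black", "black", "white"), stringsAsFactors = FALSE)

# rbind() needs identical column names; "a", "b", "c" are arbitrary choices
common <- c("a", "b", "c")
n1 <- nrow(df1)
df.all <- rbind(setNames(df1, common), setNames(df2, common))

# Stand-in for the ID step: number each distinct combination of a, b, c
key <- do.call(paste, unname(as.list(df.all[common])))
df.all$id <- as.integer(factor(key))

# Split again and restore the original column names
out1 <- setNames(df.all[seq_len(n1), ], c(names(df1), "id"))
out2 <- setNames(df.all[-seq_len(n1), ], c(names(df2), "id"))

print(out1$id)  # 2 1 4
print(out2$id)  # 3 2 1
```

Storing `nrow(df1)` in `n1` before anything is overwritten keeps the split correct even if df1 itself is reassigned in between.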