Adding a list of matches to a dataframe

Time: 2018-01-23 05:45:57

Tags: r dataframe

I have two data frames that look like this:

Payments data frame

id <- c("a","b","c","d","e","f")
total_amt <- c(100, 100, 200, 200, 350, 350)
payments <- data.frame(id, total_amt)

--------------------
|  id  | total_amt |
--------------------
|  a   |  100      |
|  b   |  100      |
|  c   |  200      |
|  d   |  200      |
|  e   |  350      |
|  f   |  350      |

Prices data frame

product <- c("p1","p2","p3","p4","p5")
price <- c(100, 100, 300, 350, 350)
prices <- data.frame(product, price)

--------------------
|product|   price   |
--------------------
|  p1   |  100      |
|  p2   |  100      |
|  p3   |  300      |
|  p4   |  350      |
|  p5   |  350      |

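I want to create another column named possible_match that contains the list of products whose price equals the row's total_amt. The resulting data frame would look like this: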

--------------------------------------
|  id  | total_amt | possible_match  |
--------------------------------------
|  a   |  100      |     p1,p2       |
|  b   |  100      |     p1,p2       |
|  c   |  200      |     NA          |
|  d   |  200      |     NA          |
|  e   |  350      |     p4,p5       |
|  f   |  350      |     p4,p5       |

I know I can get the list of products that match a specific total_amt like this:

prices[prices$price==350,]

But how do I add the result to the rows of the payments data frame?

I have been searching for an answer but could not find anything similar.

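3 answers:

Answer 0: (score: 1)

Using base functions, you can first join the 2 tables with merge, then paste the products together within each id and total_amt group, as follows: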

dat <- merge(payments, prices, by.x="total_amt", by.y="price", all.x=TRUE)
do.call(rbind, by(dat, paste(dat$id, dat$total_amt), function(x) {
    data.frame(
        id=x$id[1], 
        total_amt=x$total_amt[1],
        possible_match=paste(x$product, collapse=","))
}))
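
To see what the merge step contributes, it can help to look at the intermediate table before the paste. The sketch below rebuilds the question's data and only inspects dat; it uses the same merge call as above and nothing else.

```r
# Rebuild the example data from the question
payments <- data.frame(id = c("a", "b", "c", "d", "e", "f"),
                       total_amt = c(100, 100, 200, 200, 350, 350))
prices <- data.frame(product = c("p1", "p2", "p3", "p4", "p5"),
                     price = c(100, 100, 300, 350, 350))

# all.x = TRUE keeps payments that have no matching price (product becomes NA)
dat <- merge(payments, prices, by.x = "total_amt", by.y = "price", all.x = TRUE)

# One row per (payment, matching product) pair, plus one row per unmatched payment
print(dat)
print(nrow(dat))
```

Because unmatched payments survive the merge with product set to NA, the later paste turns them into the string "NA" rather than dropping the rows.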

Data:

id <- c("a","b","c","d","e","f")
total_amt <- c(100, 100, 200, 200, 350, 350)
payments <- data.frame(id, total_amt)

product <- c("p1","p2","p3","p4","p5")
price <- c(100, 100, 300, 350, 350)
prices <- data.frame(product, price)

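Answer 1: (score: 1)

Here is an option using dplyr. Starting from the 'prices' dataset, we group by 'price', summarise 'product' by pasting its elements together, and then left_join the result onto the 'payments' dataset: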

library(dplyr)
prices %>% 
    group_by(price) %>%
    summarise(product = toString(product)) %>% 
    left_join(payments, ., by = c(total_amt = 'price'))
#   id total_amt product
#1  a       100  p1, p2
#2  b       100  p1, p2
#3  c       200    <NA>
#4  d       200    <NA>
#5  e       350  p4, p5
#6  f       350  p4, p5
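
The output above names the new column product and separates entries with ", ". If the exact layout from the question is wanted (a column called possible_match holding "p1,p2"), a small variation might look like the following. This is only a sketch: the names matches and result are made up for illustration, and it assumes dplyr is installed.

```r
library(dplyr)

# Rebuild the example data from the question
payments <- data.frame(id = c("a", "b", "c", "d", "e", "f"),
                       total_amt = c(100, 100, 200, 200, 350, 350))
prices <- data.frame(product = c("p1", "p2", "p3", "p4", "p5"),
                     price = c(100, 100, 300, 350, 350))

# Collapse the products for each price into one comma-separated string
matches <- prices %>%
    group_by(price) %>%
    summarise(possible_match = paste(product, collapse = ","))

# Payments without a matching price get NA in possible_match
result <- left_join(payments, matches, by = c("total_amt" = "price"))
print(result)
```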

Answer 2: (score: 1)

A two-step process in base R using aggregate and match. We first aggregate prices by price and collect all the products together into agg_df, then match total_amt from payments against price in agg_df and assign the corresponding product value to possible_match.

agg_df <- aggregate(product~price, prices, toString)
payments$possible_match <- agg_df$product[match(payments$total_amt, agg_df$price)]
payments

#  id total_amt possible_match
#1  a       100         p1, p2
#2  b       100         p1, p2
#3  c       200           <NA>
#4  d       200           <NA>
#5  e       350         p4, p5
#6  f       350         p4, p5

where agg_df is

agg_df

#  price product
#1   100  p1, p2
#2   300      p3
#3   350  p4, p5
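
The lookup relies on how match behaves, which can be seen in isolation. The sketch below uses plain vectors that mirror the columns above; the names lookup, wanted, products and idx are made up for illustration.

```r
lookup   <- c(100, 300, 350)                  # mirrors agg_df$price
products <- c("p1, p2", "p3", "p4, p5")       # mirrors agg_df$product
wanted   <- c(100, 100, 200, 200, 350, 350)   # mirrors payments$total_amt

# match() returns the position of each wanted value in lookup, or NA if absent
idx <- match(wanted, lookup)
print(idx)

# Indexing with NA yields NA, which is why unmatched payments end up as NA
print(products[idx])
```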

Or doing the same thing from a different angle:

payments$possible_matches <- sapply(payments$total_amt, function(x) 
                             as.character(prices$product[prices$price %in% x]))

payments

#  id total_amt possible_matches
#1  a       100           p1, p2
#2  b       100           p1, p2
#3  c       200                 
#4  d       200                 
#5  e       350           p4, p5
#6  f       350           p4, p5
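
Note that the sapply call in this answer returns a list, since the per-row results differ in length, so possible_matches ends up as a list column with empty entries for unmatched rows. If a plain character column with NA for unmatched rows is preferred, one possible variation uses vapply. This is a sketch rather than part of the original answer.

```r
# Rebuild the example data from the question
payments <- data.frame(id = c("a", "b", "c", "d", "e", "f"),
                       total_amt = c(100, 100, 200, 200, 350, 350))
prices <- data.frame(product = c("p1", "p2", "p3", "p4", "p5"),
                     price = c(100, 100, 300, 350, 350))

# vapply guarantees exactly one string per payment row
payments$possible_match <- vapply(payments$total_amt, function(x) {
    hits <- as.character(prices$product[prices$price == x])
    # No product at this price: return NA instead of an empty vector
    if (length(hits) == 0) NA_character_ else paste(hits, collapse = ",")
}, character(1))

print(payments)
```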
