Question

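I suspect this is one of the simplest tasks, but I can't figure out how to do it. I have a data.frame with a lot of data, where each row holds a receipt number together with an account number.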

ACCOUNT RECEIPT
10000   2121
12000   1515
21000   2121
50200   1515
47500   1474
90000   1474
140000  1474

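For each row, I want to look up its receipt number and list, in a new column, all the other account numbers that share that receipt. I'm not yet sure whether the result should also include the current row's own account, since it is already visible anyway.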

ACCOUNT RECEIPT RESULT
  10000    2121 21000        
  12000    1515 50200
  21000    2121 10000
  50200    1515 12000
  47500    1474 90000, 140000
  90000    1474 47500, 140000
 140000    1474 47500, 90000

I really like using dplyr. Maybe this can already be done with it and I just can't see the solution.

Answer 1

I don't think you need multiple rows per RECEIPT in your final table. I'd suggest this solution:

library(dplyr)

dt = read.table(text="ACCOUNT RECEIPT
10000   2121
12000   1515
21000   2121
50200   1515
47500   1474
90000   1474", header = TRUE)

dt %>%
  group_by(RECEIPT) %>%
  summarise(ALL_ACCOUNTS = paste(ACCOUNT, collapse = ", "))

# # A tibble: 3 x 2
#   RECEIPT ALL_ACCOUNTS
#     <int>        <chr>
# 1    1474 47500, 90000
# 2    1515 12000, 50200
# 3    2121 10000, 21000

As shown, you get one row per RECEIPT value, followed by all the corresponding ACCOUNT values.

To get exactly the output you described in the question, try:

library(dplyr)

dt = read.table(text="ACCOUNT RECEIPT
                10000   2121
                12000   1515
                21000   2121
                50200   1515
                47500   1474
                90000   1474
                140000  1474", header = TRUE)

dt %>%
  left_join(dt, by="RECEIPT") %>%            # join same dataset to get all combinations of accounts
  filter(ACCOUNT.x != ACCOUNT.y) %>%         # filter out cases with same account numbers
  group_by(ACCOUNT.x, RECEIPT) %>%           # group by pairs of first account number and receipt
  summarise(REST_ACCOUNTS = paste(ACCOUNT.y, collapse = ", ")) %>%   # combine rest of account numbers
  ungroup() %>%                              # forget the grouping
  arrange(RECEIPT) %>%                       # order by receipt (only if needed for better visualisation)
  rename(ACCOUNT = ACCOUNT.x)                # change the name (only if needed for better visualisation)

# # A tibble: 7 x 3
#   ACCOUNT RECEIPT REST_ACCOUNTS
#     <int>   <int>         <chr>
# 1   47500    1474 90000, 140000
# 2   90000    1474 47500, 140000
# 3  140000    1474  47500, 90000
# 4   12000    1515         50200
# 5   50200    1515         12000
# 6   10000    2121         21000
# 7   21000    2121         10000
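
As a possible alternative to the self-join, the same column can probably be built with a grouped mutate. This is only a sketch, not part of the original answer: the column name RESULT is taken from the question, and the use of setdiff() assumes that account numbers are not repeated within a receipt (duplicates would be collapsed).

```r
library(dplyr)

dt <- read.table(text = "ACCOUNT RECEIPT
10000   2121
12000   1515
21000   2121
50200   1515
47500   1474
90000   1474
140000  1474", header = TRUE)

res <- dt %>%
  group_by(RECEIPT) %>%
  # for every account, paste together the other accounts on the same receipt
  mutate(RESULT = vapply(
    ACCOUNT,
    function(a) paste(setdiff(ACCOUNT, a), collapse = ", "),
    character(1)
  )) %>%
  ungroup()

res
```

One difference in behaviour: a receipt with a single account keeps its row here, with an empty string in RESULT, whereas the join-and-filter version drops such rows. The original row order is also preserved, so no arrange() is needed.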
