data.table: row-wise percentage difference from a reference row

Time: 2017-06-27 17:45:34

Tags: r data.table

I have some data that looks like this:

Seller              Name                        Price
ⒽomeⓄnline         Harper Hand Truck and Dolly 51.7
HomeOnline          Harper Hand Truck and Dolly 62.54
Amazon.com          Harper Hand Truck and Dolly 41.83
XpW                 Honeywell Safe Chest        41.37
XoXoGroupLLC        Honeywell Safe Chest        51.78
Toys Online         Honeywell Safe Chest        43.01
Tempus & Co.        Honeywell Safe Chest        52.7
stores123           Honeywell Safe Chest        51.21
ⒽomeⓄnline         Honeywell Safe Chest        43.88
HomeOnline          Honeywell Safe Chest        43.87
Great Brands Outlet Honeywell Safe Chest        64.95
Connect Buy         Honeywell Safe Chest        30.1
Amazon.com          Honeywell Safe Chest        24.6

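Within each Name, I want to calculate the percentage difference between each row and the row where Amazon.com is the seller. The output would look like this ('etc...' means the column keeps being filled in the same way):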

    Seller              Name                        Price     Pct_Diff
    ⒽomeⓄnline         Harper Hand Truck and Dolly 51.7       .23
    HomeOnline          Harper Hand Truck and Dolly 62.54      .49
    Amazon.com          Harper Hand Truck and Dolly 41.83
    XpW                 Honeywell Safe Chest        41.37      .68    
    XoXoGroupLLC        Honeywell Safe Chest        51.78      1.0
    Toys Online         Honeywell Safe Chest        43.01      etc...
    Tempus & Co.        Honeywell Safe Chest        52.7
    stores123           Honeywell Safe Chest        51.21
    ⒽomeⓄnline         Honeywell Safe Chest        43.88
    HomeOnline          Honeywell Safe Chest        43.87
    Great Brands Outlet Honeywell Safe Chest        64.95
    Connect Buy         Honeywell Safe Chest        30.1
    Amazon.com          Honeywell Safe Chest        24.6

I think this is a good fit for a data.table solution, but I can't figure out how to compare every row that does not have "Amazon.com" as the seller with the row that does.

2 answers:

Answer 0 (score: 2)

You can use:

dt[, pct := (Price - Price[Seller=='Amazon.com'])/Price[Seller=='Amazon.com'], by = Name]

which gives:

                 Seller                        Name Price       pct
 1:         ⒽomeⓄnline Harper Hand Truck and Dolly 51.70 0.2359551
 2:          HomeOnline Harper Hand Truck and Dolly 62.54 0.4950992
 3:          Amazon.com Harper Hand Truck and Dolly 41.83 0.0000000
 4:                 XpW        Honeywell Safe Chest 41.37 0.6817073
 5:        XoXoGroupLLC        Honeywell Safe Chest 51.78 1.1048780
 6:         Toys Online        Honeywell Safe Chest 43.01 0.7483740
 7:        Tempus & Co.        Honeywell Safe Chest 52.70 1.1422764
 8:           stores123        Honeywell Safe Chest 51.21 1.0817073
 9:         ⒽomeⓄnline        Honeywell Safe Chest 43.88 0.7837398
10:          HomeOnline        Honeywell Safe Chest 43.87 0.7833333
11: Great Brands Outlet        Honeywell Safe Chest 64.95 1.6402439
12:         Connect Buy        Honeywell Safe Chest 30.10 0.2235772
13:          Amazon.com        Honeywell Safe Chest 24.60 0.0000000
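The grouped expression above works when every Name has exactly one Amazon.com row. If a product had no such row, `Price[Seller=='Amazon.com']` would be empty and the expression would no longer yield one value per row. A possible variant, sketched here under that assumption, looks the reference price up with an update join so that unmatched products simply get NA. The product "Example Widget" and the names `ref` and `ref_price` are made up for illustration and do not come from the question.

```r
library(data.table)

# Small subset of the question's data, plus one hypothetical product
# ("Example Widget") that has no Amazon.com row.
dt <- data.table(
  Seller = c("HomeOnline", "Amazon.com", "XpW", "Amazon.com", "XpW"),
  Name   = c("Harper Hand Truck and Dolly", "Harper Hand Truck and Dolly",
             "Honeywell Safe Chest", "Honeywell Safe Chest", "Example Widget"),
  Price  = c(62.54, 41.83, 41.37, 24.6, 10)
)

# One reference price per product, taken from the Amazon.com rows
# (assumes at most one Amazon.com row per Name).
ref <- dt[Seller == "Amazon.com", .(Name, ref_price = Price)]

# Update join: copy the reference price onto every row with the same Name.
# Rows whose Name has no match in `ref` are left as NA.
dt[ref, on = "Name", ref_price := i.ref_price]

# Percentage difference relative to the reference price.
dt[, pct := (Price - ref_price) / ref_price]

# Optionally blank out the reference rows themselves, as in the desired output.
dt[Seller == "Amazon.com", pct := NA_real_]
```

With this layout the Amazon.com rows and the product without a reference both end up with `pct` equal to NA, while the remaining rows carry the same ratio as the one-liner above.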

The same logic implemented in dplyr:

dt %>% 
  group_by(Name) %>% 
  mutate(pct = (Price - Price[Seller=='Amazon.com'])/Price[Seller=='Amazon.com'])

Data used:

dt <- structure(list(Seller = c("ⒽomeⓄnline", "HomeOnline", "Amazon.com", "XpW", "XoXoGroupLLC", "Toys Online", "Tempus & Co.", "stores123", "ⒽomeⓄnline", "HomeOnline", "Great Brands Outlet", "Connect Buy", "Amazon.com"), 
                     Name = c("Harper Hand Truck and Dolly", "Harper Hand Truck and Dolly", "Harper Hand Truck and Dolly", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest"),
                     Price = c(51.7, 62.54, 41.83, 41.37, 51.78, 43.01, 52.7, 51.21, 43.88, 43.87, 64.95, 30.1, 24.6)),
                .Names = c("Seller", "Name", "Price"), class = c("data.table", "data.frame"), row.names = c(NA, -13L))

Answer 1 (score: 1)

Here is a dplyr solution:

library(dplyr)

df <- data.frame(
  Seller = c("ⒽomeⓄnline", "HomeOnline", "Amazon.com", "XpW", "XoXoGroupLLC", "Toys Online", "Tempus & Co.", "stores123", "ⒽomeⓄnline", "HomeOnline", "Great Brands Outlet", "Connect Buy", "Amazon.com"),
  Name = c("Harper Hand Truck and Dolly","Harper Hand Truck and Dolly","Harper Hand Truck and Dolly","Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest"),
  Price = c(51.7, 62.54, 41.83, 41.37, 51.78, 43.01, 52.7, 51.21, 43.88, 43.87, 64.95, 30.1, 24.6)
)

df %>% 
  # Join each row with the "Amazon.com" price for this item
  left_join(df %>% filter(Seller == "Amazon.com"), by = "Name", suffix = c("", ".amazon")) %>%
  # Remove unused "Seller" column
  select(-Seller.amazon) %>%
  # Calculate percentage for each row, except for
  # "Amazon.com" rows, for which the percent difference is NA
  mutate(Pct_Diff = ifelse(Seller == "Amazon.com", NA, round((Price - Price.amazon) / Price.amazon, 2)))

#                      Seller                        Name Price Price.amazon Pct_Diff
# 1  <U+24BD>ome<U+24C4>nline Harper Hand Truck and Dolly 51.70        41.83     0.24
# 2                HomeOnline Harper Hand Truck and Dolly 62.54        41.83     0.50
# 3                Amazon.com Harper Hand Truck and Dolly 41.83        41.83       NA
# 4                       XpW        Honeywell Safe Chest 41.37        24.60     0.68
# 5              XoXoGroupLLC        Honeywell Safe Chest 51.78        24.60     1.10
# 6               Toys Online        Honeywell Safe Chest 43.01        24.60     0.75
# 7              Tempus & Co.        Honeywell Safe Chest 52.70        24.60     1.14
# 8                 stores123        Honeywell Safe Chest 51.21        24.60     1.08
# 9  <U+24BD>ome<U+24C4>nline        Honeywell Safe Chest 43.88        24.60     0.78
# 10               HomeOnline        Honeywell Safe Chest 43.87        24.60     0.78
# 11      Great Brands Outlet        Honeywell Safe Chest 64.95        24.60     1.64
# 12              Connect Buy        Honeywell Safe Chest 30.10        24.60     0.22
# 13               Amazon.com        Honeywell Safe Chest 24.60        24.60       NA
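
As a cross-check that needs no packages, the same per-product lookup can be sketched in base R with `match()`. This is only a sketch on a subset of the rows above. The names `amazon` and `ref_price` are made up for illustration, and it assumes at most one Amazon.com row per product.

```r
# Subset of the data above; stringsAsFactors = FALSE keeps the text columns
# as character vectors on older R versions too.
df <- data.frame(
  Seller = c("HomeOnline", "Amazon.com", "XpW", "Connect Buy", "Amazon.com"),
  Name   = c("Harper Hand Truck and Dolly", "Harper Hand Truck and Dolly",
             "Honeywell Safe Chest", "Honeywell Safe Chest", "Honeywell Safe Chest"),
  Price  = c(62.54, 41.83, 41.37, 30.1, 24.6),
  stringsAsFactors = FALSE
)

# Rows that hold the reference price for each product.
amazon <- df[df$Seller == "Amazon.com", ]

# For every row, find the Amazon.com price of the same product.
ref_price <- amazon$Price[match(df$Name, amazon$Name)]

# Percentage difference, rounded to two decimals as in the answer above.
df$Pct_Diff <- round((df$Price - ref_price) / ref_price, 2)

# The reference rows themselves get NA rather than 0.
df$Pct_Diff[df$Seller == "Amazon.com"] <- NA
```

For these rows the non-reference values come out as 0.50, 0.68 and 0.22, matching the corresponding rows in the table above.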