Calculating combinations in R from a CSV file

Time: 2015-11-09 10:21:29

Tags: r combinatorics

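I have a CSV file containing about 400 values, ranging from 10 000 to 50 000.

I want to compute combinations of selected values (for example 100, 150, 200, 250) that correspond to the values in the CSV file.

Is it possible to do this in R?

So here is part of the data: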

1359.214844
1604.558594
1701.759766
1761.083984
1792.990234
1926.248047
1958.144531
2086.373047
2114.501953
2142.542969
2204.325621
2216.468750
2229.136719
2286.894531
2302.847656
2379.826172
2395.039063
2467.578125
2610.802734
2797.929688
2812.916016
2838.947266
2979.498047
3122.171875
3163.671875
3457.794922
3809.228516
3826.058594
3952.609375
3983.210938
4102.996094

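The second data set is (146.058, 203.193, 162.053, 291.095). I need the possible combinations of the second data set that correspond to values in the first one. For example, 291 * 2 + 162 * 5 + 203 * 4 = 2204.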
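As a quick check of the arithmetic in this example, the rounded values of the second data set can be combined in R and compared with the rounded value from the first data set. The names `parts`, `counts`, `total` and `target` are illustrative only and do not come from the original post.

```r
# Rounded values from the second data set: 146, 203, 162, 291
parts  <- round(c(146.058, 203.193, 162.053, 291.095))
# How many times each value is used: 0 x 146, 4 x 203, 5 x 162, 2 x 291
counts <- c(0, 4, 5, 2)
# Weighted sum of the rounded values
total  <- sum(parts * counts)
# One value from the first data set, rounded to the nearest integer
target <- round(2204.325621)
# TRUE when the combination reproduces the target value
total == target
```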

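1 Answer:

Answer 0: (score: 1)

There are other ways to do this, such as a loop that checks one specific combination at iteration i and decides whether to keep or discard it, but I prefer not to use loops where I can avoid them.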

library(dplyr)

dt = read.table(text = "1359.214844
                1604.558594
                1701.759766
                1761.083984
                1792.990234
                1926.248047
                1958.144531
                2086.373047
                2114.501953
                2142.542969
                2204.325621
                2216.468750
                2229.136719
                2286.894531
                2302.847656
                2379.826172
                2395.039063
                2467.578125
                2610.802734
                2797.929688
                2812.916016
                2838.947266
                2979.498047
                3122.171875
                3163.671875
                3457.794922
                3809.228516
                3826.058594
                3952.609375
                3983.210938
                4102.996094")

# change column name and round values
names(dt) = "value"
dt$value = round(dt$value)

# give the manual values (assuming they are 4 values)
manual_values = c(146.058, 203.193, 162.053, 291.095)

# round values
manual_values = round(manual_values)


# get the maximum coefficient to investigate
coeff = ceiling(max(dt$value) / min(manual_values))


expand.grid(v1 = manual_values[1],  ## create all combinations of coefficients and keep your values
            v2 = manual_values[2],
            v3 = manual_values[3],
            v4 = manual_values[4],
            coeff1 = 0:coeff,
            coeff2 = 0:coeff,
            coeff3 = 0:coeff,
            coeff4 = 0:coeff) %>%
  mutate(value = v1*coeff1+v2*coeff2+v3*coeff3+v4*coeff4) %>%  ## calculate the value from each combination
  inner_join(dt, by="value")  ## join info from your initial values


## sample of the first 10 rows of the result :

#      v1  v2  v3  v4 coeff1 coeff2 coeff3 coeff4 value
# 1   146 203 162 291      3     10      0      0  2468
# 2   146 203 162 291      7     12      0      0  3458
# 3   146 203 162 291      9     13      0      0  3953
# 4   146 203 162 291      7      3      1      0  1793
# 5   146 203 162 291     22      3      1      0  3983
# 6   146 203 162 291     15      4      1      0  3164
# 7   146 203 162 291      4      5      1      0  1761
# 8   146 203 162 291      0     11      1      0  2395
# 9   146 203 162 291      4     11      1      0  2979
# 10  146 203 162 291      2     14      2      0  3458

So the first row of the output tells you that the combination 3 * 146 + 10 * 203 equals 2468, which is a value present in the initial data set (the CSV).
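The example above embeds the data inline with `read.table(text = ...)`. With an actual file, the same `dt` could probably be built as in the sketch below. It assumes a single-column file without a header row; the temporary file and the name `dt_csv` are stand-ins so the snippet runs on its own, not details from the question.

```r
# Hypothetical input: one numeric value per line, no header row.
# A temporary file stands in for the real CSV so the sketch runs as-is.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("1359.214844", "1604.558594", "2467.578125"), csv_path)

# header = FALSE because the file is assumed to have no column names;
# col.names gives the single column the name used in the answer
dt_csv <- read.csv(csv_path, header = FALSE, col.names = "value")

# Round to whole numbers, as the answer does before matching
dt_csv$value <- round(dt_csv$value)
dt_csv
```

If the real file has a header row or several columns, the `header` and `col.names` arguments would need to change accordingly.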

If you spot any mistakes or need any clarification, let me know and I can update my answer.

A small improvement might be to replace inner_join with filter(value %in% dt$value). I don't see any reason to join when you can get the same output with a filter command.
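A reduced sketch of that swap, using only two of the rounded manual values and three target values so that it runs quickly. The names `dt_small`, `grid`, `by_filter` and `by_join` are illustrative and not part of the original answer.

```r
library(dplyr)

# Reduced illustration: three target values taken from the rounded data
dt_small <- data.frame(value = c(1793, 2468, 3458))

# All coefficient pairs for the rounded values 146 and 203
grid <- expand.grid(coeff1 = 0:10, coeff2 = 0:15) %>%
  mutate(value = 146 * coeff1 + 203 * coeff2)

# Keep only rows whose value appears among the targets; no join needed
by_filter <- grid %>% filter(value %in% dt_small$value)

# The same selection written as a join, as in the answer's first version
by_join <- grid %>% inner_join(dt_small, by = "value")

by_filter
```

One difference to keep in mind: if the CSV contained the same value more than once, `inner_join` would repeat the matching rows, while `filter` would return each combination once.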

For your other goal (specified in the comments), try:

library(dplyr)

dt = read.table(text = "1359.214844
                1604.558594
                1701.759766
                1761.083984
                1792.990234
                1926.248047
                1958.144531
                2086.373047
                2114.501953
                2142.542969
                2204.325621
                2216.468750
                2229.136719
                2286.894531
                2302.847656
                2379.826172
                2395.039063
                2467.578125
                2610.802734
                2797.929688
                2812.916016
                2838.947266
                2979.498047
                3122.171875
                3163.671875
                3457.794922
                3809.228516
                3826.058594
                3952.609375
                3983.210938
                4102.996094")

# change column name and round values
names(dt) = "value"
dt$value = round(dt$value)

# give the manual values (assuming they are 4 values)
manual_values = c(146.058, 203.193, 162.053, 291.095)

# get the maximum coefficient to investigate
coeff = ceiling(max(dt$value) / min(manual_values))


expand.grid(v1 = manual_values[1],  ## create all combinations of coefficients and keep your values
            v2 = manual_values[2],
            v3 = manual_values[3],
            v4 = manual_values[4],
            coeff1 = 0:3,
            coeff2 = 5:coeff,
            coeff3 = 5:coeff,
            coeff4 = 0:3) %>%
  mutate(SUM = v1*coeff1+v2*coeff2+v3*coeff3+v4*coeff4) %>%  ## calculate the value of each combination
  as_tibble()                       ## only for printing top 10 rows (tbl_df() is deprecated in current dplyr)


#         v1      v2      v3      v4 coeff1 coeff2 coeff3 coeff4      SUM
#      (dbl)   (dbl)   (dbl)   (dbl)  (int)  (int)  (int)  (int)    (dbl)
# 1  146.058 203.193 162.053 291.095      0      5      5      0 1826.230
# 2  146.058 203.193 162.053 291.095      1      5      5      0 1972.288
# 3  146.058 203.193 162.053 291.095      2      5      5      0 2118.346
# 4  146.058 203.193 162.053 291.095      3      5      5      0 2264.404
# 5  146.058 203.193 162.053 291.095      0      6      5      0 2029.423
# 6  146.058 203.193 162.053 291.095      1      6      5      0 2175.481
# 7  146.058 203.193 162.053 291.095      2      6      5      0 2321.539
# 8  146.058 203.193 162.053 291.095      3      6      5      0 2467.597
# 9  146.058 203.193 162.053 291.095      0      7      5      0 2232.616
# 10 146.058 203.193 162.053 291.095      1      7      5      0 2378.674
# ..     ...     ...     ...     ...    ...    ...    ...    ...      ...

You can save this result table as a data frame and continue processing it as needed.
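A minimal sketch of that last step, assuming shortened coefficient ranges and a hypothetical cut-off of 2000 purely for illustration. The name `result` and the follow-up filter are not from the original answer.

```r
library(dplyr)

manual_values <- c(146.058, 203.193, 162.053, 291.095)

# Store the table in a variable instead of only printing it
# (coefficient ranges shortened here to keep the sketch small)
result <- expand.grid(coeff1 = 0:3, coeff2 = 5:7, coeff3 = 5:7, coeff4 = 0:3) %>%
  mutate(SUM = manual_values[1] * coeff1 + manual_values[2] * coeff2 +
               manual_values[3] * coeff3 + manual_values[4] * coeff4)

# Example of further processing: keep sums below a hypothetical cut-off
# and sort them in increasing order
result %>%
  filter(SUM < 2000) %>%
  arrange(SUM)
```

Because `result` is an ordinary data frame, it can also be written to disk with `write.csv()` or joined against the CSV values later.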