具有用户定义维数

时间:2016-05-04 20:28:52

标签: arrays excel vba excel-vba

我最近写了一篇question求助于如何计算人口中每一对独特过敏症的发生次数。我得到的解决方案很棒,但我现在需要看看3+过敏的组合,并且使用Excel表格完成这一切将需要永远。

我决定编写一个VBA脚本来执行此操作,这对于成对很有用。它也更快,因为我回去并更改了源数据的格式,以便每个ExceptionID的相关AllergenID存储在一个以逗号分隔的字符串中。

我现在正在考虑升级到3D或更高的阵列,因为我们不知道我们可能需要多少维度(可能是10或15)我宁愿避免使用一系列{ {1}}或嵌套的Case语句。

我的研究出现了this article,其中我收集的是我所要求的实际上是不可能的,但我想询问OP的陈述

  

我认为如果我可以在运行时将Redim语句构造为字符串并执行字符串,那么这是可行的。但这似乎不可能。

我基本上有同样的想法。下面的代码会生成类型不匹配错误,但是没有可能有效的变体吗?我们不能在If/Then内传递其他功能(例如join)吗?

ReDim

This article听起来就像我在Java中所做的那样,但我不会说任何爪哇语,所以我无法确定这与我想要达到的目标有多相似,或者如果有办法将此方法应用于VBA ...

====== UPDATE ============
以下是我正在使用的数据示例(在单独的列中,为了清晰起见,我添加了破折号)

  

ExceptionID - ExcAllergens
  035 - 100380
  076 - 100107,100392,100345,100596,100141,100151,100344
  200 - 100123,100200
  325 - 100381
  354 - 100381,100123
  355 - 100381,100123
  360 - 100586
  390 - 100151,100344,100345,100349
  441 - 100380,100368
  448-100021,100181,100345,100200,100344,100295
  491 - 100381
  499 - 100333
  503 - 100333
  507 - 100331,100346,100596,100345,100344,100269,100283

以下是过敏原定义表的摘录(过敏原键是我刚刚添加的,以便使用较小的数字,6位数字是我们数据库中使用的数字。)

  

AllergenKey - AllergenID - AllergenTag
  01 - 100011 - AçaiBerry
  02 - 100012 - 乙酸
  03 - 100013 - 琼脂琼脂   04 - 100014 - 龙舌兰   05 - 100015 - 酒精
  06 - 100016 - Allspice
  07 - 100017 - 碳酸氢铵
  08 - 100018 - 淀粉酶
  09 - 100019 - 胭脂树   10 - 100020 - Apple
  11 - 100021 - Apple,Raw
  12 - 100022 - 杏子   13 - 100023 - 葛根
  14 - 100025 - 抗坏血酸
  15 - 100027 - 芦笋
  16 - 100028 - 鳄梨
  17 - 100029 - 细菌培养
  18 - 100030 - 发酵粉

请注意,有6810个例外情况,包括1到51个单独的过敏症(平均约4或5个),以及451种不同的过敏原。以下是我对过敏原对的分析结果(当我说“过敏原”它还包括像素食者这样的饮食偏好时):

  

前10对 - 对数 - 过敏原1 - 过敏原2
  1 - 245 - 乳制品 - 面筋
  2 - 232 - 鸡蛋 - 坚果
  3 - 190 - 乳制品 - 鸡蛋
  4 - 173 - 面筋 - 燕麦
  5 - 146 - 大豆(可能含有) - 大豆
  6 - 141 - 乳制品 - 坚果
  7 - 136 - 牛肉 - 猪肉
  8 - 120 - 乳制品 - 大豆
  9 - 114 - 芝麻(可能含有) - 坚果
  10 - 111 - 素食1 - 猪肉

2 个答案:

答案 0 :(得分:1)

我不会担心与您的中型数据集的最大可能组合。你无法做出所有可能的组合。您将拥有许多样本群体中不会出现的组合。不要尝试计算它们,然后计算出现次数。

相反,请完成示例填充,并在工作表“数组”上创建元组作为数据条目。我建议使用3位数的过敏原密钥作为标识符编号,并将元组中的数字组合成一个长整数(对于较大的数字可能需要十进制)。

我建议的方法是将元组组合为长的元素,以后可以很容易地分解。然后使用频率函数计算每个元组“数字”的出现次数。因此,如果有过敏原使用密钥:1,17,451 - 它们形成一个由1,017,451组成的长组(与451,17和1相同) - 我们确保任何元组都强制将最小键的顺序强制为最大键。所以最大三倍是449,450,451,最小值是1,002,003。请注意,您永远不会有3,002,001,因为它会复制1,002,003。

我玩的模块如下: 编辑 - 更好的代码

Option Explicit
Option Base 1

Public Function concID(paramArr() As Variant) As Variant
' this function takes an array of numbers and arranges the array into
' one long code number - with order of smallest to largest
' the code number generated has each individual array entry as a 3-digit component

  Dim wsf As WorksheetFunction
  Dim decExp As Integer
  Dim i As Long, j As Long
  Dim bigNum As Variant   ' may need to cast to Decimal??

  Set wsf = WorksheetFunction

  'may use cDec if necessary here??
  For i = 1 To UBound(paramArr)
        'determine the position of the component by multiplying by a multiple of 10^3
        decExp = 3 * (UBound(paramArr) - i)
        bigNum = bigNum + wsf.Small(paramArr, i) * 10 ^ decExp
  Next i
  concID = bigNum

End Function

Public Sub runAllergen()

  Dim ws As Worksheet
  Dim dataRange As Range, tupleRange As Range, uniqueList As Range, freqRange As Range, r As Range
  Dim i As Long, j As Long, counter As Long
  Dim dataArray As Variant, arr As Variant, tempholder As Long
  Dim bigArray(1 To 10 ^ 6, 1 To 1) As Variant ' the array which will hold all the generated combinations from the data
  Dim tuple As Long

  tuple = 3
  'this will come in as a user input.
  Set ws = Sheet1
  Set dataRange = ws.Range("A2:A10001")     'I have 10k people in my dataset, and this is just the allergen data vector

  Application.ScreenUpdating = False  'IMPORTANT for efficiency

  tempholder = 1 'this is the array index which the next combi entry is to be put into bigArray
  dataArray = dataRange.Value 'write entire worksheet column to internal array for efficiency
  For i = 1 To UBound(dataArray)
        'obtain array of allergen values in each data row to obtain tuples from
        arr = Split(dataArray(i, 1), ",")
        If UBound(arr) + 1 >= tuple Then
              'give over the array of row data to make tuples from and write to bigArray
              'return the next available index of bigArray to store data
              tempholder = printCombinations(arr, tuple, bigArray(), tempholder)
        End If
  Next i

  Set r = ws.Range("B2")
  'write entire list of tuples from data population to worksheet for efficiency - MASSIVE performance boost
  r.Resize(tempholder - 1, 1).Value = bigArray
  'copy tuple output over to another column to remove duplicates and get unique list
  Set tupleRange = ws.Range(r, r.End(xlDown))
  tupleRange.Copy
  Set r = ws.Range("D2")
  r.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
    :=False, Transpose:=False

  'remove duplicates from copied tuple output to get a unique list of codes to serve as bins in FREQUENCY function
  ws.Range(r, r.End(xlDown)).RemoveDuplicates Columns:=1, Header:=xlNo
  Set uniqueList = ws.Range(r, r.End(xlDown))
  Application.CutCopyMode = False
  'set the frquency output range which is always 1 more row than the bins array
  Set freqRange = uniqueList.Offset(0, 1).Resize(uniqueList.Rows.Count + 1, 1)
  'get the frequency of each tuple
  freqRange.FormulaArray = "=FREQUENCY(R2C" & tupleRange.Column & ":R" & tupleRange.Rows.Count + 1 & _
                    "C" & tupleRange.Column & _
                    ",R2C" & uniqueList.Column & ":R" & uniqueList.Rows.Count + 1 & "C" & uniqueList.Column & ")"

  Application.ScreenUpdating = True
End Sub

Public Function printCombinations(pool As Variant, r As Long, printVector As Variant, tempPosition As Long) As Long

  'this function writes the data row arrays as tuples/combis to the bigArray,
  'and returns the next available index in bigArray
  Dim i As Long, j As Long, n As Long
  Dim tempholder() As Variant
  Dim idx() As Long

  ReDim tempholder(1 To r)
  ReDim idx(1 To r)

  n = UBound(pool) - LBound(pool) + 1
  For i = 1 To r
        idx(i) = i
  Next i

  Do
        For j = 1 To r
              tempholder(j) = CLng(pool(idx(j) - 1))
        Next j

        'we now have an array of size tuple from the row data, so construct our code number,
        'and write to the next available index in bigArray

        printVector(tempPosition, 1) = concID(tempholder)
        tempPosition = tempPosition + 1

        ' Locate last non-max index
        i = r
        While (idx(i) = n - r + i)
              i = i - 1
              If i = 0 Then
                    'the algorithm has ended with the last index exhausted
                    'return the next available index of bigArray
                    printCombinations = tempPosition
                    Exit Function
              End If
        Wend

        idx(i) = idx(i) + 1
        For j = i + 1 To r
              idx(j) = idx(i) + j - i
        Next j
  Loop

End Function

初始设置:

enter image description here

您还可以将频率范围复制粘贴到值等等....

答案 1 :(得分:0)

为了扩展我的评论,这里有一些修改过的代码,用于根据提供的N_tuple变量使用数组数组。我很难想象一个不适合你的场景:

Sub testroutine()

Dim x As Integer, y As Integer 'just a counter
Dim ArrayTemp() As Variant
Dim PairCount() As Variant
Dim AllergenRef As Object 'Object to store a reference to each AllergenID using AllergenKey as key
    Set AllergenRef = CreateObject("Scripting.Dictionary")

For x = 1 To 20
    AllergenRef.Add x, (x * 10) + (2 ^ x) 'dummy data for my dictionary
Next x

Dim N_tuple As Integer
N_tuple = 5 'this value would be provided by a user form at runtime

'Now that you have your N_tuple, redim your paircount array
ReDim PairCount(1 To N_tuple)

'For each N_tuple, create an array and add it to the PairCount array
'Note that you could easily have a 2-dimensional array for a table of values as ArrayTemp
For x = 1 To N_tuple
    ReDim ArrayTemp(1 To AllergenRef.Count)
    PairCount(x) = ArrayTemp
Next x

'Now you have an array of arrays, which can be easily accessed.
'For example: PairCount(2)(3)
'Or if the subarrays are 2-dimensional: PairCount(4)(6, 12)

'This simply loops through the PairCount array and shows the ubound of its subarrays
For x = 1 To UBound(PairCount)
    MsgBox UBound(PairCount(x))
Next x

End Sub