Counting the frequencies of words in Excel strings

Time: 2014-02-18 15:56:38

Tags: excel excel-vba vba

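Suppose I have a column of arbitrary length in which each cell contains a string of text. Is there a way to determine which words appear most frequently in the column (without knowing in advance which words to check for), and then list those words and their frequencies, sorted, in a two-column table? Would VBA be the best fit for this task?

For example, a cell might contain the string "This is a string, and the number of characters in this string is > 0." (errors intentional)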

4 answers:

Answer 0: (score: 11)

Select a portion of column A and run this small macro (the table will be placed in columns B and C):

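Sub Ftable()
    Dim BigString As String, I As Long, J As Long
    Dim r As Range, ary As Variant, a As Variant, v As Variant
    Dim cl As Collection
    BigString = ""

    ' Reader request: add code to count "All" and "all" together.
    ' Also add code to separate ".", "!", etc. from the words preceding them,
    ' so that those words are counted in the total as well. For example, "all."
    ' should not be reported as 1 "all." but added to the total for "all".
    ' Would you post this new code?

    ' Join every selected cell into one space-separated string.
    For Each r In Selection
        BigString = BigString & " " & r.Value
    Next r
    BigString = Trim(BigString)
    ary = Split(BigString, " ")

    ' Build the list of unique words: adding a duplicate key raises an error,
    ' which is skipped. Collection keys are matched case-insensitively.
    Set cl = New Collection
    On Error Resume Next
    For Each a In ary
        cl.Add a, CStr(a)
    Next a
    On Error GoTo 0

    ' Write each unique word to column B and its count to column C.
    For I = 1 To cl.Count
        v = cl(I)
        Cells(I, "B").Value = v
        J = 0
        For Each a In ary
            If a = v Then J = J + 1
        Next a
        Cells(I, "C").Value = J
    Next I

End Sub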
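
The request in the macro's comment (count "All" and "all" together, and detach trailing punctuation such as "." or "!") could be handled by normalizing each word before counting it. What follows is only a sketch under stated assumptions, not part of the original answer: `NormalizeWord`, `CountWords` and `FtableNormalized` are hypothetical names, the punctuation list is an assumption, and a late-bound `Scripting.Dictionary` (available on Windows) stands in for the `Collection`.

```vba
' Hypothetical helper: lower-cases a token and strips leading/trailing punctuation.
' The punctuation set is an assumption; extend it as needed.
Function NormalizeWord(ByVal token As String) As String
    Const PUNCT As String = ".,;:!?()[]{}""'"
    token = LCase$(Trim$(token))
    ' Strip punctuation from the front.
    Do While Len(token) > 0
        If InStr(PUNCT, Left$(token, 1)) = 0 Then Exit Do
        token = Mid$(token, 2)
    Loop
    ' Strip punctuation from the end.
    Do While Len(token) > 0
        If InStr(PUNCT, Right$(token, 1)) = 0 Then Exit Do
        token = Left$(token, Len(token) - 1)
    Loop
    NormalizeWord = token
End Function

' Hypothetical helper: returns a Scripting.Dictionary mapping each
' normalized word to the number of times it occurs in source.
Function CountWords(ByVal source As String) As Object
    Dim counts As Object, p As Variant, w As String
    Set counts = CreateObject("Scripting.Dictionary")
    For Each p In Split(source, " ")
        w = NormalizeWord(CStr(p))
        ' Reading a missing key yields Empty, so Empty + 1 starts the count at 1.
        If Len(w) > 0 Then counts(w) = counts(w) + 1
    Next p
    Set CountWords = counts
End Function

' Same layout as Ftable: words in column B, counts in column C.
Sub FtableNormalized()
    Dim r As Range, big As String, counts As Object, k As Variant, i As Long
    For Each r In Selection
        big = big & " " & r.Value
    Next r
    Set counts = CountWords(big)
    i = 1
    For Each k In counts.Keys
        Cells(i, "B").Value = k
        Cells(i, "C").Value = counts(k)
        i = i + 1
    Next k
End Sub
```

Like the original macro, this splits on single spaces only, so words separated by tabs or line breaks inside a cell would still need extra handling.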

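Answer 1: (score: 4)

Given this:

enter image description here

I would use a pivot table to get this:

enter image description here

Best of all, if I get more data, it's easy to get the top 5, top 10, and so on. And it will always produce unique entries. From there, you can do all manner of editing and calculation. :)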

Answer 2: (score: 2)

Using Google Sheets:

index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),1,$A6+1)&":"&index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),2,$A6+1)

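In the formula above, $B$2 contains the text string.

$A6 = 1 will give you the most frequently used word.

$A6 = 2 will give you the second most frequently used word, and so on.

This returns at most the 20 most frequent words. If you want more, increase the limit value to whatever you want.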
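
For readers staying in Excel, the `order by count(Col2) desc` part of this formula has a rough VBA counterpart in `Range.Sort`. The following is a minimal sketch, assuming the word/count table produced by the macro in answer 0 already sits in columns B and C of the active sheet, starting in row 1 with no header row; `SortByFrequency` is a hypothetical name.

```vba
' Hypothetical helper: sorts the word/count table in columns B:C of the
' active sheet by count, highest first. Assumes no header row.
Sub SortByFrequency()
    Dim lastRow As Long
    lastRow = Cells(Rows.Count, "B").End(xlUp).Row   ' last used row in column B
    If lastRow < 2 Then Exit Sub                     ' nothing to sort
    Range("B1:C" & lastRow).Sort Key1:=Range("C1"), Order1:=xlDescending, Header:=xlNo
End Sub
```

After sorting, the N most frequent words are simply the first N rows, so rows 1 to 20 correspond to `limit 20` in the formula.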

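Answer 3: (score: 1)

Here is a small fix, plus an enhancement, to the script provided by "Gary's Student". The fix: building the collection is apparently not case sensitive (which is right, since we probably don't want new items added to the collection that differ from existing items only in case), but the If statement that does the counting is written to be case sensitive, so the counts come out wrong. Just change that line to...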

If LCase(a) = LCase(v) Then J = J + 1
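
The mismatch behind this fix can be checked directly: `Collection` keys are matched case-insensitively, while `=` on strings is case-sensitive under the default `Option Compare Binary`. The sketch below shows both behaviors (`CaseMismatchDemo` is a made-up name for illustration, and it assumes the module does not declare `Option Compare Text`). `StrComp` with `vbTextCompare` is an alternative to the `LCase` comparison.

```vba
Sub CaseMismatchDemo()
    Dim cl As New Collection
    Dim rejected As Boolean

    cl.Add "All", "All"
    On Error Resume Next
    cl.Add "all", "all"              ' fails: the key differs only in case
    rejected = (Err.Number <> 0)
    On Error GoTo 0

    Debug.Print rejected                                  ' True: the second Add was rejected
    Debug.Print cl.Count                                  ' 1: only "All" was stored
    Debug.Print ("All" = "all")                           ' False under Option Compare Binary
    Debug.Print (LCase("All") = LCase("all"))             ' True: the fix from this answer
    Debug.Print (StrComp("All", "all", vbTextCompare) = 0) ' True: case-insensitive compare
End Sub
```

So with the original comparison, a word stored as "All" would only be credited with the occurrences spelled exactly "All", and the differently capitalized occurrences would never be counted anywhere.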

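Here is my enhancement. To use it, first select one or more columns, but not their (first) header/label row. Then run the script, and it puts the results for each selected column in a new worksheet, along with that header/label row so you know what you are looking at.

I'm only a hack. I just hack things together when I need to get work done, so it's not elegant, I'm sure...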

Sub FrequencyV2() 'Modified from: https://stackoverflow.com/questions/21858874/counting-the-frequencies-of-words-in-excel-strings
'It determines the frequency of words found in each selected column.
'Puts results in new worksheets.
'Before running, select one or more columns but not the header rows.
    Dim rng As Range
    Dim col As Range
    Dim cell As Range
    Dim ws As Worksheet
    Dim wsNumber As Long 'Used to put a number in the names of the newly created worksheets
    Dim BigString As String, I As Long, J As Long
    Dim ary As Variant, a As Variant, v As Variant
    Dim cl As Collection
    wsNumber = 1
    Set rng = Selection
    For Each col In rng.Columns
        BigString = ""
        For Each cell In col.Cells
            BigString = BigString & " " & cell.Value
        Next cell
        BigString = Trim(BigString)
        ary = Split(BigString, " ")
        Set cl = New Collection
        On Error Resume Next 'This works because an error occurs if item already exists in the collection.
        'Note that it's not case sensitive.  Differently capitalized items will be identified as already belonging to collection.
        For Each a In ary
            cl.Add a, CStr(a)
        Next a
        On Error GoTo 0
        Set ws = Sheets.Add(After:=Sheets(Sheets.Count))
        On Error Resume Next 'Keeps the default sheet name if "F<number>" is already taken.
        ws.Name = "F" & CStr(wsNumber)
        On Error GoTo 0
        wsNumber = wsNumber + 1
        'Copies the table header text for current column to new worksheet (skipped if the selection starts in row 1).
        If col.Row > 1 Then ws.Cells(1, "A").Value = col.Cells(1, 1).Offset(-1, 0).Value
        For I = 1 To cl.Count
            v = cl(I)
            ws.Cells(I + 1, "A").Value = v 'The +1 needed because header text takes up row 1.
            J = 0
            For Each a In ary
                If LCase(a) = LCase(v) Then J = J + 1
            Next a
            ws.Cells(I + 1, "B").Value = J 'The +1 needed because header text takes up row 1.
        Next I
    Next col
End Sub