Question

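I'm trying to extract company names from strings in a table.

Each string starts with two words (which need to be removed) and ends with a varying number of words/characters.

To remove the first two words, I tried the following formula: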

=MID(A6,1+FIND("~",SUBSTITUTE(A6," ","~",2)),255)

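But I can't figure out how to remove the trailing words/characters.

In my example, column A contains the original text and column B shows the desired result.

Is there perhaps a simpler/better way to solve this with VBA?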

Answer 1

I'm not sure what the criterion for the right-hand cutoff is, so here are two approaches.

The first assumes that a country code in the form (XX) is always present:

    Sub ExtractCompanyNames() 'Sub name is only a wrapper so the snippet can be run
        Dim i As Long
        Dim lr As Long
        Dim strarr() As String
        Dim j As Long
        Dim result As String
        With ThisWorkbook.Sheets("Sheet1") 'Change this to your sheet name
            'Last used row in column A
            lr = .Cells(.Rows.Count, 1).End(xlUp).Row
            For i = 1 To lr
                strarr = Split(.Cells(i, 1).Value, " ")
                result = ""
                'Start at index 2 to skip the first two words
                For j = 2 To UBound(strarr)
                    'Stop at the first two-character code in parentheses, e.g. (CN)
                    If strarr(j) Like "(??)" Then Exit For
                    result = result & " " & strarr(j)
                Next j
                'Write to column B without the leading space
                .Cells(i, 2).Value = Trim$(result)
            Next i
        End With
    End Sub

The second is basically the same, except that it assumes you want to remove the last two elements of the array (presumably by ending the inner loop two elements before the end instead of testing for `(??)`).
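
For the variant that drops the last two elements of the array, a minimal sketch might look like the following. The Sub name is made up for illustration, and the sketch assumes the same layout as the loop above: source text in column A of Sheet1, result written to column B.

```vba
Sub ExtractCompanyNamesDropLastTwo() 'hypothetical name, for illustration only
    Dim i As Long, j As Long, lr As Long
    Dim strarr() As String
    Dim result As String

    With ThisWorkbook.Sheets("Sheet1") 'Change this to your sheet name
        'Last used row in column A
        lr = .Cells(.Rows.Count, 1).End(xlUp).Row
        For i = 1 To lr
            strarr = Split(.Cells(i, 1).Value, " ")
            result = ""
            'Keep elements 2 .. UBound - 2: skip the first two and the last two
            For j = 2 To UBound(strarr) - 2
                result = result & " " & strarr(j)
            Next j
            'Rows with fewer than five elements yield an empty result
            .Cells(i, 2).Value = Trim$(result)
        Next i
    End With
End Sub
```

This only works if every row really ends with exactly two space-free trailing items; a trailing item that itself contains a space would be split into several array elements.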

Answer 2

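Assuming your cutoff is the first occurrence of an obvious country code such as (CN), and that every item contains a country code in that format, this formula will work. It compares the positions of the first opening and the first closing parenthesis. If they are 3 characters apart, as they are for a two-letter code like (CN), it cuts the text off just before the opening parenthesis; otherwise it cuts it off after the first closing parenthesis.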

    =IF(FIND(")",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255)) - FIND("(",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255)) = 3,LEFT(MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255),FIND("(",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255))-1),LEFT(MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255),FIND(")",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255))))

Answer 3

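It seems you want to:

remove the first two words
remove the last two substrings, where a substring is defined as text enclosed in parentheses.

If that is the case, try: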

=TRIM(LEFT(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999),FIND(CHAR(1),SUBSTITUTE(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999),"(",CHAR(1),LEN(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999))-LEN(SUBSTITUTE(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999),"(",""))-1))-1))

If you would rather have a UDF that implements the same algorithm, I suggest using regular expressions:

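Option Explicit

Function extrCompany(S As String) As String
    Dim RE As Object
    'Skip the first two words, capture the name, then require two
    'parenthesised substrings (plus any parenthesis-free tail) at the end
    Const sPat As String = "^(?:\w+\s+){2}(.*?)\s*\([^)]+\)\s*\([^)]+\)[^)(]*$"

    'Late binding, so no reference to the VBScript regex library has to be set
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = True
        .Pattern = sPat
        'Replace the whole match with the captured company name
        extrCompany = .Replace(S, "$1")
    End With
End Function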
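
To try the UDF, one option is to enter `=extrCompany(A1)` in a worksheet cell; another is a small test Sub like the sketch below. The Sub name and the sample string are invented for illustration and are not taken from the question's data.

```vba
Sub TestExtrCompany() 'hypothetical helper, not part of the original answer
    'Invented sample: two leading words, a name, then two parenthesised parts
    Const sample As String = "Foo Bar Acme Trading Ltd (CN) (12345)"
    'The leading words and both parenthesised parts should be stripped;
    'the result is printed to the Immediate window (Ctrl+G in the VBA editor)
    Debug.Print extrCompany(sample)
End Sub
```

Note that when the pattern does not match, for example when a string has fewer than two parenthesised parts, `Replace` returns the input unchanged.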

Excel: Remove leading and trailing words