Excel: Remove words from the beginning and end of a string

Date: 2020-09-29 19:19:36

Tags: excel vba excel-formula

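I am trying to extract company names from strings in a table.

Each string starts with two words (which need to be removed) and ends with a varying number of words/characters.

To remove the first two words, I tried the following formula: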

=MID(A6,1+FIND("~",SUBSTITUTE(A6," ","~",2)),255)

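However, I can't figure out how to remove the trailing words/characters.

Here is an example:

[image]

Column A contains the original text. Column B shows the result (how it should look).

Is there perhaps a simpler or better way to solve this with VBA?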

@Ron Rosenfeld:

[image]

3 answers:

Answer 0 (score: 1)

I'm not sure what the criterion for the right-hand cut-off is, so here are two approaches.

The first assumes that a country code of the form (XX) is always present and stops copying words as soon as it reaches one. The second is basically the same, except that it assumes you want to drop the last two elements of the array instead.

The code below implements the first approach:

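    Sub ExtractCompanyNames()
        Dim i As Long
        Dim lr As Long
        Dim strarr() As String
        Dim j As Long
        Dim result As String
        With ThisWorkbook.Sheets("Sheet1") 'Change this to your sheet name
            lr = .Cells(.Rows.Count, 1).End(xlUp).Row
            For i = 1 To lr
                strarr = Split(.Cells(i, 1).Value, " ")
                result = ""
                'Skip the first two words (indexes 0 and 1)
                For j = 2 To UBound(strarr)
                    'Stop at the first two-character code in parentheses, e.g. (CN)
                    If strarr(j) Like "(??)" Then Exit For
                    result = result & " " & strarr(j)
                Next j
                'Trim$ removes the leading space added by the loop
                .Cells(i, 2).Value = Trim$(result)
            Next i
        End With
    End Sub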
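
For the approach that drops the last two elements of the array, a minimal sketch might look like the following. It assumes the same layout as the code above (text in column A, results in column B of a sheet named "Sheet1") and that the last two space-separated elements are always the ones to discard. The Sub name `ExtractCompanyNamesDropLastTwo` is hypothetical.

```vba
Sub ExtractCompanyNamesDropLastTwo()
    Dim i As Long, j As Long, lr As Long
    Dim strarr() As String
    Dim result As String
    With ThisWorkbook.Sheets("Sheet1") 'Assumed sheet name, change as needed
        lr = .Cells(.Rows.Count, 1).End(xlUp).Row
        For i = 1 To lr
            strarr = Split(.Cells(i, 1).Value, " ")
            result = ""
            'Keep indexes 2 .. UBound - 2: skip the first two words
            'and the last two elements
            For j = 2 To UBound(strarr) - 2
                result = result & " " & strarr(j)
            Next j
            'Trim$ removes the leading space added by the loop
            .Cells(i, 2).Value = Trim$(result)
        Next i
    End With
End Sub
```

Rows with fewer than five space-separated elements leave nothing between the two cut-offs, so this sketch writes an empty string for them.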

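Answer 1 (score: 1)

Assuming the cut-off is at the first parenthesis, that there is a recognizable country code such as (CN), and that every item follows that country-code format, this formula will work. It checks the distance between the first opening and the first closing parenthesis. If that distance is 3 (a two-character code such as (CN)), it keeps everything before the opening parenthesis; otherwise it keeps everything up to and including the first closing parenthesis.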

    =IF(FIND(")",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255)) - FIND("(",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255)) = 3,LEFT(MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255),FIND("(",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255))-1),LEFT(MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255),FIND(")",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255))))

Result:

[image]
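
The same logic may be easier to follow as a UDF. The sketch below is one possible reading of the formula, not a drop-in replacement: the function name `CutAtFirstParen` is hypothetical, and unlike the formula (which returns #VALUE! when a FIND fails) it returns the remaining text when no parentheses are found and an empty string when there are fewer than three words.

```vba
'Hypothetical helper: mirrors the IF/FIND/LEFT logic of the worksheet formula
Function CutAtFirstParen(ByVal s As String) As String
    Dim parts() As String
    Dim rest As String
    Dim openPos As Long, closePos As Long

    'Drop the first two words: with a limit of 3, Split leaves
    'everything after the second space in parts(2)
    parts = Split(s, " ", 3)
    If UBound(parts) < 2 Then
        CutAtFirstParen = ""
        Exit Function
    End If
    rest = parts(2)

    openPos = InStr(rest, "(")
    closePos = InStr(rest, ")")
    If openPos = 0 Or closePos = 0 Then
        'No parentheses: return the remainder unchanged
        CutAtFirstParen = rest
    ElseIf closePos - openPos = 3 Then
        'Two characters between the parentheses, e.g. (CN):
        'keep what precedes "("
        CutAtFirstParen = Left$(rest, openPos - 1)
    Else
        'Otherwise keep everything up to and including the first ")"
        CutAtFirstParen = Left$(rest, closePos)
    End If
End Function
```

Like the formula, this keeps the space that precedes the opening parenthesis; wrapping the call in TRIM, as in `=TRIM(CutAtFirstParen(A2))`, would remove it.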

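Answer 2 (score: 1)

It appears you want to

  • remove the first two words
  • remove the last two substrings, where a substring is defined as text enclosed in parentheses.

If that is the case, try: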

=TRIM(LEFT(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999),FIND(CHAR(1),SUBSTITUTE(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999),"(",CHAR(1),LEN(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999))-LEN(SUBSTITUTE(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999),"(",""))-1))-1))
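
Because the formula nests the same MID expression several times, it can help to read it as three steps: drop the first two words, find the second-to-last "(", and keep what precedes it. The sketch below writes those steps out in VBA without regular expressions. The function name `CutBeforeSecondLastParen` is hypothetical, and it returns an empty string where the formula would return #VALUE!.

```vba
'Hypothetical helper: the worksheet formula above, written out step by step
Function CutBeforeSecondLastParen(ByVal s As String) As String
    Dim parts() As String
    Dim rest As String
    Dim lastOpen As Long, secondLastOpen As Long

    'Step 1: drop the first two words (everything up to the second space)
    parts = Split(s, " ", 3)
    If UBound(parts) < 2 Then Exit Function 'fewer than three words: return ""
    rest = parts(2)

    'Step 2: locate the last "(" and then the "(" before it
    lastOpen = InStrRev(rest, "(")
    If lastOpen > 1 Then secondLastOpen = InStrRev(rest, "(", lastOpen - 1)

    'Step 3: keep what precedes the second-to-last "(" and trim the ends
    If secondLastOpen > 0 Then
        CutBeforeSecondLastParen = Trim$(Left$(rest, secondLastOpen - 1))
    End If
End Function
```

One difference to keep in mind: the worksheet TRIM also collapses repeated spaces inside the text, whereas VBA's `Trim$` only strips leading and trailing spaces.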

If you would prefer a UDF that implements the same algorithm, I suggest using regular expressions:

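Option Explicit

Function extrCompany(S As String) As String
    Dim RE As Object
    'Skip two leading words, capture the name, then require two trailing (...) groups
    Const sPat As String = "^(?:\w+\s+){2}(.*?)\s*\([^)]+\)\s*\([^)]+\)[^)(]*$"

    'Late binding, so no reference to the VBScript Regular Expressions library is needed
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = True
        .Pattern = sPat
        'Replace the whole match with the captured company name
        extrCompany = .Replace(S, "$1")
    End With
End Function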
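
To try the UDF, it can be called from a cell as `=extrCompany(A1)` or from a small test procedure such as the one below. The sample string is made up for illustration and is not taken from the question's data; the Sub name `TestExtrCompany` is likewise hypothetical.

```vba
'Quick check of extrCompany; output appears in the Immediate window
'(Ctrl+G in the VBA editor). The sample string is invented for illustration.
Sub TestExtrCompany()
    Debug.Print extrCompany("Order 123 Acme Trading Co (CN) (Shanghai) extra")
End Sub
```

With that input the pattern should leave `Acme Trading Co`. If a string does not match the pattern (for example, it has only one parenthesized group), `Replace` returns the input unchanged.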