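I'm trying to extract some company names from strings in a table.
Each string starts with 2 words (which need to be removed) and ends with a varying number of words/characters.
To remove the first two words, I tried the following formula: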
=MID(A6,1+FIND("~",SUBSTITUTE(A6," ","~",2)),255)
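But I can't figure out how to remove the trailing words/characters.
Here is a sample:
Column A holds the original text. Column B is the result (what it should look like).
Is there perhaps a simpler/better way to solve this with VBA?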
Answer 0 (score: 1)
I'm not sure what the criterion for the right-hand cutoff is, so here are two approaches.
The first assumes a country code in the form (XX) is always present and stops there:
Sub ExtractCompanyNames()
    Dim i As Long
    Dim lr As Long
    Dim j As Long
    Dim strarr() As String
    Dim result As String

    With ThisWorkbook.Sheets("Sheet1") 'Change this to your sheet name
        lr = .Cells(.Rows.Count, 1).End(xlUp).Row
        For i = 1 To lr
            strarr = Split(.Cells(i, 1).Value, " ")
            result = ""
            'Skip the first two words (indexes 0 and 1)
            For j = 2 To UBound(strarr)
                'Stop at the first token that looks like a country code, e.g. (CN)
                If strarr(j) Like "(??)" Then Exit For
                result = result & " " & strarr(j)
            Next j
            'Write once, trimmed, so re-running does not append to old output
            .Cells(i, 2).Value = Trim$(result)
        Next i
    End With
End Sub
The second is basically the same, except that it assumes you want to drop the last two elements of the array. For that version, loop with For j = 2 To UBound(strarr) - 2 and leave out the Like test.
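The two approaches in this answer (stop at a "(XX)" token, or drop the last two elements) can be sketched outside Excel as follows. This is only a rough illustration in Python so the logic is easy to test; the function names and the sample string are made up, and splitting on single spaces mirrors VBA's Split(..., " ").

```python
def name_until_country_code(text):
    """Skip the first two words, then keep words until a token shaped like (XX)."""
    words = text.split(" ")
    kept = []
    for word in words[2:]:
        # Same idea as VBA's  Like "(??)": exactly two characters in parentheses
        if len(word) == 4 and word[0] == "(" and word[-1] == ")":
            break
        kept.append(word)
    return " ".join(kept)


def name_without_last_two(text):
    """Skip the first two words and drop the last two elements."""
    words = text.split(" ")
    return " ".join(words[2:-2])


# Hypothetical sample row, not taken from the question's data
sample = "Foo Bar Acme Corp (CN) extra"
print(name_until_country_code(sample))
print(name_without_last_two(sample))
```

Which of the two fits depends on whether every row really has a country code; the second one silently returns an empty string for rows with fewer than five words.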
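Answer 1 (score: 1)
Assuming the cutoff is at the first occurrence, that you have a recognizable country code such as (CN), and that every item uses that country code format, this formula will work. It checks the distance between the first opening and closing parentheses. If that distance is 3 (two characters between the parentheses), it cuts just before the opening parenthesis; otherwise it cuts right after the first closing parenthesis.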
=IF(FIND(")",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255)) - FIND("(",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255)) = 3,LEFT(MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255),FIND("(",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255))-1),LEFT(MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255),FIND(")",MID(A2,1+FIND("~",SUBSTITUTE(A2," ","~",2)),255))))
Result:
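For readers who find the nested formula hard to follow, here is a rough step-by-step sketch of the same decision in Python. It is an illustration only: the function name and sample strings are invented, and like the formula it leaves a trailing space when it cuts before the "(".

```python
def trim_at_first_paren(text):
    """Mimic the formula: drop two words, then cut at the first ( or )."""
    # Everything after the second space, like MID(..., 1 + FIND("~", ...), 255)
    rest = text.split(" ", 2)[2]
    open_pos = rest.find("(")
    close_pos = rest.find(")")
    if open_pos == -1 or close_pos == -1:
        # Excel's FIND would return #VALUE! here
        raise ValueError("no parentheses found")
    if close_pos - open_pos == 3:
        # Looks like a country code such as (CN): keep what precedes the "("
        return rest[:open_pos]
    # Otherwise keep everything up to and including the first ")"
    return rest[:close_pos + 1]


# Hypothetical inputs, not taken from the question's data
print(repr(trim_at_first_paren("Foo Bar Acme Corp (CN) extra")))
print(repr(trim_at_first_paren("Foo Bar Acme (Holdings) Ltd")))
```

Wrapping the formula in TRIM would remove the trailing space in the first case.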
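Answer 2 (score: 1)
It seems you want to remove the first two space-separated words, and also everything from the second-to-last opening parenthesis onward.
If that is the case, try: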
=TRIM(LEFT(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999),FIND(CHAR(1),SUBSTITUTE(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999),"(",CHAR(1),LEN(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999))-LEN(SUBSTITUTE(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),2))+1,999),"(",""))-1))-1))
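The formula packs several steps into one expression: take the text after the second space, count the "(" characters, and cut just before the second-to-last one. A hedged Python sketch of those steps (the function name and sample strings are invented for illustration) might look like this:

```python
def cut_before_second_last_paren(text):
    """Drop two leading words, then cut before the second-to-last "("."""
    # Text after the second space, like MID(A1, FIND(CHAR(1), ...) + 1, 999)
    rest = text.split(" ", 2)[2]
    # Positions of every "(" in the remaining text
    opens = [i for i, ch in enumerate(rest) if ch == "("]
    if len(opens) < 2:
        # SUBSTITUTE with instance number 0 makes the formula fail as well
        raise ValueError("need at least two opening parentheses")
    # Excel's TRIM also collapses repeated inner spaces, so do the same here
    return " ".join(rest[:opens[-2]].split())


# Hypothetical inputs, not taken from the question's data
print(cut_before_second_last_paren("Foo Bar Acme Corp (CN) (2019) extra"))
print(cut_before_second_last_paren("A B Acme (Holdings) Ltd (CN) (x)"))
```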
If you would prefer a UDF that implements the same algorithm, I suggest using a regular expression:
Option Explicit
Function extrCompany(S As String) As String
    Dim RE As Object
    Const sPat As String = "^(?:\w+\s+){2}(.*?)\s*\([^)]+\)\s*\([^)]+\)[^)(]*$"
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = True
        .Pattern = sPat
        extrCompany = .Replace(S, "$1")
    End With
End Function
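To try the pattern quickly outside Excel, the same regular expression can be run through Python's re module. This is only a sketch: the function name and test strings are made up, and Python's \w matches Unicode letters whereas VBScript's is ASCII-only, so results may differ for non-ASCII leading words.

```python
import re

# Same pattern as sPat in the UDF
PATTERN = re.compile(
    r"^(?:\w+\s+){2}(.*?)\s*\([^)]+\)\s*\([^)]+\)[^)(]*$",
    re.MULTILINE,
)


def extr_company(text):
    """Replace a matching line with capture group 1 (the company name)."""
    return PATTERN.sub(r"\1", text)


# Hypothetical inputs, not taken from the question's data
print(extr_company("Foo Bar Acme Corp (CN) (2019) extra"))
print(extr_company("A B Acme (Holdings) Ltd (CN) (x)"))
```

A string that does not match the pattern is returned unchanged, which is also how the UDF's .Replace behaves.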