I want to parse a specific piece of text out of long, unstructured text. The part I want to capture always has an "x" with an integer immediately to its left and to its right.
Here is my formula:
=IFERROR(SUBSTITUTE(RIGHT(LEFT(G2,FIND("x",G2)-1),FIND("_",G2)-3)&MID(G2,FIND("x",G2),FIND("_",G2)-2),"_",""),"1x1")
And here is another version, where I tried to handle spaces with an OR statement (it does not work):
=IFERROR(SUBSTITUTE(RIGHT(LEFT(G4,FIND("x",G4)-1),FIND(OR("_"," "),G4)-3)&MID(G4,FIND("x",G4),FIND("_",G4)-2),"_",""),"1x1")
Original text | Result of my formula | Desired result
Q1-Q4_Year_Source_Type_P_LongName_300x250_Target_Server | 300x250 | 300x250
Q1-Q4_Year_Client_Client Year_Type_P_LongName_1600x1000_Site_Server | 600x100 | 1600x1000
02.04 Search Sponsorship - 728x90 | 1x1 | 728x90
Some Website_300x600 ROS Display | ebsite300x600 ROS Di | 300x600
Ideally, if I could get the MID formula to read from right to left instead of left to right, I think I would be in good shape.
Thanks.
Answer 0 (score: 0)
There is a standard formula (found in many variations) for extracting a number from a string:
=LOOKUP(99^99,--("0"&MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),ROW($1:$15))))
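So you can apply it first to the text starting a few characters to the left of the "x", and then to the text starting at the "x" itself. Helper cells are advisable to avoid one very long formula, so if the original string is in A1: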
=mid(A1,find("x",A1)-5,999) in B1
=mid(A1,find("x",A1),999) in C1
Then the first number, in D1:
=LOOKUP(99^99,--("0"&MID(B1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},B1&"0123456789")),ROW($1:$15))))
and the second number, in E1:
=LOOKUP(99^99,--("0"&MID(C1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},C1&"0123456789")),ROW($1:$15))))
Then join them together:
=D1&"x"&E1
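Answer 1 (score: 0)
Here is a complex formula that returns the word, delimited by space or underscore, that matches the pattern nnnxnnn. The match on x is case-sensitive (if x may appear in either case, replace FIND with SEARCH in the formulas below).
The formula consists of several "sub-formulas".
First we split the string on space and underscore into an array of words: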
=TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99))
SEQ above is a named formula (Formulas ► Define Name):
=IF((ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,255,1))-1)*99=0,1,(ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,255,1))-1)*99)
That formula generates the series of numbers 1, 99, 198, 297, ..., which gives the MID function in the first formula good starting points.
Then we use the LEFT and MID functions to find the words that contain an x and have numbers both before and after the x:
ISNUMBER(-LEFT(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)),FIND("x",TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))-1))
ISNUMBER(-MID(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)),FIND("x",TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))+1,99)))
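Multiplying these two formulas returns an array of 0s and 1s for the words that do not match or do match the pattern. 1/(...) then returns an array of 1s or #DIV/0! errors. Using the vector form of LOOKUP then returns the value from our array of words that sits in the same position as the match in the pattern-matching array. Because the lookup value 2 is larger than any 1 in that array, LOOKUP settles on the last match.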
=LOOKUP(2,1/(ISNUMBER(-LEFT(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)),FIND("x",TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))-1))*ISNUMBER(-MID(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)),FIND("x",TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))+1,99))),TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"_"," ")," ",REPT(" ",99)),SEQ,99)))
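To make the matching step concrete, here is a small Python sketch of what the combined formula computes. The function names are invented for illustration, the split uses `re.split` for brevity rather than the padding trick, and "number" is narrowed to a plain run of digits, whereas Excel's coercion in ISNUMBER(-...) accepts more than that.

```python
import re


def is_plain_number(s):
    """True when s is a non-empty run of ASCII digits (narrower than Excel's coercion)."""
    return s != "" and all(c in "0123456789" for c in s)


def last_matching_word(text):
    """Return the last word of the form <digits>x<digits>, or None when nothing matches."""
    result = None
    for word in re.split(r"[ _]+", text):
        pos = word.find("x")  # like FIND: case-sensitive, first 'x' in the word
        if pos < 0:
            continue
        left_ok = is_plain_number(word[:pos])        # ISNUMBER(-LEFT(word, pos-1))
        right_ok = is_plain_number(word[pos + 1:])   # ISNUMBER(-MID(word, pos+1, 99))
        if left_ok and right_ok:
            result = word  # keep overwriting so the last match wins, as with LOOKUP(2, 1/...)
    return result


print(last_matching_word("Some Website_300x600 ROS Display"))
```

Where the formula would give #N/A for a string with no matching word, this sketch returns None.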
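I will note that with VBA and regular expressions the same pattern can be expressed as \d+x\d+, and a User Defined Function can accomplish the same thing. Once you are fluent, it takes somewhat less time to devise: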
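Option Explicit
' Returns the first substring of S of the form digits, "x", digits; empty string if none.
Function ExtractMeasure(S As String) As String
    Dim RE As Object, MC As Object
    Set RE = CreateObject("vbscript.regexp")   ' late binding, so no library reference is needed
    With RE
        .Pattern = "\d+x\d+"
        .Global = False        ' stop at the first match
        .IgnoreCase = False    ' case-sensitive
        If .Test(S) = True Then
            Set MC = .Execute(S)
            ExtractMeasure = MC(0)
        End If
    End With
End Function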
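If you want to try the pattern against sample strings before wiring it into a workbook, a quick Python check is one option. This is only a sketch: `extract_measure` is a made-up name that mirrors the UDF above, returning an empty string when nothing matches.

```python
import re

# Same pattern as the UDF: one or more digits, a lowercase "x", one or more digits.
PATTERN = re.compile(r"\d+x\d+")


def extract_measure(s):
    """Return the first match of the pattern in s, or an empty string if there is none."""
    match = PATTERN.search(s)
    return match.group(0) if match else ""


samples = [
    "Q1-Q4_Year_Source_Type_P_LongName_300x250_Target_Server",
    "Q1-Q4_Year_Client_Client Year_Type_P_LongName_1600x1000_Site_Server",
    "02.04 Search Sponsorship - 728x90",
    "Some Website_300x600 ROS Display",
]
for sample in samples:
    print(extract_measure(sample))
```

Note that, like the UDF with .Global = False, this picks the first match in the string, whereas the LOOKUP formula picks the last one. The two only differ when a string contains more than one such token.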
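Answer 2 (score: -1)
In this case it is best to use regular expressions in Excel. See the following post on using regular expressions in Excel; note that you will have to use VBA: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops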