I am trying to split some text into pieces at each group of characters. In my case the character groups are "* ((" and ")) ".
import re
file = "Name* ((Bla Bla Bla (Bla Bla) A40 & A41)) Name2* ((Bla Bla Bla (Bla Bla) A42 & A43)) Name3* ((Bla Bla Bla (Bla Bla) A44 & A45)) Name4* ((Bla Bla Bla (Bla Bla) A46 & A47)) Name5* ((Bla Bla Bla (Bla Bla) A48 & A49)) Name6* ((Bla Bla Bla (Bla Bla) A50 & A51)) Name7* ((Bla Bla Bla (Bla Bla) A452 & A53)) Name8* ((Bla Bla Bla (Bla Bla) A54 & A55)) Name9* ((Bla Bla Bla (Bla Bla) A56 & A57)) Name10* ((Bla Bla Bla (Bla Bla) A58 & A59)) Name11* ((Bla Bla Bla (Bla Bla) A60 & A61)) Name12* ((Bla Bla Bla (Bla Bla) A62 & A63)) Name13* ((Bla Bla Bla (Bla Bla) A64 & A65)) Name14* ((Bla Bla Bla (Bla Bla) A66 & A67)) Name14* ((Bla Bla Bla (Bla Bla) A68 & A69))"
parse = re.split('[* ((][)) ]', file)
print parse
My result comes back as:
['Name', '((Bla Bla Bla (Bla Bla) A40 & A41)) Name2', '((Bla Bla Bla (Bla Bla) A42 & A43)) Name3', '((Bla Bla Bla (Bla Bla) A44 & A45)) Name4', '((Bla Bla Bla (Bla Bla) A46 & A47)) Name5', '((Bla Bla Bla (Bla Bla) A48 & A49)) Name6', '((Bla Bla Bla (Bla Bla) A50 & A51)) Name7', '((Bla Bla Bla (Bla Bla) A452 & A53)) Name8', '((Bla Bla Bla (Bla Bla) A54 & A55)) Name9', '((Bla Bla Bla (Bla Bla) A56 & A57)) Name10', '((Bla Bla Bla (Bla Bla) A58 & A59)) Name11', '((Bla Bla Bla (Bla Bla) A60 & A61)) Name12', '((Bla Bla Bla (Bla Bla) A62 & A63)) Name13', '((Bla Bla Bla (Bla Bla) A64 & A65)) Name14', '((Bla Bla Bla (Bla Bla) A66 & A67)) Name14', '((Bla Bla Bla (Bla Bla) A68 & A69))']
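It seems to split the text only at "* ". I can't figure out how to set up multiple multi-character delimiters. Does anyone have any suggestions? Thanks.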
Answer 0: (score: 0)
I tried using a regular expression:
import re
file = "your....string.... content" #your string goes here.
parse = re.split(r"\*|\)\)|\(\(", file)
Output:
['Name', ' ', 'Bla Bla Bla (Bla Bla) A40 & A41', ' Name2', ' ', 'Bla Bla Bla (Bla Bla) A42 & A43', ' Name3', ' ', 'Bla Bla Bla (Bla Bla) A44 & A45', ' Name4', ' ', 'Bla Bla Bla (Bla Bla) A46 & A47', ' Name5', ' ', 'Bla Bla Bla (Bla Bla) A48 & A49', ' Name6', ' ', 'Bla Bla Bla (Bla Bla) A50 & A51', ' Name7', ' ', 'Bla Bla Bla (Bla Bla) A452 & A53', ' Name8', ' ', 'Bla Bla Bla (Bla Bla) A54 & A55', ' Name9', ' ', 'Bla Bla Bla (Bla Bla) A56 & A57', ' Name10', ' ', 'Bla Bla Bla (Bla Bla) A58 & A59', ' Name11', ' ', 'Bla Bla Bla (Bla Bla) A60 & A61', ' Name12', ' ', 'Bla Bla Bla (Bla Bla) A62 & A63', ' Name13', ' ', 'Bla Bla Bla (Bla Bla) A64 & A65', ' Name14', ' ', 'Bla Bla Bla (Bla Bla) A66 & A67', ' Name14', ' ', 'Bla Bla Bla (Bla Bla) A68 & A69', '']
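The result above still contains whitespace-only and empty strings, because `*`, `((` and `))` are matched as separate delimiters and the spaces around them are left behind. Square brackets in a regular expression define a character class (a set of single characters), which is why the pattern in the question only matched "* ". Below is a minimal sketch that splits on the whole delimiters instead. It assumes the delimiters are always exactly `* ((` and `))` followed by optional whitespace; the shortened `text` sample and the name `parts` are only for illustration.

```python
import re

# Shortened sample in the same shape as the string from the question.
text = ("Name* ((Bla Bla Bla (Bla Bla) A40 & A41)) "
        "Name2* ((Bla Bla Bla (Bla Bla) A42 & A43))")

# Each alternative is a whole multi-character delimiter, escaped so that
# '*', '(' and ')' are taken literally rather than as regex operators.
pattern = r"\* \(\(|\)\)\s*"

# re.split leaves an empty string after the final '))'; drop empty pieces.
parts = [p for p in re.split(pattern, text) if p]
print(parts)
```

The single parentheses inside each description are left alone because the pattern only matches two closing parentheses in a row.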
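Answer 1: (score: 0)
I wanted to share the solution I ended up using in case anyone else can benefit. There is a mix of regular expressions in there, but I used findall instead of split. Now that I have gotten this far, I need more control over the output. The data is dumped into 3 fields (From_Node, To_Node, Link). I need the value of the first "To_Node" to become the value of the next row's "From_Node", and so on. Imagine following a line: point A to B, then point B to C, then point C to D, etc. With my limited knowledge, I don't even know where to start looking for this solution. Any ideas?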
import re, arcpy
# Local variables (raw strings so the backslashes are kept literally):
Table1 = r"D:\Database1.mdb\Table1"
RAW_Data = r"D:\Database1.mdb\RAW_Data"
# Create cursors and insert rows
insertcursor = arcpy.da.InsertCursor(Table1, ["From_Node", "To_Node", "Link"])
with arcpy.da.SearchCursor(RAW_Data, ["Field1", "Field1", "Field1"]) as searchcursor:
    try:
        for row in searchcursor:
            listFrom_Node = re.findall(r'\w+(?=\*\s*)', row[0])  # From Node
            print(listFrom_Node)
            print("From Node List Success")
            listTo_Node = re.findall(r'\w+(?=\*\s*)', row[1])  # To Node
            print(listTo_Node)
            print("To Node List Success")
            listLink = re.findall(r'\(\((.*?)\)\)', row[2])  # Link descriptions
            print(listLink)
            print("Link List Success")
            # Write one output row per name found in this input row
            for n, Value in enumerate(listFrom_Node):
                insertcursor.insertRow((listFrom_Node[n], listTo_Node[n], listLink[n]))
    except:
        print('Empty Cursor')
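For the follow-up about chaining nodes (A to B, then B to C), one possible approach is to pair each name with the one that follows it using `zip`. This is only a sketch under assumptions: the sample `text`, the helper name `chain_rows`, and the choice to attach each link description to the segment that starts at its own name are illustrative guesses, not something confirmed above.

```python
import re

def chain_rows(text):
    """Build (from_node, to_node, link) tuples from one raw string."""
    nodes = re.findall(r"\w+(?=\*)", text)       # names that precede '*'
    links = re.findall(r"\(\((.*?)\)\)", text)   # text between '((' and '))'
    # Pair every node with its successor. zip stops at the shortest input,
    # so the last node produces no outgoing row (assumed to be acceptable).
    return list(zip(nodes, nodes[1:], links))

# Hypothetical sample data in the same "Name* ((description))" shape.
text = "A* ((link one)) B* ((link two)) C* ((link three))"
for row in chain_rows(text):
    print(row)
```

Each tuple has the same three-field shape that is passed to `insertRow` above, so the tuples could replace the indexed lookups into the three parallel lists.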
Answer 2: (score: -1)
Can you just use the split function on strings? That and a list comprehension can get the job done.
In[31]: [i for s in [s.split(')) ') for s in file.split('* ((')] for i in s]
Out[31]:
['Name',
'Bla Bla Bla (Bla Bla) A40 & A41',
'Name2',
'Bla Bla Bla (Bla Bla) A42 & A43',
'Name3',
'Bla Bla Bla (Bla Bla) A44 & A45',
'Name4',
'Bla Bla Bla (Bla Bla) A46 & A47',
'Name5',
'Bla Bla Bla (Bla Bla) A48 & A49',
'Name6',
'Bla Bla Bla (Bla Bla) A50 & A51',
'Name7',
'Bla Bla Bla (Bla Bla) A452 & A53',
'Name8',
'Bla Bla Bla (Bla Bla) A54 & A55',
'Name9',
'Bla Bla Bla (Bla Bla) A56 & A57',
'Name10',
'Bla Bla Bla (Bla Bla) A58 & A59',
'Name11',
'Bla Bla Bla (Bla Bla) A60 & A61',
'Name12',
'Bla Bla Bla (Bla Bla) A62 & A63',
'Name13',
'Bla Bla Bla (Bla Bla) A64 & A65',
'Name14',
'Bla Bla Bla (Bla Bla) A66 & A67',
'Name14',
'Bla Bla Bla (Bla Bla) A68 & A69))']
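Note that the last element above still ends with `))`, because the final delimiter in the string has no trailing space. As an alternative sketch (it swaps `str.split` for `re.findall`, and assumes every record has the shape `Name* ((description))`), each name can be captured together with its description. The shortened `text` sample and the name `pairs` are illustrative.

```python
import re

# Shortened sample in the same shape as the string from the question.
text = ("Name* ((Bla Bla Bla (Bla Bla) A40 & A41)) "
        "Name2* ((Bla Bla Bla (Bla Bla) A42 & A43))")

# Group 1: the name before '* (('.
# Group 2: everything up to the next '))' (non-greedy match).
pairs = re.findall(r"(\w+)\* \(\((.*?)\)\)", text)
print(pairs)
```

Because the match is non-greedy, a description that itself contained `))` would be cut short, so this only suits data like the sample.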