I am using Python 2.7.
I am trying to extract the field names of an array column using Python.
The array column looks like this:
`col` array<struct< columnname:string,columnname1:int,columnname2:decimal(10,0),
columnname3:decimal(9,2)>>
What I have tried so far:
import re
str=input("enter any string:")
fields=str.split(",")
for x in fields:
    name=x.split(":")
    seminame=name[0]+','
    firstname=seminame.find('`')
    lastname=seminame.rfind('`')
    fullname=seminame[(firstname+1):lastname]
    replacename1=fullname.replace(')', '')
    replacename2=fullname.replace('2', '')
    replacename3=fullname.replace('9', '')
    replacename4=fullname.replace('10', '')
    replacename5=fullname.replace('0', '')
    finalname='.'+replacename5
    print(finalname)
Input:
'`col` array<struct< columnname:string,columnname1:int,columnname2:decimal(10,0),
columnname3:decimal(9,2)>>'
The output I get does not match the output I want:
Actual output
.col,
.columnname1,
.columnname2,
.),
Expected output
col.columnname,
col.columnname1,
col.columnname2,
col.columnname3
Answer 0: (score: 1)
Why not use re to do the same thing?
import re

# The column definition from the question
text = "'`col` array<struct< columnname:string,columnname1:int,columnname2:decimal(10,0),columnname3:decimal(9,2)>>'"
word = re.findall(r"`\w+`", text)  # match the backquoted column name
word = " ".join(word)
word = re.sub(r'\W+', '', word)  # strip the backquotes
columnnames = re.findall(r"(\w+):", text)  # find every word that precedes a ':'
for c in columnnames:
    c = re.sub(r'\W+', '', c)  # drop any stray non-word characters
    print("%s.%s," % (word, c))
Output:
col.columnname,
col.columnname1,
col.columnname2,
col.columnname3,
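Note that the output above ends with a trailing comma, while the expected output in the question does not. A minimal sketch of one way to avoid it is to collect the names in a list and join them at the end. The function name `qualified_field_names` is made up for this illustration, and the sketch assumes the text contains exactly one backquoted column name:

```python
import re

def qualified_field_names(ddl):
    """Return 'column.field' strings for one array<struct<...>> definition."""
    # The column name is the word between backquotes (assumed to occur once).
    column = re.search(r"`(\w+)`", ddl)
    if column is None:
        raise ValueError("no backquoted column name found")
    # Each struct field name is the word directly before a ':'.
    fields = re.findall(r"(\w+)\s*:", ddl)
    return ["%s.%s" % (column.group(1), field) for field in fields]

ddl = ("`col` array<struct< columnname:string,columnname1:int,"
       "columnname2:decimal(10,0),columnname3:decimal(9,2)>>")

# Joining with ",\n" puts a comma after every line except the last.
print(",\n".join(qualified_field_names(ddl)))
```

Written with the parenthesized single-argument `print`, this should run on Python 2.7 as well as Python 3.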
To read the definition from a file, you can use the open(filename, mode) function:
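import re

with open("test.txt", "r") as h:
    text = h.read()

word = re.findall(r"`\w+`", text)  # match the backquoted column name
word = " ".join(word)
word = re.sub(r'\W+', '', word)  # strip the backquotes
columnnames = re.findall(r"(\w+):", text)  # find every word that precedes a ':'
for c in columnnames:
    c = re.sub(r'\W+', '', c)  # drop any stray non-word characters
    print("%s.%s," % (word, c))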
To write the result to a file:
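import re

with open("test.txt", "r") as h:
    with open("output.dat", "a") as w:  # mode "a" appends to any existing output
        text = h.read()
        word = re.findall(r"`\w+`", text)  # match the backquoted column name
        word = " ".join(word)
        word = re.sub(r'\W+', '', word)  # strip the backquotes
        columnnames = re.findall(r"(\w+):", text)  # find every word that precedes a ':'
        for c in columnnames:
            c = re.sub(r'\W+', '', c)  # drop any stray non-word characters
            data = "%s.%s," % (word, c)
            w.write(data + "\n")
# No explicit close() calls are needed: the with blocks close both files.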
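The snippets above join every backquoted name into a single prefix, so they only work when the file holds one column definition. If the file might hold several, a rough sketch along the following lines could pair each column with its own fields. The helper names `qualified_names_from_text` and `convert_file` are hypothetical, and the sketch assumes that backquotes appear only around column names. Unlike the code above, it overwrites the output file (mode "w") instead of appending:

```python
import re

def qualified_names_from_text(text):
    """Pair every backquoted column name with the struct fields that follow it."""
    names = []
    # Group 1 is the column name; group 2 is everything up to the next backquote,
    # so a definition may span several lines.
    for match in re.finditer(r"`(\w+)`([^`]*)", text):
        column, type_text = match.group(1), match.group(2)
        for field in re.findall(r"(\w+)\s*:", type_text):
            names.append("%s.%s" % (column, field))
    return names

def convert_file(src_path, dst_path):
    """Read column definitions from src_path and write one name per line to dst_path."""
    with open(src_path, "r") as src:
        names = qualified_names_from_text(src.read())
    with open(dst_path, "w") as dst:
        # A comma follows every name except the last one.
        dst.write(",\n".join(names) + "\n")
    return names

# Example call, using the file names from above:
# convert_file("test.txt", "output.dat")
```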