How to split certain strings in lines that contain both digits and letters
The data I have looks like this (tembin-data.dat):
['3317121918', '69N1345E', '15']
['3317122000', '72N1337E', '20']
['3317122006', '75N1330E', '20']
['3317122012', '78N1321E', '20']
['3317122018', '83N1310E', '25']
.......etc
I need to remove the "N" and the "E" so that the new data is arranged like this:
['3317121918', '69','1345','15']
['3317122000', '72','1337','20']
['3317122006', '75','1330','20']
['3317122012', '78','1321','20']
['3317122018', '83','1310','25']
.......etc
The Python script I am currently using is:
newfile = open('tembin-data.dat', 'w')
with open('tembin4.dat', 'r') as inF:
    for line in inF:
        myString = '331712'
        if myString in line:
            data = line.split()
            print(data)
            newfile.write("%s\n" % data)
newfile.close()
The input file tembin4.dat looks like this:
REMARKS:
230900Z POSITION NEAR 7.8N 118.6E.
TROPICAL STORM 33W (TEMBIN), LOCATED APPROXIMATELY 769 NM EAST-
SOUTHEAST OF HO CHI MINH CITY, VIETNAM, HAS TRACKED WESTWARD AT
11 KNOTS OVER THE PAST SIX HOURS. MAXIMUM SIGNIFICANT WAVE HEIGHT
AT 230600Z IS 14 FEET. NEXT WARNINGS AT 231500Z, 232100Z, 240300Z
AND 240900Z.//
3317121918 69N1345E 15
3317122000 72N1337E 20
3317122006 75N1330E 20
3317122012 78N1321E 20
3317122018 83N1310E 25
3317122100 86N1295E 35
3317122106 85N1284E 35
3317122112 84N1276E 40
3317122118 79N1267E 50
3317122118 79N1267E 50
3317122200 78N1256E 45
3317122206 78N1236E 45
3317122212 80N1225E 45
3317122218 79N1214E 50
3317122218 79N1214E 50
3317122300 77N1204E 55
3317122300 77N1204E 55
3317122306 77N1193E 55
3317122306 77N1193E 55
NNNN
Answer 0 (score: 2)
Try this:
import re
with open(r"tembin4.txt", "r") as f:
    for line in f:
        lst = line.split()
        for i, x in enumerate(lst):
            # look for tokens such as 69N1345E
            grp = re.findall(r'(\d+)N(\d+)E', x)
            if grp:
                # replace the token in place with its two numeric parts
                lst[i:i + 1] = grp[0]
        print(" ".join(lst))
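The same idea can be sketched as a small function that works on in-memory lines, so it can be tried without the input file. The names `POSITION`, `split_position` and `sample` are illustrative and not part of the original answer; the sketch assumes every position token has the exact form digits, N, digits, E.

```python
import re

# Hypothetical pattern: a whole token such as 69N1345E
POSITION = re.compile(r"(\d+)N(\d+)E")

def split_position(line):
    """Split a line on whitespace and replace any <lat>N<lon>E token
    with its two numeric parts; other tokens are kept unchanged."""
    fields = []
    for token in line.split():
        m = POSITION.fullmatch(token)
        if m:
            fields.extend(m.groups())
        else:
            fields.append(token)
    return fields

# Sample lines copied from tembin4.dat
sample = ["3317121918 69N1345E 15", "3317122000 72N1337E 20", "NNNN"]
for line in sample:
    print(split_position(line))
```

Building a new list avoids modifying a list while iterating over it, and `fullmatch` leaves tokens such as 7.8N from the REMARKS block untouched.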
Answer 1 (score: 2)
Just extend your approach with a regular expression and split.
import re
pat = re.compile("[NE]")  # inside a character class no | is needed
with open('tembin4.dat', 'r') as inF, open('tembin-data.dat', 'w') as newfile:
    for line in inF:
        myString = '331712'
        if myString in line:
            data = line.split()
            data[2:2] = pat.split(data[1])[:-1]  # insert the split parts, flattened, at index 2
            del data[1]  # remove the original string containing N and E
            print(data)
            newfile.write("%s\n" % data)
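To see why the trailing element is sliced off with `[:-1]`, here is a minimal sketch of what the split returns for one token. It assumes the character class `[NE]`; inside square brackets a `|` would be treated as a literal character, so it is not needed. The variable names `parts` and `data` are illustrative.

```python
import re

pat = re.compile("[NE]")

# The token ends with the delimiter E, so split() yields a trailing empty string
parts = pat.split("69N1345E")
print(parts)       # ['69', '1345', '']
print(parts[:-1])  # ['69', '1345']

# Replacing the slice data[1:2] does the insert and the delete in one step
data = ["3317121918", "69N1345E", "15"]
data[1:2] = parts[:-1]
print(data)        # ['3317121918', '69', '1345', '15']
```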
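Answer 2 (score: 2)
You can use a positive lookbehind (?<=N) and a positive lookahead (?=N) and grab the matching groups:
import re
# quoted numbers, digits followed by N, or digits preceded by N
pattern = r"'\d+'|(\d+)(?=N)|(?<=N)(\d+)"
with open('file.txt', 'r') as f:
    for line in f:
        sub_list = []
        search = re.finditer(pattern, line)
        for lin in search:
            # drop the surrounding quotes (if any) and convert to int
            sub_list.append(int(lin.group().strip("'")))
        if sub_list:
            print(sub_list)
Output: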
[3317121918, 69, 1345, 15]
[3317122000, 72, 1337, 20]
[3317122006, 75, 1330, 20]
[3317122012, 78, 1321, 20]
[3317122018, 83, 1310, 25]
Regex information:
'\d+'|(\d+)(?=N)|(?<=N)(\d+)
\d matches a digit (equal to [0-9])
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed
Positive Lookahead (?=N)
Assert that the Regex below matches
N matches the character N literally (case sensitive)
Positive Lookbehind (?<=N)
Assert that the Regex below matches
N matches the character N literally (case sensitive)
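A small sketch of how the two lookarounds behave on one line of the list-formatted data. The lookahead and lookbehind only check for the neighbouring N and do not include it in the match. The names `line`, `full_pattern` and `values` are illustrative, and the combined pattern drops the capture groups, which are not needed when the whole match is used.

```python
import re

line = "['3317121918', '69N1345E', '15']"

# Digits that are followed by N (the N itself is not part of the match)
print(re.findall(r"\d+(?=N)", line))   # ['69']

# Digits that are preceded by N
print(re.findall(r"(?<=N)\d+", line))  # ['1345']

# Combined with the quoted-number alternative
full_pattern = r"'\d+'|\d+(?=N)|(?<=N)\d+"
values = [int(m.group().strip("'")) for m in re.finditer(full_pattern, line)]
print(values)                          # [3317121918, 69, 1345, 15]
```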
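Answer 3 (score: 1)
With pandas you can do this easily.
import pandas as pd
import os # optional
os.chdir('C:\\Users') # optional
# skipinitialspace drops the blank after each comma, so each value in column 1 starts with its quote
df = pd.read_csv('tembin-data.dat', header=None, skipinitialspace=True)
# fixed-width slices: assumes a two-digit latitude and a four-digit longitude
df[3] = df[1].str.slice(1, 3)
df[4] = df[1].str.slice(4, 8)
df = df.drop(1, axis=1)
df.to_csv('tembin-out.dat', header=False, index=False)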
Answer 4 (score: 1)
You can try this short solution in Python 3:
import re
s = [['3317121918', '69N1345E', '15'], ['3317122000', '72N1337E', '20'], ['3317122006', '75N1330E', '20'], ['3317122012', '78N1321E', '20'],
     ['3317122018', '83N1310E', '25']]
# unpack every run of digits found in the middle field into the new row
new_s = [[a, *re.findall(r'\d+', b), c] for a, b, c in s]
print(new_s)
Output:
[['3317121918', '69', '1345', '15'], ['3317122000', '72', '1337', '20'], ['3317122006', '75', '1330', '20'], ['3317122012', '78', '1321', '20'], ['3317122018', '83', '1310', '25']]