My text lists the course number, name, grade, and other information for the courses a student has taken. Specifically, the lines look like this:
0301 453 20071 LINEAR SYSTEMS I A 4 4 16.0
0301 481 20071 ELECTRONICS I WITH LAB A 4 4 16.0
0301 481 20084 ELECTRONICS II WITH LAB RE B 4 4 12.0
0301 713 20091 SOLID STATE PHYSICS NG 0 0 0.0
0511 454 20074 INT'L TRADE & FINANCE B 4 4 12.0
I want to write a regular expression that extracts:
LINEAR SYSTEMS I
ELECTRONICS I WITH LAB
ELECTRONICS II WITH LAB
SOLID STATE PHYSICS
INT'L TRADE & FINANCE
I wrote the following:
pattCourseName = re.compile(r'([-/&A-Z\':\s]{2,})(\s+[A-Z])')
However, this gives me:
LINEAR SYSTEMS I
ELECTRONICS I WITH LAB
ELECTRONICS II WITH LAB RE
SOLID STATE PHYSICS
INT'L TRADE & FINANCE
That is, I can't get rid of the RE part.
Can anyone help? Thanks!
Answer 0 (score: 5)
If the layout is fixed-width, as displayed, then forget the regex and just slice out the column you need:
course_name = line[16:45].strip()
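A minimal sketch of that slicing approach, for anyone who wants to try it. The `16:45` offsets come from the line above; the `make_record` helper and the exact padding of the sample records are assumptions made up for illustration, since the raw file's spacing is not visible here.

```python
def course_name(line, start=16, end=45):
    """Return the course-name column of a fixed-width record."""
    return line[start:end].strip()


def make_record(dept, num, term, name, rest):
    # Hypothetical layout: 5 + 4 + 7 = 16 characters before the name,
    # then a 29-character name field (columns 16 to 44).
    return f"{dept:<5}{num:<4}{term:<7}{name:<29}{rest}"


records = [
    make_record("0301", "453", "20071", "LINEAR SYSTEMS I", "   A  4  4 16.0"),
    make_record("0301", "481", "20084", "ELECTRONICS II WITH LAB", "RE B  4  4 12.0"),
]

names = [course_name(r) for r in records]
print(names)
```

Because the RE flag sits in its own column after the name field, the slice never sees it. If your file uses different offsets, adjust `start` and `end` accordingly.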
Answer 1 (score: 2)
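# Python 3 version. The output below assumes the raw file's column
# spacing, which the listing in the question may not preserve.
with open("file") as f:
    for line in f:
        s = list(filter(None, line.split(" ", 4)))
        print(s[3].replace(" ", "|").split("|", 1)[0])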
Output:
$ python myscript.py
LINEAR SYSTEMS I
ELECTRONICS I WITH LAB
ELECTRONICS II WITH LAB
SOLID STATE PHYSICS
INT'L TRADE & FINANCE
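If the fields really are separated by single spaces, as in the listing in the question, a regex can still work by anchoring on what surrounds the name rather than on the name's characters. The original pattern keeps RE because its character class accepts uppercase letters and whitespace, so RE looks like part of the name. The sketch below assumes every line has the shape "4 digits, 3 digits, 5 digits, name, optional RE, grade of one or two letters, two integers, one decimal"; that shape is inferred from the five sample lines only, and a course whose name genuinely ends in RE would be truncated.

```python
import re

# Assumed line shape, inferred from the sample data only:
#   dept(4 digits) course(3 digits) term(5 digits) NAME [RE] GRADE int int float
LINE_RE = re.compile(
    r"^\d{4} \d{3} \d{5} "
    r"(?P<name>.+?)"            # lazy: stop as early as the rest allows
    r"(?: RE)?"                 # optional retake flag, kept out of the name
    r" (?P<grade>[A-Z]{1,2}[+-]?)"
    r" \d+ \d+ \d+\.\d+$"
)

lines = [
    "0301 453 20071 LINEAR SYSTEMS I A 4 4 16.0",
    "0301 481 20071 ELECTRONICS I WITH LAB A 4 4 16.0",
    "0301 481 20084 ELECTRONICS II WITH LAB RE B 4 4 12.0",
    "0301 713 20091 SOLID STATE PHYSICS NG 0 0 0.0",
    "0511 454 20074 INT'L TRADE & FINANCE B 4 4 12.0",
]

names = []
for line in lines:
    m = LINE_RE.match(line)
    if m:  # lines that do not fit the assumed shape are skipped
        names.append(m.group("name"))

print("\n".join(names))
```

The lazy `.+?` combined with the end-of-line anchor is what makes this work: the name grows only until the remainder of the line can be read as flag, grade, and numbers.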