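Python re module: a regular expression for extracting a piece of text

Time: 2010-10-24 01:16:42

Tags: python regex

My text lists the course number, name, grade, and other information for each course a student has taken. Specifically, the lines look like this: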

0301 453  20071 LINEAR SYSTEMS I                    A    4   4    16.0

0301 481  20071 ELECTRONICS I WITH LAB              A    4   4    16.0

0301 481  20084 ELECTRONICS II WITH LAB      RE     B    4   4    12.0

0301 713  20091 SOLID STATE PHYSICS          NG          0   0     0.0

0511 454  20074 INT'L TRADE & FINANCE               B    4   4    12.0

I want to write a regular expression that extracts:

LINEAR SYSTEMS I
ELECTRONICS I WITH LAB
ELECTRONICS II WITH LAB
SOLID STATE PHYSICS
INT'L TRADE & FINANCE

I wrote the following:

pattCourseName = re.compile(r'([-/&A-Z\':\s]{2,})(\s+[A-Z])')

However, this gives me:

LINEAR SYSTEMS I
ELECTRONICS I WITH LAB
ELECTRONICS II WITH LAB      RE
SOLID STATE PHYSICS
INT'L TRADE & FINANCE

That is, I cannot get rid of the RE part.

Can anyone help? Thanks!

2 answers:

Answer 0: (score: 5)

If the layout is fixed as shown, forget regular expressions and just slice out the columns you need:

course_name = line[16:45].strip()
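A minimal sketch of the slicing approach, assuming every line follows the fixed-width layout of the sample in the question. The names `COURSE_NAME` and `course_name` are hypothetical; the offsets 16 and 45 are taken from the slice above.

```python
# Sample lines copied from the question; the offsets below assume this layout.
LINES = [
    "0301 453  20071 LINEAR SYSTEMS I                    A    4   4    16.0",
    "0301 481  20071 ELECTRONICS I WITH LAB              A    4   4    16.0",
    "0301 481  20084 ELECTRONICS II WITH LAB      RE     B    4   4    12.0",
    "0301 713  20091 SOLID STATE PHYSICS          NG          0   0     0.0",
    "0511 454  20074 INT'L TRADE & FINANCE               B    4   4    12.0",
]

# Hypothetical named slice: the course name occupies columns 16 to 44.
COURSE_NAME = slice(16, 45)


def course_name(line):
    """Return the course-name column with surrounding whitespace removed."""
    return line[COURSE_NAME].strip()


names = [course_name(line) for line in LINES]
for name in names:
    print(name)
```

Because flags such as RE and NG start at column 45 in this sample, they fall outside the slice. If the real file uses different column widths, the offsets would need adjusting.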

Answer 1: (score: 2)

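with open("file") as f:
    for line in f:
        # Split on the first four spaces and drop the empty piece left by the double space.
        s = [part for part in line.split(" ", 4) if part]
        # The course name ends at the first run of two or more spaces.
        print(s[3].split("  ", 1)[0])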

Output:

$ python myscript.py
LINEAR SYSTEMS I
ELECTRONICS I WITH LAB
ELECTRONICS II WITH LAB
SOLID STATE PHYSICS
INT'L TRADE & FINANCE
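If a regular expression is still preferred, the following is one possible sketch, not taken from either answer. The pattern in the question picks up RE because its character class contains both `A-Z` and `\s`, so the greedy group runs straight through the flag column. The sketch instead assumes that each line starts with three numeric fields and that the course name contains only single spaces and is followed by at least two spaces. The names `COURSE_RE` and `extract_name` are hypothetical.

```python
import re

# Assumed layout: 4-digit, 3-digit, and 5-digit fields, then the course name,
# then a run of at least two whitespace characters.
COURSE_RE = re.compile(r"^\d{4}\s+\d{3}\s+\d{5}\s+(.+?)\s{2,}")

# Sample lines copied from the question.
LINES = [
    "0301 453  20071 LINEAR SYSTEMS I                    A    4   4    16.0",
    "0301 481  20071 ELECTRONICS I WITH LAB              A    4   4    16.0",
    "0301 481  20084 ELECTRONICS II WITH LAB      RE     B    4   4    12.0",
    "0301 713  20091 SOLID STATE PHYSICS          NG          0   0     0.0",
    "0511 454  20074 INT'L TRADE & FINANCE               B    4   4    12.0",
]


def extract_name(line):
    """Return the course name, or None if the line does not fit the assumed layout."""
    match = COURSE_RE.match(line)
    return match.group(1) if match else None


names = [extract_name(line) for line in LINES]
for name in names:
    print(name)
```

The lazy group `(.+?)` stops at the first run of two or more spaces, so flags such as RE or NG are never captured. A course name that itself contained a double space would be cut short under this assumption.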