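Python - All lines and line numbers where a string occurs in an input file

Time: 2018-10-27 13:25:19

Tags: python regex python-3.x file keyword

I want to print every line of an input file in which a string occurs, together with its line number. So far I have written the code shown below. It works, but not the way I want: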

def index(filepath, keyword):

    with open(filepath) as f:
        for lineno, line in enumerate(f, start=1):
            matches = [k for k in keyword if k in line]
            if matches:
                result = "{:<15} {}".format(','.join(matches), lineno)
                print(result)
                print (line)

index('deneme.txt', ['elma'])

The output looks like this:

elma            15
Sogan+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elma+Noun ve+Conj turunçgil+Noun+A3pl ihracat+Noun+P3sg+Dat devlet+Noun destek+Noun+P3sg ver+Verb+Pass+Prog2+Cop .+Punc  

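So far so good, but when I enter a keyword like "Sog", it also finds Sogan, which I don't want; I only want to check tokens between whitespace. I think I need a regular expression for this. I have come up with one, but I can't work out how to add it to this code.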

r'[\w+]+'

3 answers:

Answer 0 (score: 1)

You can use the following regular expression:

import re

lines = [
    'Sogan+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elma+Noun ve+Conj turunçgil+Noun+A3pl ihracat+Noun+P3sg+Dat devlet+Noun destek+Noun+P3sg ver+Verb+Pass+Prog2+Cop .+Punc',
    'Sog+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elma+Noun ve+Conj turunçgil+Noun+A3pl ihracat+Noun+P3sg+Dat devlet+Noun destek+Noun+P3sg ver+Verb+Pass+Prog2+Cop .+Punc',
]

keywords = ['Sog']
pattern = re.compile(r'(\w+)\+')  # raw string so \w and \+ reach the regex engine unchanged

for lineno, line in enumerate(lines):
    words = set(m.group(1) for m in pattern.finditer(line))  # convert to set for efficiency
    matches = [keyword for keyword in keywords if keyword in words]
    if matches:
        result = "{:<15} {}".format(','.join(matches), lineno)
        print(result)
        print(line)

Output

Sog             1
Sog+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elma+Noun ve+Conj turunçgil+Noun+A3pl ihracat+Noun+P3sg+Dat devlet+Noun destek+Noun+P3sg ver+Verb+Pass+Prog2+Cop .+Punc

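Explanation

The pattern '(\w+)\+' matches any run of word characters (letters, digits, underscore) followed by a + character. + is a special character, so you have to escape it to match it literally. group is then used to extract the captured group (i.e. the run of word characters).

Further reading

  1. Regular expression syntax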
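
One thing to keep in mind with '(\w+)\+' is that it also captures tags in the middle of a token, such as the Noun in turunçgil+Noun+A3pl. A regex-free alternative, offered here only as a sketch, is to split each line on whitespace and compare the keyword against the part of each token before its first +. The function name find_keyword_lines and the short sample lines are made up for illustration, and the sketch assumes every token has the form stem+Tag+...:

```python
def find_keyword_lines(lines, keywords):
    """Yield (lineno, matched_keywords, line) for lines containing a keyword as a token stem."""
    for lineno, line in enumerate(lines, start=1):
        # The stem is the text before the first '+' of each whitespace-separated token.
        stems = {token.split('+', 1)[0] for token in line.split()}
        matches = [k for k in keywords if k in stems]
        if matches:
            yield lineno, matches, line.rstrip('\n')


# Hypothetical, shortened sample lines in the same format as the question's file.
sample = [
    'Sogan+Noun ,+Punc elma+Noun ve+Conj',
    'Sog+Noun ,+Punc elma+Noun+A3pl',
]

results = list(find_keyword_lines(sample, ['Sog']))
for lineno, matches, line in results:
    print("{:<15} {}".format(','.join(matches), lineno))
    print(line)
```

Because an open file is also an iterable of lines, the same function should work with `with open(filepath) as f: ... find_keyword_lines(f, keywords)`.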

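Answer 1 (score: 1)

You may want to use the word boundary marker \b. It is an empty (zero-width) match at the transition between \w and \W. If you want your keywords to be treated as literal strings, you have to escape them first. You can use | to combine everything into a single regex: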

pattern = re.compile(r'\b(' + '|'.join(map(re.escape, keyword)) + r')\b')

OR

pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keyword) + r')\b')

Computing the matches is now a little easier, since you can use finditer instead of writing the comprehension yourself:

matches = pattern.finditer(line)

Since the text of each match is available through group(), printing is not hard:

result = "{:<15} {}".format(','.join(m.group() for m in matches), lineno)

OR

result = "{:<15} {}".format(','.join(map(re.Match.group, matches)), lineno)

And of course, don't forget

import re
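
Put together, the pieces above might look like the sketch below. The function name index_lines and the shortened sample lines are assumptions for illustration. One detail worth noting: finditer returns an iterator, which is always truthy, so the sketch collects the matched text into a list before testing whether anything matched.

```python
import re


def index_lines(lines, keywords):
    """Return [(lineno, [matched keywords])] using word-boundary matching."""
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, keywords)) + r')\b')
    found = []
    for lineno, line in enumerate(lines, start=1):
        # Materialize the iterator so that an empty result is falsy.
        matches = [m.group() for m in pattern.finditer(line)]
        if matches:
            found.append((lineno, matches))
    return found


# Hypothetical, shortened sample lines.
sample = ['Sogan+Noun elma+Noun', 'Sog+Noun elma+Noun']

found = index_lines(sample, ['Sog'])
for lineno, matches in found:
    print("{:<15} {}".format(','.join(matches), lineno))
```

Note that \b treats + as a non-word character, so a keyword such as Noun would also match inside elma+Noun; whether that is acceptable depends on what counts as a token in your data.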

Corner case

If some of your keywords are prefixes of others, make sure the longer keywords come first. For example, if you have

keyword = ['foo', 'foobar']

the regex will be

\b(foo|foobar)\b

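When you hit a line that contains foobar, foo is tried first and matches, after which \b fails. The | operator tries its alternatives from left to right (this is documented behavior of |), so the engine has to backtrack before it reaches foobar. With the trailing \b it does backtrack and the match still succeeds, but in a pattern without a trailing boundary foo would win and foobar would never be reported. To stay on the safe side, pre-sort all keywords by decreasing length before constructing the expression: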

keywords.sort(key=len, reverse=True)

Or, if the input may be something other than a list:

keywords = sorted(keywords, key=len, reverse=True)

If you don't like this order, you can always print the matches in a different order after matching.
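
The effect of alternation order can be checked directly. The sketch below is illustrative only (the test string 'a foobar b' and the variable names are made up); it compares the unsorted and length-sorted alternations, with and without the trailing \b:

```python
import re

keywords = ['foo', 'foobar']
unsorted_alt = '|'.join(map(re.escape, keywords))
sorted_alt = '|'.join(map(re.escape, sorted(keywords, key=len, reverse=True)))

text = 'a foobar b'

# Without a trailing boundary, the first alternative that matches wins.
no_boundary_unsorted = re.search(r'\b(?:' + unsorted_alt + ')', text).group()
no_boundary_sorted = re.search(r'\b(?:' + sorted_alt + ')', text).group()

# With a trailing \b, the engine backtracks past 'foo' and tries 'foobar'.
bounded_unsorted = re.search(r'\b(?:' + unsorted_alt + r')\b', text).group()

print(no_boundary_unsorted, no_boundary_sorted, bounded_unsorted)
```

Sorting by decreasing length costs little and keeps the pattern correct even if the trailing boundary is later removed or changed.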

Answer 2 (score: 1)

Question: a keyword like "Sog" also finds Sogan ... I only want to check tokens between whitespace. ... how can I add that regex to this code.

Build a regex from the keywords, using | (or) as the delimiter between multiple keywords.

For example:

import re

def index(lines, keyword):
    rc = re.compile(r".*?(({})\+.+?\s)".format(keyword))  # raw string keeps \+ and \s intact

    for i, line in enumerate(lines):
        match = rc.match(line)
        if match:
            print("lines[{}] match:{}\n{}".format(i, match.groups(), line))

if __name__ == "__main__":
    lines = [
    'Sogan+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elmaro+Noun ve+Conj ... (omitted for brevity)',
    'Sog+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elma+Noun ve+Conj ... (omitted for brevity)',
]
    index(lines, 'elma')
    index(lines, 'Sog|elma')

Tested with Python: 3.5
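
The pattern above requires a + right after the keyword but does not anchor its left side, so a keyword such as ma could still match inside elma+Noun. A possible variation, sketched here under the assumption that every token starts after whitespace (or at the start of the line) and that the keyword must be followed directly by +, uses a lookbehind and a lookahead. The helper name token_stem_pattern and the shortened sample lines are hypothetical:

```python
import re


def token_stem_pattern(keywords):
    """Compile a pattern matching any keyword as the leading part of a token."""
    alternation = '|'.join(re.escape(k) for k in keywords)
    # (?<!\S): not preceded by a non-whitespace character, i.e. at a token start.
    # (?=\+):  immediately followed by a literal '+', without consuming it.
    return re.compile(r'(?<!\S)(?:' + alternation + r')(?=\+)')


# Hypothetical, shortened sample lines.
lines = [
    'Sogan+Noun ,+Punc elmaro+Noun ve+Conj',
    'Sog+Noun ,+Punc elma+Noun ve+Conj',
]

pattern = token_stem_pattern(['Sog', 'elma'])
hits = [(i, pattern.findall(line)) for i, line in enumerate(lines) if pattern.search(line)]
for i, matches in hits:
    print("lines[{}] match:{}\n{}".format(i, matches, lines[i]))
```

Here the keywords are passed as a list and escaped individually, rather than as a pre-joined 'Sog|elma' string, so that special characters inside a keyword are treated literally.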