Extracting file names from text with a regular expression in Python

Time: 2016-11-25 09:58:27

Tags: python regex string

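I am trying to extract source code file names held in a Python string variable. However, the variable also contains HTML-style markup and a lot of other content, as shown below: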

<p> Result = FAILURE<br/ hshreedharan : <a href="http://git-wip-
<ul>
<li>flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java</li>     
<li>flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java</li>
<li>flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java</li>
<li>sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java</li>
<li>sink.src.main.java.org.apache.flume.sink.hdfs.BucketWriter.java</li>          
</ul>

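I am looking for the right regular expression, using Python's "re" library, that ignores all the other text and the HTML tags and extracts only the source code file names contained in the variable, giving this output: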

flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java
sink.src.main.java.org.apache.flume.sink.hdfs.BucketWriter.java
sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java

Currently, I am using the following code:

  import re

  htmlText = "..."  # placeholder: the variable holding the HTML shown above

  # Intended to match text ending in .java
  matchSrcFiles = re.findall(r'\.[^.]*.java$', htmlText)

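Any help with a correct regular expression, or with adapting a function such as re.sub to extract the relevant source code files, would be appreciated.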
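
For reference, a small sketch of why the attempt above comes back empty. The sample string here is a shortened, hypothetical stand-in for the HTML in the question, and the names html_text, no_flag and with_flag are made up for illustration. Without re.MULTILINE, `$` only matches at the very end of the string; with it, each line still ends in `</li>` rather than `.java`.

```python
import re

# Shortened, hypothetical stand-in for the HTML shown in the question.
html_text = (
    "<ul>\n"
    "<li>src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java</li>\n"
    "<li>src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java</li>\n"
    "</ul>\n"
)

# The pattern from the question, written as a raw string.
attempt = r"\.[^.]*.java$"

# Without re.MULTILINE, `$` matches only at the end of the whole string,
# and the string ends with "</ul>", so nothing is found.
no_flag = re.findall(attempt, html_text)

# With re.MULTILINE, `$` matches at each line end, but every line
# ends with a closing tag rather than ".java", so still nothing.
with_flag = re.findall(attempt, html_text, re.MULTILINE)

print(no_flag)
print(with_flag)
```

So the `$` anchor has to go (or the tags have to be stripped first) before any `.java` path can match.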

1 Answer:

Answer 0: (score: 1)

Check this:

  import re

  a = """<p> Result = FAILURE<br/ hshreedharan : <a href="http://git-wip-
  <ul>
  <li>flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java</li>
  <li>flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java</li>
  <li>flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java</li>
  </ul>
  channel/src/main/java/org/apache/flume/channel/file/protoProtosFactory.java.
  sink.src.main.java.apache.flume.sink.java
  """

  # One or more letters, dots, slashes or hyphens, ending in a literal ".java"
  pat = r"([a-zA-Z./-]+\.java)"
  c = re.findall(pat, a)
  print(c)

Output:

['flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java', 'flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java', 'flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java', 'channel/src/main/java/org/apache/flume/channel/file/protoProtosFactory.java', 'sink.src.main.java.apache.flume.sink.java']

Regex101 demo: https://regex101.com/r/zzFpKJ/3
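
If the file names always sit inside `<li>` elements, as they do in the question's sample, a tighter variant is to anchor the pattern on those tags so that stray text elsewhere cannot match. This is only a sketch under that assumption: the sample string is a trimmed, hypothetical version of the question's HTML (with a made-up README.md entry), and LI_JAVA and extract_java_files are illustrative names, not part of the original answer.

```python
import re

# Trimmed, hypothetical version of the HTML from the question,
# plus one non-Java entry to show that it is skipped.
html_text = """<p> Result = FAILURE<br/>
<ul>
<li>flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java</li>
<li>sink.src.main.java.org.apache.flume.sink.hdfs.BucketWriter.java</li>
<li>README.md</li>
</ul>
"""

# Capture the contents of an <li> element when it is a single token
# (no whitespace, no angle brackets) that ends in a literal ".java".
LI_JAVA = re.compile(r"<li>\s*([^<>\s]+\.java)\s*</li>")


def extract_java_files(text):
    """Return every .java path found inside an <li>...</li> pair."""
    return LI_JAVA.findall(text)


files = extract_java_files(html_text)
for name in files:
    print(name)
```

Unlike the character-class pattern above, this one also accepts digits and underscores in the path, but it will miss file names that appear outside list items.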