Question

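I have a list of URLs pointing to web text files. I need to extract information from them and store it in a list. The strings I need to extract always start with P:, C: or F: and always end with ";". I'm having a hard time putting this all together, and any help would be greatly appreciated.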

Example of the web text from one of the URLs:

DR   Proteomes; UP000005640; Chromosome 3.
DR   Bgee; C9J872; -.
DR   ExpressionAtlas; C9J872; baseline and differential.
DR   GO; GO:0005634; C:nucleus; IBA:GO_Central.
DR   GO; GO:0005667; C:transcription factor complex; IEA:InterPro.
DR   GO; GO:0003677; F:DNA binding; IEA:UniProtKB-KW.
DR   GO; GO:0000981; F:sequence-specific DNA binding RNA polymerase II transcription factor activity; IBA:GO_Central.
DR   GO; GO:0003712; F:transcription cofactor activity; IEA:InterPro.
DR   GO; GO:0000278; P:mitotic cell cycle; IEA:InterPro.

Here, the result of searching for what follows C: would be:

['nucleus', 'transcription factor complex']

But it also needs to go through the different URLs and append the results to the same list.

An example of what I have tried so far, without success:

import urllib2
import sys
import re
IDlist = ['C9JVZ1', 'C9JLN0', 'C9J872']

URLlist = ["http://www.uniprot.org/uniprot/"+x+".txt" for x in IDlist]
function_list = []
for item in URLlist:
    textfile = urllib2.urlopen(item)
    myfile = textfile.read()
    for line in myfile:
        function = re.search('P:(.+?);', line).group(1)
        function_list.append(function)

Answer 1

Here is an updated version that includes a dictionary. Note that I changed the loop control to key on the file ID: that ID is used as the dictionary key. The script prints the resulting dictionary at the end.

import urllib2
import re

IDlist = ['C9JVZ1', 'C9JLN0', 'C9J872']
function_dict = {}

# Cycle through the data files, keyed by ID
for id in IDlist:

    # Start a new list of functions for this file.
    # Open the file and read line by line.
    function_list = []
    textfile = urllib2.urlopen("http://www.uniprot.org/uniprot/"+id+".txt")
    myfile = textfile.readlines()

    for line in myfile:

        # When you find a function tag, extract the function and add it to the list.
        found = re.search(' [PCF]:(.+?);', line)
        if found:
            function = found.group(1)
            function_list.append(function)

    # At end of file, insert the list into the dictionary.
    function_dict[id] = function_list

print function_dict
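
The question asked for a single list collected across all URLs rather than a dictionary. Below is a minimal sketch of that variant, assuming Python 3 (where `urllib2` has been replaced by `urllib.request`). The helper names `extract_terms`, `fetch_text` and `collect_terms` are hypothetical, and the URL pattern is copied from the question and may no longer resolve. Parsing is kept separate from downloading so the regex can be checked offline on lines from the question's sample.

```python
import re
from urllib.request import urlopen

# Same pattern as in the code above: a space, one of P/C/F, a colon,
# then the shortest run of characters up to the next semicolon.
TERM_RE = re.compile(r' [PCF]:(.+?);')


def extract_terms(text):
    """Return every P:/C:/F: term found in the text, in order (hypothetical helper)."""
    terms = []
    # splitlines() yields whole lines; iterating over the string itself
    # would yield single characters instead.
    for line in text.splitlines():
        found = TERM_RE.search(line)
        if found:  # search() returns None when a line has no tag
            terms.append(found.group(1))
    return terms


def fetch_text(uniprot_id):
    """Download one entry as text. Assumption: URL pattern taken from the question."""
    url = "http://www.uniprot.org/uniprot/" + uniprot_id + ".txt"
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def collect_terms(id_list, fetch=fetch_text):
    """Build one flat list of terms across all IDs."""
    all_terms = []
    for uniprot_id in id_list:
        all_terms.extend(extract_terms(fetch(uniprot_id)))
    return all_terms


# Offline check on lines copied from the question's sample (no network needed).
SAMPLE = """\
DR   Bgee; C9J872; -.
DR   GO; GO:0005634; C:nucleus; IBA:GO_Central.
DR   GO; GO:0003677; F:DNA binding; IEA:UniProtKB-KW.
DR   GO; GO:0000278; P:mitotic cell cycle; IEA:InterPro.
"""
sample_terms = extract_terms(SAMPLE)
print(sample_terms)
```

Calling `collect_terms(['C9JVZ1', 'C9JLN0', 'C9J872'])` would then download each entry and return one combined list. Passing a different `fetch` function makes it possible to test the loop without network access.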
