Putting certain text into a vector in MATLAB

Time: 2012-10-23 09:41:16

Tags: matlab character

  

Possible duplicate:
  finding a specific data from a text file in matlab

I have opened a text file named 'gos.txt' using the following code:

s={}; 
fid = fopen('gos.txt'); 
tline = fgetl(fid); 
while ischar(tline) 
   s=[s;tline]; 
   tline = fgetl(fid); 
end
fclose(fid);

The result I get is:  s =

'[Term]'    
'id: GO:0008150'
'name: biological_process'
'namespace: biological_process'
'alt_id: GO:0000004'
'alt_id: GO:0007582'
[1x243 char]
[1x445 char]
'subset: goslim_aspergillus'
'subset: goslim_candida'
'subset: goslim_yeast'
'subset: gosubset_prok'
'synonym: "biological process" EXACT []'
'synonym: "biological process unknown" NARROW []'
'synonym: "physiological process" EXACT []'
'xref: Wikipedia:Biological_process'
'[Term]'    
'id: GO:0016740'
'name: transferase activity'
'namespace: molecular_function'
[1x326 char]
'subset: goslim_aspergillus'
'subset: goslim_candida'
'subset: goslim_metagenomics'
'subset: goslim_pir'
'subset: goslim_plant'
'subset: gosubset_prok'
'xref: EC:2'
'xref: Reactome:REACT_25050 "Molybdenum ion transfer onto molybdopterin, Homo sapiens"'
'//is_a: GO:0003674 ! molecular_function'
'is_a: GO:0008150 ! molecular_function (added by Zaid, To be Removed Later)'
'//relationship: part_of GO:0008150 ! biological_process'
'[Term]'    
'id: GO:0016787'
'name: hydrolase activity'
'namespace: molecular_function'
[1x186 char]
'subset: goslim_aspergillus'
'subset: goslim_candida'
'subset: goslim_metagenomics'
'subset: goslim_plant'
'subset: gosubset_prok'
'xref: EC:3'
'//is_a: GO:0003674 ! molecular_function'
'is_a: GO:0016740 ! molecular_function (added by Zaid, to be removed later)'
'relationship: part_of GO:0008150 ! biological_process'
'[Term]'    
'id: GO:0006810'
'name: transport'
'namespace: biological_process'
'alt_id: GO:0015457'
'alt_id: GO:0015460'
[1x255 char]
'subset: goslim_aspergillus'
'subset: goslim_candida'
'synonym: "small molecule transport" NARROW []'
'synonym: "solute:solute exchange" NARROW []'
'synonym: "transport accessory protein activity" RELATED [GOC:mah]'
'is_a: GO:0016787 ! biological_process'
'relationship: part_of GO:0008150 ! biological_process'
.
.
.
.    

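The next step is to pick out a certain feature and put it in a vector. For example, I want to take all lines containing 'id: GO:*******' and put them in a vector, and I also want to collect 'is_a: GO:*******' into a vector. Note that I do not want the characters that follow on the same line.

2 answers:

Answer 0: (score: 6)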

You can easily use regexp here - it works on cell arrays:

s{~cellfun('isempty', regexp(s, '^id: GO'))}

ans =

 id: GO:0008150

ans =

 id: GO:0016740

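This extracts all lines that start with id: GO. The cellfun call on its own gives you a vector of 0/1 values, where 1 means the corresponding string in s matches your query.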
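
To make the mask explicit, here is a minimal sketch on a small hand-made sample of s. The names is_id and id_lines are illustrative and not part of the original answer. Indexing with parentheses rather than braces keeps the matches together in one cell array instead of printing them one by one:

```matlab
% Small sample standing in for the cell array read from gos.txt
s = {'[Term]'; 'id: GO:0008150'; 'name: biological_process'; ...
     '[Term]'; 'id: GO:0016740'; 'is_a: GO:0008150 ! molecular_function'};

% regexp on a cell array returns one cell per line with the match
% positions; an empty cell means the line did not match
is_id = ~cellfun('isempty', regexp(s, '^id: GO'));   % logical 0/1 vector

% Parentheses keep the result as a cell array of the matching lines
id_lines = s(is_id);
```

The same mask can also be passed to find if the line numbers are wanted rather than the text.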

A similar line finds the lines containing is_a: GO:. regexp can also be used to strip the unneeded characters from each string.

Part of a string can be extracted using the 'tokens' argument of regexp.
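
The answer's original snippet for this step is not preserved here, so the following is only a sketch of how 'tokens' might be used, not the author's code. It assumes the IDs consist of GO: followed by digits, and the names tok and is_a_ids are made up for illustration:

```matlab
% Small sample standing in for the cell array read from gos.txt
s = {'id: GO:0016740'; 'name: transferase activity'; ...
     '//is_a: GO:0003674 ! molecular_function'; ...
     'is_a: GO:0008150 ! molecular_function (added by Zaid, To be Removed Later)'};

% 'tokens' returns the text captured by the parenthesised group;
% 'once' gives one cell of tokens per line (empty where nothing matched)
tok = regexp(s, '^is_a: (GO:\d+)', 'tokens', 'once');

% Drop the non-matching lines and unwrap the single token of each match
tok = tok(~cellfun('isempty', tok));
is_a_ids = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
```

Because the pattern is anchored with ^, commented-out lines such as '//is_a: ...' are skipped. Replacing is_a with id in the pattern should collect the term IDs in the same way, and moving the opening parenthesis to the start of the pattern would keep the 'is_a: ' prefix in the result.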

Answer 1: (score: 1)

Assuming you only want to find things at the start of a line, it is simple:

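found = [];   % indices of the matching lines
for i = 1:length(s)
    temp = s{i};
    % compare at most the first 7 characters, so short lines do not error
    if strcmp('id: GO:', temp(1:min(7,end)))
        found = [found i];
    end
end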

Now found contains a vector with the positions of all strings that start with id: GO:

I can't try this in MATLAB right now, but it should be correct.
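
As a possible alternative to the loop, strncmp accepts a cell array directly and compares only the first n characters. This is a sketch on a hand-made sample, not something taken from the answer:

```matlab
% Small sample, including a line shorter than the 7-character prefix
s = {'[Term]'; 'id: GO:0008150'; 'id'; 'name: transport'; 'id: GO:0006810'};

% strncmp compares only the first 7 characters of each string;
% strings shorter than that simply do not match
found = find(strncmp(s, 'id: GO:', 7)).';   % row vector of positions
```

The matching lines themselves are then available as s(found).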