Question

I am trying to read data from an HTML file. The data is delimited by <PRE></PRE> tags, e.g.:

<pre>
12.0  29132  -60.3  -91.4      1   0.01    260         753.2  753.3  753.2
10.0  30260  -57.9             1   0.01    260     58  802.4  802.5  802.4
 9.8  30387  -57.7  -89.7      1   0.01    261     61  807.8  807.9  807.8
 6.0  33631  -40.4  -77.4      1   0.17    260     88 1004.0 1006.5 1004.1
 5.9  33746  -40.3  -77.3      1   0.17               1009.2 1011.8 1009.3
</pre>

t = regexp(html, '<pre[^>]*>(.*?)</pre>', 'tokens', 'ignorecase');

where t is a cell array of char.
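The sketch below uses a small made-up HTML string in place of the real page. With the 'tokens' option, each element of t is itself a 1x1 cell that holds the captured text, so you have to unwrap it before use. The 'ignorecase' flag lets the pattern match both <PRE> and <pre>.

```matlab
% Hypothetical HTML standing in for the downloaded page (two <pre> blocks).
html = sprintf('<html><body>\n<PRE>\n12.0  29132\n 9.8  30387\n</PRE>\n<pre>\n 6.0  33631\n</pre>\n</body></html>');

% Capture the text between each pair of tags, whatever their case.
t = regexp(html, '<pre[^>]*>(.*?)</pre>', 'tokens', 'ignorecase');

% Each t{k} is a 1x1 cell holding one token; unwrap into a cell array of char.
blocks = cellfun(@(c) c{1}, t, 'UniformOutput', false);

fprintf('Found %d <pre> block(s)\n', numel(blocks));
disp(blocks{1})
```

After this step, each blocks{k} is an ordinary char vector containing the lines of one table.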

Now I want to replace the blank fields with NaN and get:

12.0  29132  -60.3  -91.4      1   0.01    260    NaN  753.2  753.3  753.2
10.0  30260  -57.9    NaN      1   0.01    260     58  802.4  802.5  802.4
 9.8  30387  -57.7  -89.7      1   0.01    261     61  807.8  807.9  807.8
 6.0  33631  -40.4  -77.4      1   0.17    260     88 1004.0 1006.5 1004.1
 5.9  33746  -40.3  -77.3      1   0.17    NaN    NaN 1009.2 1011.8 1009.3

This data will then be saved to the file mydata.dat.
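One possible approach treats the block as fixed-width text. It is sketched here under stated assumptions and does not come from either answer. Every character position that is non-blank in at least one row belongs to a field, and each maximal run of such positions is one column. A field that is blank in a given row therefore comes out as NaN. The sketch assumes the columns are padded with spaces (no tabs) and that neighbouring columns are always separated by at least one position that is blank in every row. Both conditions hold for the sample above. The hard-coded block stands in for the text extracted from the <pre> tag.

```matlab
% Stand-in for the text of one <pre> block (copied from the question).
block = sprintf(['12.0  29132  -60.3  -91.4      1   0.01    260         753.2  753.3  753.2\n' ...
                 '10.0  30260  -57.9             1   0.01    260     58  802.4  802.5  802.4\n' ...
                 ' 9.8  30387  -57.7  -89.7      1   0.01    261     61  807.8  807.9  807.8\n' ...
                 ' 6.0  33631  -40.4  -77.4      1   0.17    260     88 1004.0 1006.5 1004.1\n' ...
                 ' 5.9  33746  -40.3  -77.3      1   0.17               1009.2 1011.8 1009.3\n']);

% Split into lines (accepting \n or \r\n) and drop blank ones.
rows = regexp(block, '\r?\n', 'split');
rows = rows(~cellfun(@isempty, strtrim(rows)));

% char() pads every line with trailing blanks, giving a rectangular matrix.
C = char(rows);

% Assumption: fields are space-padded (no tabs) and neighbouring columns
% are separated by at least one position that is blank in every row.
occupied = any(C ~= ' ', 1);          % true where some row has a character
edges = diff([0, occupied, 0]);       % +1 where a run starts, -1 just after it ends
starts = find(edges == 1);
stops = find(edges == -1) - 1;

% Convert column by column; blank fields become '' and str2double gives NaN.
data = NaN(size(C, 1), numel(starts));
for k = 1:numel(starts)
    fieldText = strtrim(cellstr(C(:, starts(k):stops(k))));
    data(:, k) = str2double(fieldText);
end

% Save to mydata.dat: one row per line, space-separated, NaN written as "NaN".
fid = fopen('mydata.dat', 'w');
assert(fid ~= -1, 'Could not open mydata.dat for writing.');
fmt = [repmat('%g ', 1, size(data, 2) - 1), '%g\n'];
fprintf(fid, fmt, data.');
fclose(fid);
```

With the sample data, data is a 5x11 matrix with NaN in the four blank fields. The matrix is transposed in the fprintf call because fprintf consumes its arguments column by column.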

Answer 1

If the HTML file is hosted somewhere, then:

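url = 'http://www.myDomain.com/myFile.html';
html = urlread(url);  % download the page as a char vector
% Use regular expressions to remove undesired HTML markup.
txt = regexprep(html,'<script.*?/script>','','ignorecase');  % drop scripts and their contents
txt = regexprep(txt,'<style.*?/style>','','ignorecase');     % drop stylesheets and their contents
% <pre>...</pre> blocks are NOT removed here: they hold the data you want.
txt = regexprep(txt,'<.*?>','');  % strip the remaining tags but keep the text between them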

You should now have the data as plain text in the txt variable. You can parse txt with textscan, scanning for whitespace or numbers.
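Check one caveat before you rely on plain textscan. Its default whitespace handling collapses runs of spaces, so a blank field is skipped instead of being returned as NaN, and the values after it shift left. The following quick check runs on the second data row from the question:

```matlab
% Second row of the question's data; its fourth field is blank.
line2 = '10.0  30260  -57.9             1   0.01    260     58  802.4  802.5  802.4';

% Default textscan: whitespace separates fields and runs of it are merged.
c = textscan(line2, '%f');
vals = c{1};

fprintf('%d values read, 11 expected\n', numel(vals));
fprintf('4th value read: %g (it actually belongs to column 5)\n', vals(4));
```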

More information:
- urlread
- regexprep

Answer 2

This isn't a perfect solution, but it should help.

Assuming t is one long string, the delimiter is a space, and you know the number of columns:

numcols = 7;
sample = '1  2  3  4  5    7  1    3    5    7';

test = textscan(sample,'%f','delimiter',' ','MultipleDelimsAsOne',false);
test = test{:}; % Pull the double out of the cell array
test(2:2:end) = []; % Dump out extra NaNs
test2 = reshape(test,numcols,length(test)/numcols)'; % Have to mess with it a little to reshape rowwise instead of columnwise

Returns:

test2 =

     1     2     3     4     5   NaN     7
     1   NaN     3   NaN     5   NaN     7

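This assumes the delimiter is a space and that the number of spaces between values is constant. With MultipleDelimsAsOne set to false, textscan does not merge consecutive spaces into a single delimiter. It therefore returns a NaN after every whitespace character that has no data behind it. In the sample data there are two space characters between data points, so you can throw away every other NaN. More generally, you discard n_whitespace - 1 out of every n_whitespace values, which leaves only the NaNs you actually want.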
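The same "keep one piece in every n_whitespace" idea can be sketched for any constant number of separating spaces. This version replaces textscan with strsplit and sets 'CollapseDelimiters' to false, so strsplit returns an empty piece for every extra space. The values of nspaces and numcols, and the sample string, are illustrative assumptions.

```matlab
nspaces = 3;   % assumed: exactly this many spaces separate neighbouring fields
numcols = 4;   % assumed: the number of columns is known in advance

% Two rows of four values; the 6th value is missing (3 + 3 spaces in a row).
sample = '1   2   3   4   5      7   8';

% Split on every single space, keeping the empty pieces between spaces.
parts = strsplit(sample, ' ', 'CollapseDelimiters', false);

% A field (possibly empty) sits at every nspaces-th piece; drop the rest.
parts = parts(1:nspaces:end);

% Empty pieces (missing fields) become NaN.
vals = str2double(parts);

% Fill column-wise, then transpose so the data reads row by row.
test2 = reshape(vals, numcols, [])'
```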

MATLAB: How to read PRE tags and create a cell array with NaN
