I have the following string (loaded from a .txt file into a MATLAB cell):
text = 'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416,
gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03,
K=3.1416, gamma=0.1, A=-0.1'
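The full string is very long (t runs from 0 to 1, for different parameter values). I would like to split it into multiple cells so that
A(1)='u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1', A(2)='u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1',
and so on.
Even better would be to extract the values of the parameters t, K, gamma, and A and store them in arrays.
How can I do this in MATLAB (or in Python)?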
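Edit:
It turns out that the first few entries in my data take the form ... t=1E-4, ... t=2E-4, ...... t=9E-4, ... t=0.001, and some of the answers skip these first few time steps written in scientific notation. How can such numbers be handled?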
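Answer 0 (score: 1)
You can use a regular expression.
A simple regex that matches a plain decimal number is '-?\d*\.?\d*'.
To extract the data, you can build the full pattern from it: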
'u1 @ t={0}, K={0}, gamma={0}, A={0}'.format(r'-?\d*\.?\d*')
Example:
>>> import re
>>> text = 'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1'
>>> r = r'-?\d*\.?\d*'
>>> re.findall('u1 @ t={0}, K={0}, gamma={0}, A={0}'.format(r), text)
['u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1', 'u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1', 'u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1', 'u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1']
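The number pattern above does not allow an exponent, so entries such as t=1E-4 from the question's edit are silently skipped. Here is a minimal sketch of one way to extend it, assuming the exponent is written with E or e and an optional sign. The names NUMBER and ENTRY are illustrative and not part of the original answer.

```python
import re

# Assumed number format: optional sign, digits with an optional decimal part
# (or a bare fractional part), then an optional exponent such as E-4.
NUMBER = r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?'
ENTRY = r'u1 @ t={0}, K={0}, gamma={0}, A={0}'.format(NUMBER)

text = ('u1 @ t=1E-4, K=3.1416, gamma=0.1, A=-0.1 '
        'u1 @ t=9E-4, K=3.1416, gamma=0.1, A=-0.1 '
        'u1 @ t=0.001, K=3.1416, gamma=0.1, A=-0.1')

# The pattern has no capturing groups, so findall returns whole entries.
entries = re.findall(ENTRY, text)
print(entries)
```

Because only non-capturing groups are used, each list element is still the complete entry string, as in the example above.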
Answer 1 (score: 1)
You can use re.split to split the text. For example, you can split on any whitespace that is followed by "u1":
import re
from pprint import pprint
text = 'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1'
lines = re.split(r'\s+(?=u1)', text)
pprint(lines)
You get:
['u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1',
'u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1',
'u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1',
'u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1']
You can then parse each line of this result to extract the attributes:
for line in lines:
    attrs = {}
    for value in line[5:].split(", "):
        k, v = value.split("=")
        attrs[k] = float(v)
    print(attrs)
You get:
{'t': 0.0, 'K': 3.1416, 'gamma': 0.1, 'A': -0.1}
{'t': 0.01, 'K': 3.1416, 'gamma': 0.1, 'A': -0.1}
{'t': 0.02, 'K': 3.1416, 'gamma': 0.1, 'A': -0.1}
{'t': 0.03, 'K': 3.1416, 'gamma': 0.1, 'A': -0.1}
Answer 2 (score: 1)
Without regex; the values are stored as floats in a 2D array:
s = '''u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1'''
out = []
for i in s.split('u1 @'):
    if not i.strip():
        continue
    out += [[float(v.split('=')[-1]) for v in i.split(',')]]
from pprint import pprint
pprint(out)
Prints:
[[0.0, 3.1416, 0.1, -0.1],
[0.01, 3.1416, 0.1, -0.1],
[0.02, 3.1416, 0.1, -0.1],
[0.03, 3.1416, 0.1, -0.1]]
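If one array per parameter is wanted, as the question asks, the rows can be transposed into columns. This is a small sketch, assuming `out` has the row layout printed above. The variable names t, K, gamma and A are chosen here for illustration.

```python
out = [[0.0, 3.1416, 0.1, -0.1],
       [0.01, 3.1416, 0.1, -0.1],
       [0.02, 3.1416, 0.1, -0.1],
       [0.03, 3.1416, 0.1, -0.1]]

# zip(*out) groups the i-th element of every row, i.e. it transposes the table.
t, K, gamma, A = (list(column) for column in zip(*out))
print(t)
print(A)
```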
Answer 3 (score: 1)
Try this:
def to_cells(string):
    strings = list(filter(None, string.split('u1 @ ')))
    cells = {}
    for cell in strings:
        pairs = cell.split(',')
        for pair in pairs:
            k, v = pair.split('=')
            k = k.strip()
            v = float(v)
            if k in cells:
                cells[k].append(v)
            else:
                cells[k] = [v]
    return cells
You can use the function like this:
res = to_cells(
'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416,'
' gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1'
)
for x in res:
    print(x, '\t====>\t', res[x])
The output will look like this:
t ====> [0.0, 0.01, 0.02, 0.03]
K ====> [3.1416, 3.1416, 3.1416, 3.1416]
gamma ====> [0.1, 0.1, 0.1, 0.1]
A ====> [-0.1, -0.1, -0.1, -0.1]
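Hope this helps :)
Answer 4 (score: 1)
You have already received plenty of Python answers, so here is a MATLAB one. You can use regexp to parse the string, then vertcat, cellfun, and str2double to reshape the resulting cell array of strings and convert it into an N-by-4 numeric matrix. Starting from this sample data (one string containing 4 sets of entries):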
str = 'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1';
The code is just 2 lines:
vals = regexp(str, 'u1 @ t=([-\.\dE]+), K=([-\.\dE]+), gamma=([-\.\dE]+), A=([-\.\dE]+)', 'tokens');
vals = cellfun(@str2double, vertcat(vals{:}));
Result:
vals =
0 3.141600000000000 0.100000000000000 -0.100000000000000
0.010000000000000 3.141600000000000 0.100000000000000 -0.100000000000000
0.020000000000000 3.141600000000000 0.100000000000000 -0.100000000000000
0.030000000000000 3.141600000000000 0.100000000000000 -0.100000000000000
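The columns hold the values of t, K, gamma, and A, respectively. Because the character class in the pattern includes E, values written as 1E-4 are matched as well.
Answer 5 (score: 0)
Another regex-based MATLAB solution. With s denoting the string: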
v = reshape(str2double(regexp([s ' '], '(?<=(t|K|gamma|A)=).+?(?=,| )', 'match')), 4, []).';
For your example, this gives
v =
0 3.141600000000000 0.100000000000000 -0.100000000000000
0.010000000000000 3.141600000000000 0.100000000000000 -0.100000000000000
0.020000000000000 3.141600000000000 0.100000000000000 -0.100000000000000
0.030000000000000 3.141600000000000 0.100000000000000 -0.100000000000000
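The regular expression matches anything that is preceded by t=, K=, etc. and followed by a comma or a space. A space is appended to the end of the string so that the last match can be found. str2double converts the matched substrings to numbers (where possible). Some reshaping and transposing then arranges the result as a matrix in which each of the four variables is a column.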
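For comparison, here is roughly the same idea in Python. It is a sketch rather than a direct port: Python's `re` module only supports fixed-width lookbehind, so the variable-length `(?<=(t|K|gamma|A)=)` is replaced here by a non-capturing prefix plus a capture group, and `$` stands in for the appended space.

```python
import re

s = ('u1 @ t=1E-4, K=3.1416, gamma=0.1, A=-0.1 '
     'u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1')

# Capture whatever follows "t=", "K=", "gamma=" or "A=" up to (not including)
# the next comma, whitespace character, or the end of the string.
tokens = re.findall(r'\b(?:t|K|gamma|A)=(.+?)(?=,|\s|$)', s)
values = [float(tok) for tok in tokens]

# Group the flat list into rows of four: [t, K, gamma, A].
rows = [values[i:i + 4] for i in range(0, len(values), 4)]
print(rows)
```

Since the value itself is matched with `.+?`, entries in scientific notation such as t=1E-4 are picked up and converted by `float`.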
Answer 6 (score: 0)
Here is a basic way to do this in MATLAB:
text ='u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1';
pos1=strfind(text, 'u1'); % start position of each entry
pos2=strfind(text, 'A='); % position of each 'A='
pos2=pos2+5; % the entry ends five characters after 'A' (assumes a fixed-width value such as '-0.1')
vars=length(pos1); % number of entries within 'text'
output=cell(1,vars); % preallocate the result
for i=1:vars
    output{i}=text(pos1(i):pos2(i)); % each entry becomes one cell
end
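It finds the positions of the first and last characters of each desired entry and extracts that span from 'text'. I would suggest a similar approach to extract the individual variable names and values, using strfind on the '=' sign.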
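The fixed offset of five characters after 'A=' only works while every A value has the same width. Below is a sketch of the same position-based idea in Python that instead cuts each entry at the start of the next 'u1 @' marker. The helper name split_at_marker is hypothetical.

```python
def split_at_marker(text, marker='u1 @'):
    """Slice text into entries, each starting at an occurrence of marker."""
    # Collect the start index of every occurrence of the marker.
    starts = []
    pos = text.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = text.find(marker, pos + len(marker))
    # Each entry runs from its own start to the start of the next entry;
    # the last one runs to the end of the text.
    ends = starts[1:] + [len(text)]
    return [text[a:b].strip() for a, b in zip(starts, ends)]


text = ('u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 '
        'u1 @ t=0.01, K=3.1416, gamma=0.1, A=-10.25')
print(split_at_marker(text))
```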
Answer 7 (score: 0)
I think this can be done more easily in MATLAB without regexp. Using string instead of char also helps.
result = extractAfter(text,'u1 @ ');
result = split(result, 'u1 @ ');
result = split(result, ',');
result = extractAfter(result,'=');
result = double(result);
This may be the fastest solution so far.
>> testFunc
Elapsed time is 0.075453 seconds. % My solution
Elapsed time is 2.820094 seconds. % Luis Mendo solution
function testFunc()
text = "u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, " + ...
    "gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, " + ...
    "K=3.1416, gamma=0.1, A=-0.1";
% My solution
tic
for i = 1:1e4
    result = extractAfter(text,'u1 @ ');
    result = split(result, 'u1 @ ');
    result = split(result, ',');
    result = extractAfter(result,'=');
    result = double(result);
end
toc
% Luis Mendo solution
tic;
for i = 1:1e4
    result = reshape(str2double(regexp(text + ' ', '(?<=(t|K|gamma|A)=).+?(?=,| )', 'match')), 4, [])';
end
toc