Extracting parameter values from a large string

Time: 2019-07-10 20:53:39

Tags: python string matlab text cell

I have the following string (loaded from a .txt file into a MATLAB cell):

text = 'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, 
gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, 
K=3.1416, gamma=0.1, A=-0.1'

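The whole string variable is very long (it runs from t = 0 to t = 1, for different parameter values). I would like to split it into multiple cells so that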

  • A(1)='u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1'
  • A(2)='u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1'

  • Or, even better, extract the values of the parameters t, K, gamma, and A and store them in arrays.

How can I do this in MATLAB? (Or in Python.)

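Edit:

It turns out that the first few entries in my data have the form ... t=1E-4, ... t=2E-4, ...... t=9E-4, ... t=0.001, and some of the answers skip these first few time steps written in scientific notation. How can such numbers be handled?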

8 answers:

Answer 0 (score: 1)

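You can use a regular expression.

A simple regex that matches a number is: r'-?\d*\.?\d*' (plain decimals only; it does not cover exponents such as 1E-4).

To extract the data, you can build the full pattern from it:

r'u1 @ t={0}, K={0}, gamma={0}, A={0}'.format(r'-?\d*\.?\d*')

Example: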

>>> import re
>>> text = 'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1'
>>> r = r'-?\d*\.?\d*'
>>> re.findall(r'u1 @ t={0}, K={0}, gamma={0}, A={0}'.format(r), text)
['u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1', 'u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1', 'u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1', 'u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1']
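
The simple number pattern above does not accept exponents, so entries such as t=1E-4 from the question's edit would not match. Below is a minimal sketch of one way to extend it, assuming the exponent is always written as E or e followed by an optionally signed integer. The names NUM and RECORD are illustrative and not part of the original answer.

```python
import re

# Number with an optional sign, optional fraction and optional exponent.
NUM = r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?'
# One capture group per parameter, in the order they appear in each record.
RECORD = re.compile(r'u1 @ t=({0}), K=({0}), gamma=({0}), A=({0})'.format(NUM))

text = ('u1 @ t=1E-4, K=3.1416, gamma=0.1, A=-0.1 '
        'u1 @ t=0.001, K=3.1416, gamma=0.1, A=-0.1 '
        'u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1')

# findall returns one tuple of strings per record; convert each to floats.
rows = [tuple(float(x) for x in m) for m in RECORD.findall(text)]
for row in rows:
    print(row)
```

Using capture groups means each record comes back as four separate values rather than one long substring, so the conversion to float can happen in the same step.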

Answer 1 (score: 1)

You can split the text with re.split. For example, you can split on any whitespace that is followed by "u1":

import re
from pprint import pprint

text = 'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1'

lines = re.split(r'\s+(?=u1)', text)
pprint(lines)

You get:

['u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1',
 'u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1',
 'u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1',
 'u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1']

You can then parse each line of this result to extract the attributes:

for line in lines:
    attrs = {}
    for value in line[5:].split(", "):
        k, v = value.split("=")
        attrs[k] = float(v)
    print(attrs)

You get:

{'t': 0.0, 'K': 3.1416, 'gamma': 0.1, 'A': -0.1}
{'t': 0.01, 'K': 3.1416, 'gamma': 0.1, 'A': -0.1}
{'t': 0.02, 'K': 3.1416, 'gamma': 0.1, 'A': -0.1}
{'t': 0.03, 'K': 3.1416, 'gamma': 0.1, 'A': -0.1}
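
Because float accepts E notation, the split-based parsing above should also cope with the entries mentioned in the question's edit. Here is a small check, assuming those entries keep the same name=value layout. The sample string and the helper name parse_records are made up for illustration.

```python
import re

def parse_records(text):
    """Split on whitespace before each 'u1' and parse name=value pairs."""
    records = []
    for line in re.split(r'\s+(?=u1)', text.strip()):
        # Drop the leading 'u1 @' marker, then split the remaining pairs.
        body = line.split('@', 1)[1]
        attrs = {}
        for pair in body.split(','):
            name, value = pair.split('=')
            attrs[name.strip()] = float(value)  # float also parses '1E-4'
        records.append(attrs)
    return records

sample = ('u1 @ t=1E-4, K=3.1416, gamma=0.1, A=-0.1 '
          'u1 @ t=9E-4, K=3.1416, gamma=0.1, A=-0.1 '
          'u1 @ t=0.001, K=3.1416, gamma=0.1, A=-0.1')

records = parse_records(sample)
print([r['t'] for r in records])
```

Splitting on '@' instead of slicing a fixed number of characters is a small design choice here: it keeps working if the label in front of the '@' is longer than two characters.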

Answer 2 (score: 1)

Without regex; the values are stored as floats in a 2D array (a list of lists):

s = '''u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1'''

out = []
for i in s.split('u1 @'):
    if not i.strip():
        continue
    out += [[float(v.split('=')[-1]) for v in i.split(',')]]

from pprint import pprint
pprint(out)

Prints:

[[0.0, 3.1416, 0.1, -0.1],
 [0.01, 3.1416, 0.1, -0.1],
 [0.02, 3.1416, 0.1, -0.1],
 [0.03, 3.1416, 0.1, -0.1]]

Answer 3 (score: 1)

Try this:

def to_cells(string):
    # Split into records and drop the empty piece before the first 'u1 @ '
    strings = list(filter(None, string.split('u1 @ ')))
    cells = {}

    for cell in strings:
        pairs = cell.split(',')
        for pair in pairs:
            k, v = pair.split('=')
            k = k.strip()
            v = float(v)
            if k in cells:
                cells[k].append(v)
            else:
                cells[k] = [v]

    return cells

You can use the function like this:

res = to_cells(
    'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416,'
    ' gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1'
)

for x in res:
    print(x, '\t====>\t', res[x])

The output looks like this:

t   ====>    [0.0, 0.01, 0.02, 0.03]
K   ====>    [3.1416, 3.1416, 3.1416, 3.1416]
gamma   ====>    [0.1, 0.1, 0.1, 0.1]
A   ====>    [-0.1, -0.1, -0.1, -0.1]

Hope this helps :)

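Answer 4 (score: 1)

You already have plenty of Python answers, so here is a MATLAB one. You can parse the string with the function regexp, then use vertcat, cellfun, and str2double to reshape the resulting cell array of strings and convert it into an N-by-4 numeric matrix. Starting from this sample data (4 sets of entries in one string):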

str = 'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1';

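The code takes just two lines (the character class includes E, so values such as 1E-4 are matched as well):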

vals = regexp(str, 'u1 @ t=([-\.\dE]+), K=([-\.\dE]+), gamma=([-\.\dE]+), A=([-\.\dE]+)', 'tokens');
vals = cellfun(@str2double, vertcat(vals{:}));

Result:

vals =

                   0   3.141600000000000   0.100000000000000  -0.100000000000000
   0.010000000000000   3.141600000000000   0.100000000000000  -0.100000000000000
   0.020000000000000   3.141600000000000   0.100000000000000  -0.100000000000000
   0.030000000000000   3.141600000000000   0.100000000000000  -0.100000000000000

The columns hold the values of t, K, gamma, and A, respectively.

Answer 5 (score: 0)

Another regex-based MATLAB solution. Let s denote the string (a character vector). Then

v = reshape(str2double(regexp([s ' '], '(?<=(t|K|gamma|A)=).+?(?=,| )', 'match')), 4, []).';

gives, for your example,

v =
                   0   3.141600000000000   0.100000000000000  -0.100000000000000
   0.010000000000000   3.141600000000000   0.100000000000000  -0.100000000000000
   0.020000000000000   3.141600000000000   0.100000000000000  -0.100000000000000
   0.030000000000000   3.141600000000000   0.100000000000000  -0.100000000000000

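The regular expression matches anything that is preceded by t=, K=, etc. and followed by a comma or a space. A space is appended to the end of the string so that the last match can be found. str2double converts the matched substrings to numbers (where possible). A reshape and a transpose then arrange the result as a matrix in which each of the four variables is a column.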
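
For comparison, here is a rough Python counterpart of the same idea. Python's re module only supports fixed-width lookbehind, so this sketch swaps the lookbehind for a non-capturing prefix plus a capture group. The variable names are illustrative.

```python
import re

s = ('u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 '
     'u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1')

# Capture whatever follows 't=', 'K=', 'gamma=' or 'A=' up to a comma or whitespace.
values = [float(m) for m in re.findall(r'\b(?:t|K|gamma|A)=([^,\s]+)', s)]

# Group the flat list into rows of four: one row per record, one column per parameter.
v = [values[i:i + 4] for i in range(0, len(values), 4)]
print(v)
```

Because the value is matched as "anything up to a comma or whitespace", no trailing space has to be appended to the input, and values written in E notation are picked up without changing the pattern.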

Answer 6 (score: 0)

Here is a basic way to do this in MATLAB:

text ='u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, K=3.1416, gamma=0.1, A=-0.1';
a=size(text);
pos1=strfind(text, 'u1'); % first position of variable
pos2=strfind(text, 'A='); % position of 'A='
pos2=pos2+5; % 'A=-0.1' ends five characters after the 'A' (assumes a fixed-width value)
vars=length(pos1); % number of new variables within 'text'
for i=1:length(pos2)
     output{i}=text(pos1(i):pos2(i)); % output variable as cell with entries as new variables
end 

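It finds the positions of the first and last characters of each desired substring and copies that range out of text. I would suggest a similar approach for extracting the individual variable names and values, using strfind on the '=' sign.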
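
The '=' search suggested above might look like the following in Python, which is used here only to keep the added examples in one language. It is a sketch that assumes every value ends at a comma, a space, or the end of the string. str.find plays the role of strfind, and the function name extract_pairs is hypothetical.

```python
def extract_pairs(text):
    """Return (name, value) pairs found by scanning for '=' signs."""
    pairs = []
    pos = text.find('=')
    while pos != -1:
        # The name is the run of non-space characters just before '='.
        start = text.rfind(' ', 0, pos) + 1
        # The value runs until the next comma or space (or the end of the text).
        ends = [e for e in (text.find(',', pos), text.find(' ', pos)) if e != -1]
        end = min(ends) if ends else len(text)
        pairs.append((text[start:pos], float(text[pos + 1:end])))
        pos = text.find('=', end)
    return pairs

text = 'u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, gamma=0.1, A=-0.1'
pairs = extract_pairs(text)
print(pairs[:4])
```

Locating the end of each value by searching for a delimiter, rather than adding a fixed offset, avoids the assumption that every value has the same number of characters.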

Answer 7 (score: 0)

I think this can be done more easily without regexp in MATLAB. It also helps to use a string rather than a char.

result = extractAfter(text,'u1 @ ');
result = split(result, 'u1 @ ');
result = split(result, ',');
result = extractAfter(result,'=');
result = double(result);

This is probably the fastest solution so far.

>> testFunc
Elapsed time is 0.075453 seconds. % My solution
Elapsed time is 2.820094 seconds. % Luis Mendo solution

function testFunc()

    text = "u1 @ t=0, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.01, K=3.1416, " + ...
           "gamma=0.1, A=-0.1 u1 @ t=0.02, K=3.1416, gamma=0.1, A=-0.1 u1 @ t=0.03, " + ...
           "K=3.1416, gamma=0.1, A=-0.1";

    % My solution
    tic
    for i = 1:1e4
        result = extractAfter(text,'u1 @ ');
        result = split(result, 'u1 @ ');
        result = split(result, ',');
        result = extractAfter(result,'=');
        result = double(result);
    end
    toc

    % Luis Mendo solution
    tic;
    for i = 1:1e4
        result = reshape(str2double(regexp(text + ' ', '(?<=(t|K|gamma|A)=).+?(?=,| )', 'match')), 4, [])';
    end
    toc