Extracting x, y, z coordinate values from a string into a list

Time: 2014-11-25 06:37:40

Tags: python regex

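I have some data that I read as strings from a file formatted as shown below. What I want to do is create a vector (stored as a list in Python) giving the difference in the x, y and z directions between [x2,y2,z2] and [x1,y1,z1]. The strings look like the ones shown below.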

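Once I have extracted the required [x2,y2,z2] and [x1,y1,z1] as lists of integers, I should be fine calculating the difference vector. What I need help with is creating those [x2,y2,z2] and [x1,y1,z1] lists from the data below.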

data = """x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255] 
x1=34 y1=12 z1=15 col1=[255, 255, 255] x2=35 y2=12 z2=15 col2=[255, 255, 255] 
x1=22 y1=33 z1=24 col1=[255, 255, 255] x2=23 y2=33 z2=24 col2=[255, 255, 255] 
x1=16 y1=45 z1=58 col1=[255, 255, 255] x2=17 y2=45 z2=58 col2=[255, 255, 255] 
x1=27 y1=66 z1=21 col1=[255, 255, 255] x2=28 y2=66 z2=21 col2=[255, 255, 255]
"""

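To clarify, I only need to figure out how to extract the [x2,y2,z2] and [x1,y1,z1] lists for a single line. I can work out how to loop over the lines and compute the difference vector for each one. It is just extracting the relevant data from each line and reformatting it into a usable form that has me stumped.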

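I suspect that regular expressions are a potential way to extract this information. I looked at the documentation at https://docs.python.org/2/library/re.html and came away thoroughly confused. I just want an easy-to-understand way of doing this.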

2 Answers:

Answer 0: (score: 3)

For a single line, assuming all lines have the same format, you can do the following:

import re

a_line = "x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]" 
x1,y1,z1,x2,y2,z2 = list(map(int, re.findall(r'=(\d+)', a_line)))

To process multiple lines in data:

for a_line in data.split("\n"):    
    if a_line:
        x1,y1,z1,x2,y2,z2 = list(map(int, re.findall(r'=(\d+)', a_line)))
        print(x1,y1,z1,x2,y2,z2)

This gives:

45 74 55 46 74 55
34 12 15 35 12 15
22 33 24 23 33 24
16 45 58 17 45 58
27 66 21 28 66 21
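
Going one step further than the answer above, here is a minimal sketch of turning the six unpacked values into the difference vector the question asks for. The helper name `diff_vector` is hypothetical (it is not part of the original answer), and the sketch assumes every non-empty line contains exactly six `name=integer` fields in the order x1, y1, z1, x2, y2, z2.

```python
import re

# Two sample lines taken from the question's data.
data = """x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]
x1=34 y1=12 z1=15 col1=[255, 255, 255] x2=35 y2=12 z2=15 col2=[255, 255, 255]
"""

def diff_vector(a_line):
    """Return [x2-x1, y2-y1, z2-z1] for one line (hypothetical helper)."""
    # '=(\d+)' only matches digits directly after '=', so the
    # 'col1=[255, ...]' fields are skipped because '=' is followed by '['.
    x1, y1, z1, x2, y2, z2 = map(int, re.findall(r'=(\d+)', a_line))
    return [x2 - x1, y2 - y1, z2 - z1]

# Skip blank lines, such as the one after the final newline.
diffs = [diff_vector(a_line) for a_line in data.splitlines() if a_line.strip()]
print(diffs)  # [[1, 0, 0], [1, 0, 0]]
```

If a line does not have exactly six matches, the tuple unpacking raises a `ValueError`, which at least makes malformed lines easy to spot.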

Answer 1: (score: 2)

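I know exactly where you're coming from. Until yesterday I didn't understand regular expressions either; they always confused the hell out of me. But once you understand them, you see how powerful they are. Here is one possible solution to your problem. I'll also try to give some intuition for what the regular expression is doing, which will hopefully reduce the confusion around regex.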

In the code below, I assume you are processing one line at a time and that the data always has the same format.

import re

# Example of just one line of the data
line = """x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255] """

# Extract the relevant x1, y1, z1 values, stored as a list of strings
p1 = re.findall(r"[x-z][1]=([\d]*)", line)

# Extract the relevant x2, y2, z2 values, stored as a list of strings
p2 = re.findall(r"[x-z][2]=([\d]*)", line)

# Convert the elements in each list from strings to integers
p1 = [int(x) for x in p1]
p2 = [int(x) for x in p2]

# Calculate difference vector (I'm assuming this is what you're trying to do)
diff = [p2[i] - p1[i] for i in range(len(p2))]
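
The same steps can be wrapped into a small reusable function. This is only a sketch: `extract_points` is a hypothetical name, and it deliberately swaps `[\d]*` for `\d+`, which is a change from the code above. With `*`, a field with no digits (for example `x1=`) would match an empty string and `int('')` would then fail; with `+`, such a field is simply not matched.

```python
import re

def extract_points(line):
    """Return ([x1, y1, z1], [x2, y2, z2]) as lists of ints (hypothetical helper)."""
    # [x-z]1= matches x1=, y1= or z1=; 'col1=' is skipped because 'l' is not in x-z.
    p1 = [int(v) for v in re.findall(r"[x-z]1=(\d+)", line)]
    p2 = [int(v) for v in re.findall(r"[x-z]2=(\d+)", line)]
    return p1, p2

line = "x1=16 y1=45 z1=58 col1=[255, 255, 255] x2=17 y2=45 z2=58 col2=[255, 255, 255]"
p1, p2 = extract_points(line)

# zip pairs up matching components, avoiding manual indexing.
diff = [b - a for a, b in zip(p1, p2)]
print(p1, p2, diff)  # [16, 45, 58] [17, 45, 58] [1, 0, 0]
```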

A brief explanation of what the symbols in the regular expression are doing:

# EXPLANATION OF THE REGEX. 
# Finds segments of strings that: 
#     [x-z]    start with a letter x,y, or z
#     [1]      followed by the number 1
#     =        followed by the equals sign
# 
#     But don't return any of that section of the string, only use that 
#     information to then extract the following values that we do actually want 
#
#     (        Return the parts of the string that have the following pattern, 
#              given that they were preceded by the previous pattern
# 
#     [\d]     match a single numeric digit (equivalent to plain \d)
#     *        repeat the previous item zero or more times, i.e. keep 
#              proceeding forward while the current character is a digit
#     )        end of the pattern, now we can return the substring.
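
To make the effect of the parentheses concrete, here is a small sketch comparing `re.findall` with no capture group, with one group, and with two groups. The sample line is shortened from the question's data, and `\d+` is used in place of `[\d]*` for brevity.

```python
import re

line = "x1=45 y1=74 z1=55 col1=[255, 255, 255]"

# No group: findall returns each whole match.
whole = re.findall(r"[x-z]1=\d+", line)
print(whole)   # ['x1=45', 'y1=74', 'z1=55']

# One group: findall returns only the text captured by the parentheses.
values = re.findall(r"[x-z]1=(\d+)", line)
print(values)  # ['45', '74', '55']

# Two groups: findall returns one tuple per match, one item per group.
pairs = re.findall(r"([x-z])1=(\d+)", line)
print(pairs)   # [('x', '45'), ('y', '74'), ('z', '55')]
```

The two-group form can be handy if you want a dictionary, since `dict(pairs)` maps each axis letter to its value as a string.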