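I have some data that I read as strings from a file formatted as shown below. What I want to do is create a vector (stored as a list in Python) giving the difference in the x, y, and z directions between [x2, y2, z2] and [x1, y1, z1]. The strings look like the ones shown below.
Once I have extracted the desired [x2, y2, z2] and [x1, y1, z1] as lists of integers, I should be able to calculate the difference vector just fine. What I need help with is creating these [x2, y2, z2] and [x1, y1, z1] lists from the data below.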
data = """x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]
x1=34 y1=12 z1=15 col1=[255, 255, 255] x2=35 y2=12 z2=15 col2=[255, 255, 255]
x1=22 y1=33 z1=24 col1=[255, 255, 255] x2=23 y2=33 z2=24 col2=[255, 255, 255]
x1=16 y1=45 z1=58 col1=[255, 255, 255] x2=17 y2=45 z2=58 col2=[255, 255, 255]
x1=27 y1=66 z1=21 col1=[255, 255, 255] x2=28 y2=66 z2=21 col2=[255, 255, 255]
"""
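To clarify, I only need to figure out how to extract the [x2, y2, z2] and [x1, y1, z1] lists for a single line. I can work out how to loop over every line and calculate the difference vector for each one. It is just extracting the relevant data from each line and reformatting it into a usable format that is giving me trouble.
I suspect that regular expressions are a potential avenue for extracting this information. I looked at the documentation at https://docs.python.org/2/library/re.html and came away confused. I just want an easy-to-understand way of doing this.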
Answer 0 (score: 3)
For a single line, assuming all lines have the same format, you can do the following:
import re
a_line = "x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]"
x1,y1,z1,x2,y2,z2 = list(map(int, re.findall(r'=(\d+)', a_line)))
To process multiple lines in the data:
for a_line in data.split("\n"):
    if a_line:
        x1,y1,z1,x2,y2,z2 = list(map(int, re.findall(r'=(\d+)', a_line)))
        print(x1,y1,z1,x2,y2,z2)
This gives:
45 74 55 46 74 55
34 12 15 35 12 15
22 33 24 23 33 24
16 45 58 17 45 58
27 66 21 28 66 21
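To get from those six integers to the difference vector the question asks for, one option is to slice the matches into two lists of three. This is only a minimal sketch, assuming every line has exactly the layout shown in the question; `diff_vector` is a hypothetical helper name and is not part of the original answer.

```python
import re

def diff_vector(a_line):
    """Return [x2 - x1, y2 - y1, z2 - z1] for one line of the data."""
    # r'=(\d+)' only matches digits that directly follow '=', so the colour
    # lists ("col1=[255, ...") are skipped: their '=' is followed by '['.
    values = [int(v) for v in re.findall(r'=(\d+)', a_line)]
    if len(values) != 6:
        # Assumption: a well-formed line carries exactly six coordinates.
        raise ValueError("expected 6 coordinates, got %d" % len(values))
    p1, p2 = values[:3], values[3:]
    return [b - a for a, b in zip(p1, p2)]

a_line = "x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]"
print(diff_vector(a_line))  # [1, 0, 0]
```

The length check is there because a line in a different format would otherwise fail with a less helpful unpacking or slicing result.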
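Answer 1 (score: 2)
I know exactly where you are coming from. Until yesterday I did not understand regular expressions either; they always confused the hell out of me. But once you understand them, you realise how powerful they are. Here is one possible solution to your problem. I will also try to give a bit of intuition for what the regex is doing, so it hopefully reduces the confusion around regular expressions.
In the code below, I assume that you are processing one line at a time and that the data always has the same format.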
import re

# Example of just one line of the data
line = """x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255] """
# Extract the relevant x1, y1, z1 values, stored as a list of strings
p1 = re.findall(r"[x-z][1]=([\d]*)", line)
# Extract the relevant x2, y2, z2 values, stored as a list of strings
p2 = re.findall(r"[x-z][2]=([\d]*)", line)
# Convert the elements in each list from strings to integers
p1 = [int(x) for x in p1]
p2 = [int(x) for x in p2]
# Calculate the difference vector (I'm assuming this is what you're trying to do)
diff = [p2[i] - p1[i] for i in range(len(p2))]
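The same idea can be applied to every line of the data. The sketch below is one possible way to do it, not the answer's own code: `point` is a hypothetical helper, the two-line `data` string is a shortened copy of the question's sample, and it assumes the x, y, z values always appear in that order within a line.

```python
import re

data = """x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]
x1=34 y1=12 z1=15 col1=[255, 255, 255] x2=35 y2=12 z2=15 col2=[255, 255, 255]
"""

def point(line, index):
    """Return [x, y, z] as integers for point number `index` (1 or 2)."""
    # Build e.g. r"[x-z]1=(\d+)"; "col1=" is ignored because 'l' is not x, y or z.
    pattern = r"[x-z]%d=(\d+)" % index
    return [int(v) for v in re.findall(pattern, line)]

diffs = []
for line in data.splitlines():
    if not line.strip():
        continue  # skip blank lines
    p1, p2 = point(line, 1), point(line, 2)
    diffs.append([b - a for a, b in zip(p1, p2)])

print(diffs)  # [[1, 0, 0], [1, 0, 0]]
```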
A brief explanation of what the symbols in the regex are doing:
# EXPLANATION OF THE REGEX.
# Finds segments of strings that:
# [x-z] start with a letter x,y, or z
# [1] followed by the number 1
# = followed by the equals sign
#
# But don't return any of that section of the string, only use that
# information to then extract the following values that we do actually want
#
# ( Return the parts of the string that have the following pattern,
# given that they were preceded by the previous pattern
#
# [\d] match a single numeric digit
# * repeated zero or more times, i.e. keep going while the next character is a digit
# ) end of the group; the text matched inside the parentheses is what gets returned
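To see the effect of the parentheses described above, it may help to run the pattern with and without them. This is just an illustrative sketch; the variable names and the third, two-group pattern are additions for demonstration and do not come from the answer.

```python
import re

line = "x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]"

# No parentheses: findall returns the whole text of each match.
whole = re.findall(r"[x-z][1]=[\d]*", line)
print(whole)     # ['x1=45', 'y1=74', 'z1=55']

# One pair of parentheses: findall returns only the captured part.
captured = re.findall(r"[x-z][1]=([\d]*)", line)
print(captured)  # ['45', '74', '55']

# Two groups: each match becomes a tuple, which keeps the axis letter.
labelled = re.findall(r"([x-z])1=(\d+)", line)
print(labelled)  # [('x', '45'), ('y', '74'), ('z', '55')]
```

Using `\d+` instead of `[\d]*` in the last pattern requires at least one digit, so an entry such as `x1=` with no number would not produce an empty string.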