Question

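I asked a similar question yesterday but have since deleted it, because I now realize it was badly formatted (I am new to Python), which made it hard for anyone to help. I apologize; I know that is bad form. I hope I have done better here.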

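Background: I have several output files from a simulation, and I want to import and plot data from them. Most of the files have their numbers arranged in columns, so importing them with "loadtxt" is easy. They arrive as arrays of decimal floats (as far as I can tell), which I can then plot.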

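Problem: I have been struggling for three days with one of the files, because its contents are not arranged in neat columns. It is a mix of text and numbers, and I have to extract the numbers I need before I can plot them (click the preceding word "one" to see a short snippet of the file; the actual file is thousands of lines long). I will call this the "difficult" file. I can extract and import the numbers, but they arrive as some kind of tuple, and I cannot convert them to an array of floats, so I cannot plot them against the data I imported from the other files.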

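Even after trying for the past few days, I do not really understand what a tuple is, so I may be making a silly mistake somewhere. In the example below I try one way of converting the tuple to an array of floats. Any suggestions would be much appreciated. Please feel free to criticize so that I can make this clearer.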

My code:

from scipy import *
import numpy as np
import matplotlib
matplotlib.use(matplotlib.get_backend())
import matplotlib.pyplot as plt
import re

while True:
    try:
        cellfile1="pathToDifficultFile" #I have to use "regex" to extract numbers from a file that contains numbers and text. They arrive as some kind of tuple.
        infile1=open(cellfile1,'r')
        cellfile2="pathToEasyFile" #I can use "loadtxt" to get the data. The data arrive as nice arrays of floats--for example, times: 1, 2, 3, 4,... seconds.
        infile2=open(cellfile2,'r')
        break
    except IOError as e:
        print("Cannot find file..try again.")

skip        = int(input('How many steps to skip?')) # Skip the first few time steps (first rows in my output files) because the result often not correct in my simulations.
cell        = loadtxt(cellfile2,skiprows=2+skip)
step        = np.array(cell[:,0]) # This is what I want to be the x axis data in my plot; it's just time, like 1, 2, 3, 4 seconds.

# Extract numbers I need from the difficult file
for line in infile1: #   Iterate over the lines
    match = re.search('Total=     (\d.+)', line) # Returns weird tuple.
    if match: # Did we find a match?
        totalMoment0 = match.group(1) # Yes, process it

totalMoment = np.asarray(totalMoment0) #Here I'm trying to convert the weird imported tuple data from regex directly above to an array of floats so I can plot it versus the time data imported from the other file.
avgtotalMoment =np.cumsum(totalMoment)/(step-skip)
plt.plot(step,totalMoment,'-')
plt.plot(step,avgtotalMoment,'-')
plt.xlabel('Timestep')
plt.ylabel('Imported difficult data')
plt.show()

Output of my code:

How many steps to skip?0
[[  1.00000000e+00   5.00000000e-01   7.82390980e-01 ...,  -9.94476371e+02
   -9.93104616e+02   2.86557169e+01]
 [  2.00000000e+00   1.00000000e+00   7.70928719e-01 ...,  -9.94464419e+02
   -9.93104149e+02   5.06833816e+00]
 [  3.00000000e+00   1.50000000e+00   7.50579191e-01 ...,  -9.94443439e+02
   -9.93103532e+02   5.15203691e+00]
 ..., 
 [  2.13340000e+04   1.06670000e+04   7.57428741e-01 ...,  -9.94623426e+02
   -9.93037136e+02   1.91433048e+01]
 [  2.13350000e+04   1.06675000e+04   7.28059027e-01 ...,  -9.94593384e+02
   -9.93036461e+02   3.76293707e+00]
 [  2.13360000e+04   1.06680000e+04   7.08130301e-01 ...,  -9.94572844e+02
   -9.93035855e+02   4.03132892e+00]]
Traceback (most recent call last):
  File "momentsFromQsMomentsFile.py", line 42, in <module>
    plt.plot(step,totalMoment,'-')
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/pyplot.py", line 2987, in plot
    ret = ax.plot(*args, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axes.py", line 4137, in plot
    for line in self._get_lines(*args, **kwargs):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axes.py", line 317, in _grab_next_args
    for seg in self._plot_args(remaining, kwargs):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axes.py", line 295, in _plot_args
    x, y = self._xy_from_xy(x, y)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axes.py", line 237, in _xy_from_xy
    raise ValueError("x and y must have same first dimension")
ValueError: x and y must have same first dimension

Answer 1

I think the problem here may be your for loop.

for line in infile1: #   Iterate over the lines
    match = re.search('Total=     (\d.+)', line) # Returns a match object
    if match: # Did we find a match?
        totalMoment0 = match.group(1) # this will be a string, assuming the group has a match.

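Notice how you assign to totalMoment0 every time you find a match? Each time you get a string and overwrite the previous one, so only the last match survives. A related problem is that the value is still a string: given your last match, say "1000", numpy's asarray will happily wrap it in an array, but the result is a zero-dimensional array holding that one string, not an array of floats with one entry per timestep. That is why the plot call complains that x and y do not have the same first dimension.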

What you should do is append the values, like this:

output_matches = [] # set up an empty list
for line in infile1: #   Iterate over the lines
    match = re.search(r'Total=     (\d.+)', line) # Try to get a match (raw string keeps \d as a regex escape)
    if match: # Did we find a match?
        output_matches.append(float(match.group(1))) # append the match to the list, casting the match as a float as you do so.

Note that if the regex is not quite right, you may get an error when trying to cast the match to a float. But I will leave that problem to future you!
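As a rough sketch of the append approach with a stricter pattern, the function below collects one float per matching line. The pattern, the name extract_totals, and the sample lines are assumptions for illustration; the real file layout may need a different regex.

```python
import re

# Hypothetical pattern: "Total=" followed by optional whitespace and a signed
# decimal number, optionally with an exponent. Adjust to the real file.
TOTAL_RE = re.compile(r'Total=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)')

def extract_totals(lines):
    """Return a list with one float for every line that contains 'Total='."""
    values = []
    for line in lines:
        match = TOTAL_RE.search(line)
        if match:
            # group(1) is a string; convert it before storing it
            values.append(float(match.group(1)))
    return values

# Made-up sample lines standing in for the difficult file
sample = [
    "step 1  Random Stuff 12348    Total=     -23.94409825335",
    "a line without the keyword",
    "step 2  Random Stuff 12349    Total=     1.5e1",
]
totals = extract_totals(sample)
```

Passing the list to np.asarray(totals, dtype=float) should then give a one-dimensional float array. If the first rows of the easy file are skipped, the same number of leading entries probably has to be dropped here as well (for example totals[skip:]), assuming both files have one entry per timestep, so that x and y end up with the same length.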

Answer 2

Here is how to access the value in the tuple and convert the string to a float:

>>> m = re.search(r'Total=\s+([0-9\-\.]+)', " Random Stuff 12348    Total=     -23.94409825335")
>>> m.groups()
('-23.94409825335',)
>>> result = float(m.groups()[0])
>>> result
-23.94409825335
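The same idea extends to a pattern with more than one group, where groups() returns a tuple holding one string per group. The sketch below is only an illustration: the two-group pattern is an assumption built on the sample line above, not something taken from the real file.

```python
import re

line = " Random Stuff 12348    Total=     -23.94409825335"

# Hypothetical two-group pattern: the integer before "Total=" and the value after it
m = re.search(r'(\d+)\s+Total=\s+([0-9\-\.]+)', line)

# groups() is a tuple of strings, one per parenthesised group
strings = m.groups()

# Convert every element of the tuple to a float
values = [float(s) for s in strings]
```

A tuple here is just a fixed-length sequence, so it can be indexed or looped over like a list; the elements still have to be converted with float() one by one.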

How do I convert a tuple to a float?
