Python: Extracting alternate columns from a file

Time: 2018-09-23 00:32:11

Tags: python

I want to extract all doubles/floats from a file. Every line looks like this:

0    324.609    1    -39475.435    2     23.439    3    983.098
4    -4384.698    5    9475.405    6     2398.349    7    9800.138
...

Right now, I am building lists from the columns:

    y1 = [ line.split()[1] for line in data]
    y2 = [ line.split()[3] for line in data]
    y3 = [ line.split()[5] for line in data]
    y4 = [ line.split()[7] for line in data]

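However, if a line has no column 7, the index is out of range. How can I prevent this? Also, is there a better way to extract all the doubles (including the - sign) from the file?

Thanks.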

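3 Answers:

Answer 0: (score: 2)

By using Pandas, you can avoid the pain of parsing a malformed data file. In the example below, I assume the second line of the file is missing its last two columns: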

    import pandas as pd

    # A raw string keeps the regex separator from being read as a string escape.
    data = pd.read_table("yourfile.dat", sep=r'\s+', header=None, index_col=None)
    #   0         1  2          3  4         5    6        7
    #0  0   324.609  1 -39475.435  2    23.439  3.0  983.098
    #1  4 -4384.698  5   9475.405  6  2398.349  NaN      NaN

    # Missing cells are read as NaN; dropna() removes them before building the list.
    y1 = data[1].dropna().tolist()
    y2 = data[3].dropna().tolist()
    y3 = data[5].dropna().tolist()
    y4 = data[7].dropna().tolist()
    y4
    #[983.0980000000001]

Answer 1: (score: 0)

To keep the alternate columns, generate a list of odd indices.

    L = list(range(10))  # candidate column indices 0..9
    y1 = []
    for lines in data:
        line = lines.split()
        n = len(line)
        l = L[1:n:2]  # odd indices that actually exist on this line
        for i in l:
            y1.append(line[i])
    print(y1)

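y1 is the list of all values from the odd-indexed columns (still as strings, since the loop does not convert them).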
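
The odd-index selection above can also be written with a slice. This is a minimal sketch, assuming `data` is an iterable of whitespace-separated lines; the sample `data` list is adapted from the question's rows (with the second row shortened), and `values` is an illustrative name. Unlike the loop above, it also converts each token to `float`:

```python
# Sample input assumed for illustration: the second line lacks its last two columns.
data = [
    "0    324.609    1    -39475.435    2     23.439    3    983.098",
    "4    -4384.698    5    9475.405    6     2398.349",
]

values = []
for line in data:
    # split() with no argument splits on any run of whitespace;
    # [1::2] keeps the tokens at odd indices (1, 3, 5, ...), however many exist.
    values.extend(float(token) for token in line.split()[1::2])

print(values)
```

Because the slice stops at the end of each line, a short line cannot trigger an IndexError here.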

Answer 2: (score: 0)

You can use a try/except block while iterating over each line.
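
A minimal sketch of this approach, assuming `data` is an iterable of whitespace-separated lines (the sample rows are adapted from the question, with the second row shortened for illustration):

```python
# Sample input assumed for illustration: the second line has no column 7.
data = [
    "0    324.609    1    -39475.435    2     23.439    3    983.098",
    "4    -4384.698    5    9475.405    6     2398.349",
]

y7 = []
for line in data:
    try:
        y7.append(float(line.split()[7]))
    except IndexError:
        # This line has no column 7, so it is skipped.
        pass

print(y7)
```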

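This does not raise an error when column 7 is missing.

If you want to preserve the position of each number (for example, so that the value from line 7 of the file is element 7 of the list), you can append np.nan to the list instead of skipping the line: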

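    import numpy as np

    y7 = []
    for line in data:
        try:
            y7.append(float(line.split()[7]))
        except IndexError:
            # No column 7 on this line: add a placeholder so positions stay aligned.
            y7.append(np.nan)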
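
The same pattern can be wrapped in a small helper to build all four lists at once. This is only a sketch under stated assumptions: `extract_column` is a hypothetical name, the sample `data` is adapted from the question, and `math.nan` from the standard library stands in for `np.nan`. It checks the line length directly rather than catching the exception.

```python
import math


def extract_column(lines, index):
    """Return column `index` of each line as a float, or NaN where it is missing."""
    column = []
    for line in lines:
        fields = line.split()
        # Pad with NaN so that list position i always corresponds to line i.
        column.append(float(fields[index]) if index < len(fields) else math.nan)
    return column


# Sample input assumed for illustration: the second line has no columns 6 and 7.
data = [
    "0    324.609    1    -39475.435    2     23.439    3    983.098",
    "4    -4384.698    5    9475.405    6     2398.349",
]

y1, y2, y3, y4 = (extract_column(data, i) for i in (1, 3, 5, 7))
print(y4)
```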