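I'm new to Python (using 3.5 with Anaconda) and have prior experience with MATLAB. Any help is much appreciated. If there is a simpler way to do this, please let me know.
I read and cleaned data from the PDF files of some lab equipment and appended it to a list: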
>print(outputdata)
[[['2.37701'], ['-'], ['-'], ['-'], ['-'], ['18.95276'], ['5.07365e-1']], [['2.75613'], ['-'], ['-'], ['-'], ['-'], ['16.99642'], ['4.10023e-1']], [['1.80527'], ['-'], ['-'], ['-'], ['-'], ['20.75384'], ['4.58238e-1']], [['1.58721'], ['-'], ['-'], ['-'], ['-'], ['18.06942'], ['3.81128e-1']], [['1.98336'], ['-'], ['-'], ['-'], ['-'], ['18.20776'], ['3.64733e-1']], [['1.75710'], ['-'], ['-'], ['-'], ['-'], ['23.03760'], ['4.36234e-1']], [['1.58967'], ['-'], ['-'], ['-'], ['-'], ['21.43884'], ['3.88509e-1']], [['2.37701'], ['-'], ['-'], ['-'], ['-'], ['18.95276'], ['5.07365e-1']]]
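I am trying to take the element at each position from every sublist and save it to a new list. I also want to clean the data by removing the brackets and quotes and keeping only the numbers. I need to operate on the data, so I plan to convert it to numpy arrays and then add those to a DataFrame for easier export to Excel (I already have the export code). Each column vector corresponds to a particular header: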
Molecule = ["H2", "Ar", "Methane", "Ethane", "Ethylene", "Propane(C3H8)", "Propylene"]
Here is an example of the desired H2 data:
2.37701
2.75613
1.80527
1.58721
1.98336
1.75710
1.58967
2.37701
I first did this:
outputdatalist = [x[0] for x in outputdata]
which gives the following output:
[['2.37701'], ['2.75613'], ['1.80527'], ['1.58721'], ['1.98336'], ['1.75710'], ['1.58967'], ['2.37701']]
Then:
for row in outputdatalist:
    print(' '.join(row)) # I need to append this on every iteration
My unpythonic (and unsuccessful) way of doing this was a double (triple?) for loop, as follows:
outputdatalist = []
for counter, elem in enumerate(Molecule):
    for counter1, elem1 in enumerate(outputdata):
        outputdatalist[counter] = [x[counter1] for x in outputdata]
I would then convert each outputdatalist[i] to an np array and loop over them to build a pd.DataFrame, using something like
pd.DataFrame({Molecule[i]: outputdatalist[i]})
Answer 0 (score: 2)
You can use a nested list comprehension, which seems to be faster than a solution using apply:
import pandas as pd
df = pd.DataFrame([[y[0] for y in x] for x in outputdata], columns=Molecule)
print(df)
        H2  Ar  Methane  Ethane  Ethylene  Propane(C3H8)   Propylene
0  2.37701   -        -       -         -       18.95276  5.07365e-1
1  2.75613   -        -       -         -       16.99642  4.10023e-1
2  1.80527   -        -       -         -       20.75384  4.58238e-1
3  1.58721   -        -       -         -       18.06942  3.81128e-1
4  1.98336   -        -       -         -       18.20776  3.64733e-1
5  1.75710   -        -       -         -       23.03760  4.36234e-1
6  1.58967   -        -       -         -       21.43884  3.88509e-1
7  2.37701   -        -       -         -       18.95276  5.07365e-1
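Note that the values in df are still strings, while the question asks for numbers to operate on. Below is a minimal sketch of one way to convert them. It assumes the '-' entries mean "no measurement" and may become NaN; that reading is an assumption, not something the question states. The two-row outputdata is a shortened copy of the question's list, and df_num is a name made up for this example.

```python
import pandas as pd

Molecule = ["H2", "Ar", "Methane", "Ethane", "Ethylene", "Propane(C3H8)", "Propylene"]

# Shortened copy of the question's data (first two rows only).
outputdata = [
    [['2.37701'], ['-'], ['-'], ['-'], ['-'], ['18.95276'], ['5.07365e-1']],
    [['2.75613'], ['-'], ['-'], ['-'], ['-'], ['16.99642'], ['4.10023e-1']],
]

# Same nested list comprehension as above: unwrap each one-element inner list.
df = pd.DataFrame([[y[0] for y in x] for x in outputdata], columns=Molecule)

# Convert every column to numbers; errors='coerce' turns anything that
# cannot be parsed (here the '-' placeholder) into NaN instead of raising.
df_num = df.apply(pd.to_numeric, errors='coerce')

print(df_num.dtypes)
print(df_num['H2'].tolist())
```

After this step each column is a float column, so df_num['H2'].values gives a numpy array that can be used in arithmetic.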
Timings (small DataFrame):
In [21]: %timeit pd.DataFrame([[y[0] for y in x] for x in outputdata], columns=Molecule)
1000 loops, best of 3: 1.04 ms per loop
In [22]: %timeit (pd.DataFrame(outputdata, columns=Molecule).apply(lambda x: x.str[0]))
100 loops, best of 3: 4.59 ms per loop
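Since the question mentions going through numpy, here is a sketch of a third variant that lets numpy drop the innermost one-element lists. It was not part of the timings above, so nothing is claimed about its speed, and it assumes every inner list holds exactly one string, as in the question's data. The names arr and flat are made up for this example, and the two-row outputdata is a shortened copy of the question's list.

```python
import numpy as np
import pandas as pd

Molecule = ["H2", "Ar", "Methane", "Ethane", "Ethylene", "Propane(C3H8)", "Propylene"]

# Shortened copy of the question's data (first two rows only).
outputdata = [
    [['2.37701'], ['-'], ['-'], ['-'], ['-'], ['18.95276'], ['5.07365e-1']],
    [['2.75613'], ['-'], ['-'], ['-'], ['-'], ['16.99642'], ['4.10023e-1']],
]

# The nested list becomes a 3-D string array of shape (rows, 7, 1).
arr = np.array(outputdata)

# Indexing the last axis at 0 removes the length-1 dimension: (rows, 7).
flat = arr[:, :, 0]

df = pd.DataFrame(flat, columns=Molecule)
print(arr.shape, flat.shape)
print(df)
```

The resulting df still holds strings, so a numeric conversion is needed before doing any arithmetic on it.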