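The rows in my file are spread across multiple lines. In the following block of my file, the first row starts at 0.0000000000000000E+00 and the second row starts at 1.5625000000000000E-02. How can I read one row as everything from 0.0000000000000000E+00 up to the last number before 1.5625000000000000E-02?
I have been trying numpy's genfromtxt() and pandas' read_csv(), but I have not managed to tell either of them what I intend to do.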
#I have put quotation marks here just to indicate the start and end of rows. They
#are not part of the file.
"0.0000000000000000E+00
00000000 4.9999998882412910E-03 8.7714487508765548E-03
00000001 5.0000002374872565E-04 5.0877144875087654E-01"
"1.5625000000000000E-02
00000000 4.9999998882412910E-03 8.4622513106357884E-03
00000001 5.0000002374872565E-04 5.0864039953085094E-01"
Once read correctly, my input array would look like this:
0.0000000000000000E+00 00000000 4.9999998882412910E-03 8.7714487508765548E-03 00000001 5.0000002374872565E-04 5.0877144875087654E-01
1.5625000000000000E-02 00000000 4.9999998882412910E-03 8.4622513106357884E-03 00000001 5.0000002374872565E-04 5.0864039953085094E-01
Answer 0 (score: 0)
This should work with the re (regular expression) module.
text = """
0.0000000000000000E+00
00000000 4.9999998882412910E-03 8.7714487508765548E-03
00000001 5.0000002374872565E-04 5.0877144875087654E-01
1.5625000000000000E-02
00000000 4.9999998882412910E-03 8.4622513106357884E-03
00000001 5.0000002374872565E-04 5.0864039953085094E-01"""
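Code:
import re
# Start a new record at each line that holds a single field (the first
# value of a row); all other lines continue the current record.
xx = re.split(pattern=r"\n(?=\S+[ \t]*(?:\n|$))", string=text.strip())
for xy in xx:
    # Collapse the line breaks and repeated spaces inside one record.
    xy = re.sub(pattern=r"\s+", repl=" ", string=xy)
    print(xy)
    print("*"*55)
Output: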
0.0000000000000000E+00 00000000 4.9999998882412910E-03 8.7714487508765548E-03 00000001 5.0000002374872565E-04 5.0877144875087654E-01
*******************************************************
1.5625000000000000E-02 00000000 4.9999998882412910E-03 8.4622513106357884E-03 00000001 5.0000002374872565E-04 5.0864039953085094E-01
*******************************************************
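Since the question asks for an array rather than printed strings, the joined lines can then be handed to numpy. The following is only a sketch: it assumes, as in the sample, that every record starts on a line holding a single field, and the helper name join_records is made up for illustration.

```python
import io
import re

import numpy as np

text = """0.0000000000000000E+00
00000000 4.9999998882412910E-03 8.7714487508765548E-03
00000001 5.0000002374872565E-04 5.0877144875087654E-01
1.5625000000000000E-02
00000000 4.9999998882412910E-03 8.4622513106357884E-03
00000001 5.0000002374872565E-04 5.0864039953085094E-01
"""


def join_records(content):
    """Return one space-separated line per record (hypothetical helper)."""
    # Assumption: a record starts at each line that holds a single field.
    records = re.split(r"\n(?=\S+[ \t]*(?:\n|$))", content.strip())
    return [" ".join(record.split()) for record in records]


joined = "\n".join(join_records(text))
# genfromtxt splits on whitespace by default and parses each field as a float.
array = np.genfromtxt(io.StringIO(joined))
print(array.shape)  # (2, 7) for the sample above
```

Note that the 8-digit fields come back as floats (0.0 and 1.0) this way, because genfromtxt uses a single float dtype unless told otherwise.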
Answer 1 (score: 0)
Assuming you want 7 columns in your output data and that this is your file, here is how to parse it into a pandas DataFrame:
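import pandas as pd
with open('temp.txt') as f:
    # Read every whitespace-separated field of the file into one flat list.
    d = f.read().split()
data = {'col1': [], 'col2': [], 'col3': [], 'col4': [], 'col5': [], 'col6': [], 'col7': []}
# Each output row consists of 7 consecutive fields, starting at offset i.
for i in range(0, len(d), 7):
    for j in range(7):
        data['col{}'.format(j+1)].append(d[i + j])
df = pd.DataFrame(data)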
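The fixed-width idea in this answer does not depend on pandas. As a rough sketch, assuming every record really has exactly 7 fields, the flat token list can be cut into rows with plain slicing. Here chunk_tokens is a made-up helper name, and the length check is an addition for illustration, not part of the original answer.

```python
def chunk_tokens(tokens, width=7):
    """Cut a flat list of tokens into rows of `width` fields (hypothetical helper)."""
    if len(tokens) % width != 0:
        # A leftover partial row means the fixed-width assumption is wrong.
        raise ValueError(
            "expected a multiple of {} tokens, got {}".format(width, len(tokens))
        )
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


sample = """0.0000000000000000E+00
00000000 4.9999998882412910E-03 8.7714487508765548E-03
00000001 5.0000002374872565E-04 5.0877144875087654E-01
1.5625000000000000E-02
00000000 4.9999998882412910E-03 8.4622513106357884E-03
00000001 5.0000002374872565E-04 5.0864039953085094E-01
"""

rows = chunk_tokens(sample.split())
for row in rows:
    # Print the first field of each row and the number of fields it holds.
    print(row[0], len(row))
```

The fields stay strings here, which preserves the 8-digit fields exactly as written; the list of rows can be passed to pandas.DataFrame(rows) if a DataFrame is wanted.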
Answer 2 (score: 0)
The code below should parse the contents of the file correctly:
import re
import pandas
sample = """0.0000000000000000E+00
00000000 4.9999998882412910E-03 8.7714487508765548E-03
00000001 5.0000002374872565E-04 5.0877144875087654E-01
1.5625000000000000E-02
00000000 4.9999998882412910E-03 8.4622513106357884E-03
00000001 5.0000002374872565E-04 5.0864039953085094E-01
"""
def load_matrix(content):
    # Ignore blank lines.
    lines = (line for line in content.splitlines() if len(line.strip()) > 0)
    rows = list()
    row = list()
    for line in lines:
        fields = line.split()
        # A line whose first field is an 8-digit index continues the current row.
        is_continuation = re.match(r'^\d{8}$', fields[0])
        if is_continuation:
            # Drop the index field and keep the values.
            row += [float(value) for value in fields[1:]]
        else:
            # Any other line starts a new row; store the previous one first.
            if row:
                rows.append(row)
            row = [float(value) for value in fields]
    # Store the last row, if there is one.
    if row:
        rows.append(row)
    return pandas.DataFrame(rows)
print(load_matrix(sample))
This prints:
          0      1         2       3         4
0  0.000000  0.005  0.008771  0.0005  0.508771
1  0.015625  0.005  0.008462  0.0005  0.508640
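Note that load_matrix drops the 8-digit index field of each continuation line, so it returns 5 columns rather than the 7 shown in the question. If those fields are wanted as well, a variation along the following lines might do. It is a sketch that reuses the same continuation test, with group_rows as an invented name and no pandas dependency.

```python
import re

CONTINUATION = re.compile(r"^\d{8}$")  # same test as in load_matrix


def group_rows(lines):
    """Yield one list of floats per logical row, keeping the index fields."""
    row = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if not CONTINUATION.match(fields[0]) and row:
            # A non-continuation line starts a new row: emit the previous one.
            yield row
            row = []
        row.extend(float(value) for value in fields)
    if row:
        yield row  # emit the last row


sample = """0.0000000000000000E+00
00000000 4.9999998882412910E-03 8.7714487508765548E-03
00000001 5.0000002374872565E-04 5.0877144875087654E-01
1.5625000000000000E-02
00000000 4.9999998882412910E-03 8.4622513106357884E-03
00000001 5.0000002374872565E-04 5.0864039953085094E-01
"""

rows = list(group_rows(sample.splitlines()))
print(len(rows), len(rows[0]))  # 2 7
```

Because group_rows accepts any iterable of lines, an open file object can be passed to it directly, so the file does not have to be read into memory as one string first.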
Answer 3 (score: 0)
The output is two lists:
import re
# Read the whole file into one string.
with open("over.txt") as file_object:
    content = file_object.read()
# Start a new record at each line that holds a single field.
words = re.split(pattern=r"\n(?=\S+[ \t]*(?:\n|$))", string=content.strip())
# Convert every whitespace-separated field of a record to float.
df1 = [float(i) for i in words[0].split()]
df2 = [float(i) for i in words[1].split()]
print(df1)
print(df2)
Output:
[0.0, 0.0, 0.004999999888241291, 0.008771448750876555, 1.0, 0.0005000000237487257, 0.5087714487508765]
[0.015625, 0.0, 0.004999999888241291, 0.008462251310635788, 1.0, 0.0005000000237487257, 0.5086403995308509]