我有一个文件test_dns
,其中有一些类似下面的数据,但是下面的数据将很小,只有两个示例。
---------- dns01-sh01 ---------
zone "celina.com." IN {
zone "global.celina.com." {
zone "storage.celina.com." {
zone "gusain.com" {
zone "." IN {
zone "10.in-addr.arpa." IN {
zone "99.139.in-addr.arpa." IN {
zone "190.158.in-addr.arpa." IN {
zone "172.in-addr.arpa." IN {
zone "localdomain." IN {
zone "localhost." IN {
zone "0.0.127.in-addr.arpa." IN {
zone "255.in-addr.arpa." IN {
zone "0.in-addr.arpa." IN {
---------- dns02-sh02 ---------
zone "celina.com." IN {
zone "global.celina.com." {
zone "storage.celina.com." {
zone "gusain.com" {
zone "." IN {
zone "10.in-addr.arpa." IN {
zone "99.139.in-addr.arpa." IN {
zone "190.158.in-addr.arpa." IN {
zone "172.in-addr.arpa." IN {
zone "localdomain." IN {
zone "localhost." IN {
zone "0.0.127.in-addr.arpa." IN {
zone "255.in-addr.arpa." IN {
zone "0.in-addr.arpa." IN {
我正在将数据放入pandas数据框中,并根据行"---"
我在下面尝试过,但是作为一个新手学习者,正在寻求传播想法的机会。
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.read_fwf("test_dns")
>>> df
---------- dns01-sh01 ---------
0 zone "celina.com." IN {
1 zone "global.celina.com." {
2 zone "storage.celina.com." {
3 zone "gusain.com" {
4 zone "." IN {
5 zone "10.in-addr.arpa." IN {
6 zone "99.139.in-addr.arpa." IN {
7 zone "190.158.in-addr.arpa." IN {
8 zone "172.in-addr.arpa." IN {
9 zone "localdomain." IN {
10 zone "localhost." IN {
11 zone "0.0.127.in-addr.arpa." IN {
12 zone "255.in-addr.arpa." IN {
13 zone "0.in-addr.arpa." IN {
14 ---------- dns02-sh02 ---------
15 zone "celina.com." IN {
16 zone "global.celina.com." {
17 zone "storage.celina.com." {
18 zone "gusain.com" {
19 zone "." IN {
20 zone "10.in-addr.arpa." IN {
21 zone "99.139.in-addr.arpa." IN {
22 zone "190.158.in-addr.arpa." IN {
23 zone "172.in-addr.arpa." IN {
24 zone "localdomain." IN {
25 zone "localhost." IN {
26 zone "0.0.127.in-addr.arpa." IN {
27 zone "255.in-addr.arpa." IN {
28 zone "0.in-addr.arpa." IN {
所需的输出:
---------- dns01-sh01 --------- ---------- dns02-sh02 ---------
zone "celina.com." IN { zone "celina.com." IN {
zone "global.celina.com." { zone "global.celina.com." {
zone "storage.celina.com." { zone "storage.celina.com." {
zone "gusain.com" { zone "gusain.com" {
zone "." IN { zone "." IN {
zone "10.in-addr.arpa." IN { zone "10.in-addr.arpa." IN {
zone "99.139.in-addr.arpa." IN { zone "99.139.in-addr.arpa." IN {
zone "190.158.in-addr.arpa." IN { zone "190.158.in-addr.arpa." IN {
zone "172.in-addr.arpa." IN { zone "172.in-addr.arpa." IN {
zone "localdomain." IN { zone "localdomain." IN {
zone "localhost." IN { zone "localhost." IN {
zone "0.0.127.in-addr.arpa." IN { zone "0.0.127.in-addr.arpa." IN {
zone "255.in-addr.arpa." IN { zone "255.in-addr.arpa." IN {
zone "0.in-addr.arpa." IN { zone "0.in-addr.arpa." IN {
从@Sandeep运行解决方案时出错。
>>> for i in opened_file.read().split('\n\n'):
... dfs.append(pd.read_fwf(StringIO(i)))
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 737, in read_fwf
return _read(filepath_or_buffer, kwds)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 445, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 814, in __init__
self._make_engine(self.engine)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1055, in _make_engine
self._engine = klass(self.f, **self.options)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 3403, in __init__
PythonParser.__init__(self, f, **kwds)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 2070, in __init__
self._make_reader(f)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 3407, in _make_reader
self.comment, self.skiprows)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 3307, in __init__
self.colspecs = self.detect_colspecs(skiprows=skiprows)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 3366, in detect_colspecs
raise EmptyDataError("No rows from which to infer column width")
pandas.errors.EmptyDataError: No rows from which to infer column width
答案 0 :(得分:1)
也许这不是Python方式,但是您可以尝试以下简单方法:
import pandas as pd
txt_file = open("test_dns",'r')
text = txt_file.read().split('\n')
cols = []
cols1 = []
cols2 = []
for txt in text:
if "-----" in txt:
cols.append(txt)
elif txt == "":
pass
else:
if len(cols) == 1:
cols1.append(txt)
else:
cols2.append(txt)
data = ({cols[0]:cols1, cols[1]:cols2})
df = pd.DataFrame(data)
print (df)
输出:
---------- dns01-sh01 --------- ---------- dns02-sh02 ---------
0 zone "celina.com." IN { zone "celina.com." IN {
1 zone "global.celina.com." { zone "global.celina.com." {
2 zone "storage.celina.com." { zone "storage.celina.com." {
3 zone "gusain.com" { zone "gusain.com" {
4 zone "." IN { zone "." IN {
5 zone "10.in-addr.arpa." IN { zone "10.in-addr.arpa." IN {
6 zone "99.139.in-addr.arpa." IN { zone "99.139.in-addr.arpa." IN {
7 zone "190.158.in-addr.arpa." IN { zone "190.158.in-addr.arpa." IN {
8 zone "172.in-addr.arpa." IN { zone "172.in-addr.arpa." IN {
9 zone "localdomain." IN { zone "localdomain." IN {
10 zone "localhost." IN { zone "localhost." IN {
11 zone "0.0.127.in-addr.arpa." IN { zone "0.0.127.in-addr.arpa." IN {
12 zone "255.in-addr.arpa." IN { zone "255.in-addr.arpa." IN {
13 zone "0.in-addr.arpa." IN { zone "0.in-addr.arpa." IN {
答案 1 :(得分:1)
在open
上将split
与\n\n
一起使用,然后用于循环和pandas.concat
:
from pandas.compat import StringIO
opened_file = open("test_dns.txt",'r')
dfs = []
for i in opened_file.read().split('\n\n'):
dfs.append(pd.read_fwf(StringIO(i)))
# Or alternative to for loop
dfs = [pd.read_fwf(StringIO(i)) for i in opened_file.read().split('\n\n')]
df = pd.concat(dfs, axis=1)
print(df)
---------- dns01-sh01 --------- ---------- dns02-sh02 ---------
0 zone "celina.com." IN { zone "celina.com." IN {
1 zone "global.celina.com." { zone "global.celina.com." {
2 zone "storage.celina.com." { zone "storage.celina.com." {
3 zone "gusain.com" { zone "gusain.com" {
4 zone "." IN { zone "." IN {
5 zone "10.in-addr.arpa." IN { zone "10.in-addr.arpa." IN {
6 zone "99.139.in-addr.arpa." IN { zone "99.139.in-addr.arpa." IN {
7 zone "190.158.in-addr.arpa." IN { zone "190.158.in-addr.arpa." IN {
8 zone "172.in-addr.arpa." IN { zone "172.in-addr.arpa." IN {
9 zone "localdomain." IN { zone "localdomain." IN {
10 zone "localhost." IN { zone "localhost." IN {
11 zone "0.0.127.in-addr.arpa." IN { zone "0.0.127.in-addr.arpa." IN {
12 zone "255.in-addr.arpa." IN { zone "255.in-addr.arpa." IN {
13 zone "0.in-addr.arpa." IN { zone "0.in-addr.arpa." IN {
答案 2 :(得分:1)
尝试一下:
df2 = df.copy()
df = pd.DataFrame()
df[df2.columns[0]] = df2.iloc[:, 0][:df2.iloc[:, 0].str[0].ne('-').idxmin()]
df[df2.iloc[len(df)].item()] = df2.drop(df.index.tolist()).iloc[1:].reset_index(drop=True)
print(df)
输出:
---------- dns01-sh01 --------- ---------- dns02-sh02 ---------
0 zone "celina.com." IN { ---------- dns02-sh02 ---------
1 zone "global.celina.com." { zone "celina.com." IN {
2 zone "storage.celina.com." { zone "global.celina.com." {
3 zone "gusain.com" { zone "storage.celina.com." {
4 zone "." IN { zone "gusain.com" {
5 zone "10.in-addr.arpa." IN { zone "." IN {
6 zone "99.139.in-addr.arpa." IN { zone "10.in-addr.arpa." IN {
7 zone "190.158.in-addr.arpa." IN { zone "99.139.in-addr.arpa." IN {
8 zone "172.in-addr.arpa." IN { zone "190.158.in-addr.arpa." IN {
9 zone "localdomain." IN { zone "172.in-addr.arpa." IN {
10 zone "localhost." IN { zone "localdomain." IN {
11 zone "0.0.127.in-addr.arpa." IN { zone "localhost." IN {
12 zone "255.in-addr.arpa." IN { zone "0.0.127.in-addr.arpa." IN {
13 zone "0.in-addr.arpa." IN { zone "255.in-addr.arpa." IN {