Pandas: read a text file and convert it into a columnar (CSV-style) DataFrame

Time: 2019-06-12 04:54:39

Tags: python pandas dataframe

I have a file, test_dns, with data like the sample below. The real file is larger; the sample shows only two sections.

---------- dns01-sh01 ---------
zone "celina.com." IN {
zone "global.celina.com." {
zone "storage.celina.com." {
zone "gusain.com" {
zone "." IN {
zone "10.in-addr.arpa." IN {
zone "99.139.in-addr.arpa." IN {
zone "190.158.in-addr.arpa." IN {
zone "172.in-addr.arpa." IN {
zone "localdomain." IN {
zone "localhost." IN {
zone "0.0.127.in-addr.arpa." IN {
zone "255.in-addr.arpa." IN {
zone "0.in-addr.arpa." IN {

---------- dns02-sh02 ---------
zone "celina.com." IN {
zone "global.celina.com." {
zone "storage.celina.com." {
zone "gusain.com" {
zone "." IN {
zone "10.in-addr.arpa." IN {
zone "99.139.in-addr.arpa." IN {
zone "190.158.in-addr.arpa." IN {
zone "172.in-addr.arpa." IN {
zone "localdomain." IN {
zone "localhost." IN {
zone "0.0.127.in-addr.arpa." IN {
zone "255.in-addr.arpa." IN {
zone "0.in-addr.arpa." IN {

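I am loading this data into a pandas DataFrame and want to split it into separate columns, starting a new column at each "---" header line.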

I tried the following, but as a beginner I'm looking for ideas on how to get there.

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.read_fwf("test_dns")
>>> df
      ---------- dns01-sh01 ---------
0             zone "celina.com." IN {
1         zone "global.celina.com." {
2        zone "storage.celina.com." {
3                 zone "gusain.com" {
4                       zone "." IN {
5        zone "10.in-addr.arpa." IN {
6    zone "99.139.in-addr.arpa." IN {
7   zone "190.158.in-addr.arpa." IN {
8       zone "172.in-addr.arpa." IN {
9            zone "localdomain." IN {
10             zone "localhost." IN {
11  zone "0.0.127.in-addr.arpa." IN {
12      zone "255.in-addr.arpa." IN {
13        zone "0.in-addr.arpa." IN {
14    ---------- dns02-sh02 ---------
15            zone "celina.com." IN {
16        zone "global.celina.com." {
17       zone "storage.celina.com." {
18                zone "gusain.com" {
19                      zone "." IN {
20       zone "10.in-addr.arpa." IN {
21   zone "99.139.in-addr.arpa." IN {
22  zone "190.158.in-addr.arpa." IN {
23      zone "172.in-addr.arpa." IN {
24           zone "localdomain." IN {
25             zone "localhost." IN {
26  zone "0.0.127.in-addr.arpa." IN {
27      zone "255.in-addr.arpa." IN {
28        zone "0.in-addr.arpa." IN {

Desired output:

---------- dns01-sh01 ---------     ---------- dns02-sh02 ---------
zone "celina.com." IN {             zone "celina.com." IN {
zone "global.celina.com." {         zone "global.celina.com." {
zone "storage.celina.com." {        zone "storage.celina.com." {
zone "gusain.com" {                 zone "gusain.com" {
zone "." IN {                       zone "." IN {
zone "10.in-addr.arpa." IN {        zone "10.in-addr.arpa." IN {
zone "99.139.in-addr.arpa." IN {    zone "99.139.in-addr.arpa." IN {
zone "190.158.in-addr.arpa." IN {   zone "190.158.in-addr.arpa." IN {
zone "172.in-addr.arpa." IN {       zone "172.in-addr.arpa." IN {
zone "localdomain." IN {            zone "localdomain." IN {
zone "localhost." IN {              zone "localhost." IN {
zone "0.0.127.in-addr.arpa." IN {   zone "0.0.127.in-addr.arpa." IN {
zone "255.in-addr.arpa." IN {       zone "255.in-addr.arpa." IN {
zone "0.in-addr.arpa." IN {         zone "0.in-addr.arpa." IN {

Running the solution from @Sandeep gives this error:

>>> for i in opened_file.read().split('\n\n'):
...     dfs.append(pd.read_fwf(StringIO(i)))
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 737, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 445, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 814, in __init__
    self._make_engine(self.engine)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1055, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 3403, in __init__
    PythonParser.__init__(self, f, **kwds)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 2070, in __init__
    self._make_reader(f)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 3407, in _make_reader
    self.comment, self.skiprows)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 3307, in __init__
    self.colspecs = self.detect_colspecs(skiprows=skiprows)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 3366, in detect_colspecs
    raise EmptyDataError("No rows from which to infer column width")
pandas.errors.EmptyDataError: No rows from which to infer column width
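EmptyDataError means pd.read_fwf was handed a chunk with no rows. With split('\n\n') that can happen when the file handle has already been read once (a second read() returns '', and ''.split('\n\n') is ['']), or when extra or whitespace-only blank lines leave an empty chunk behind. A minimal sketch of a more tolerant splitter, assuming sections are separated by one or more blank lines; read_sections and SAMPLE are hypothetical names, and re.split on a blank-line pattern is swapped in for the plain split('\n\n'):

```python
import re
from io import StringIO

import pandas as pd


def read_sections(text):
    """Split text on blank lines and read each non-empty section with read_fwf."""
    # Normalise line endings, then split on one or more blank lines
    # (lines that are empty or contain only whitespace).
    text = "\n".join(text.splitlines())
    chunks = re.split(r"\n\s*\n", text.strip())
    # Drop empty chunks: read_fwf raises EmptyDataError on them, and an
    # empty input string still produces [""] from the split.
    frames = [pd.read_fwf(StringIO(c)) for c in chunks if c.strip()]
    # axis=1 puts the sections side by side, aligned on the row index.
    return pd.concat(frames, axis=1)


SAMPLE = (
    "---------- dns01-sh01 ---------\n"
    'zone "celina.com." IN {\n'
    'zone "global.celina.com." {\n'
    "   \n"  # a "blank" line that actually contains spaces
    "---------- dns02-sh02 ---------\n"
    'zone "celina.com." IN {\n'
    'zone "global.celina.com." {\n'
    "\n\n"  # trailing blank lines
)

df = read_sections(SAMPLE)
print(df)
```

For the real file, read it once (for example inside a with open("test_dns") block) and pass the resulting string to read_sections.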

3 answers:

Answer 0 (score: 1)

This may not be the most Pythonic way, but you can try this simple approach:

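import pandas as pd

# Read every line of the file; the with-block closes it afterwards.
with open("test_dns", 'r') as txt_file:
    text = txt_file.read().split('\n')

cols = []    # section headers, e.g. "---------- dns01-sh01 ---------"
cols1 = []   # lines of the first section
cols2 = []   # lines of the second section
for txt in text:
    if "-----" in txt:
        cols.append(txt)          # a header line starts a new section
    elif txt == "":
        pass                      # skip blank lines
    else:
        if len(cols) == 1:
            cols1.append(txt)
        else:
            cols2.append(txt)

# Assumes exactly two sections with the same number of lines:
# pd.DataFrame raises ValueError if the lists differ in length.
data = {cols[0]: cols1, cols[1]: cols2}
df = pd.DataFrame(data)
print(df)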

Output:

      ---------- dns01-sh01 ---------    ---------- dns02-sh02 ---------
0             zone "celina.com." IN {            zone "celina.com." IN {
1         zone "global.celina.com." {        zone "global.celina.com." {
2        zone "storage.celina.com." {       zone "storage.celina.com." {
3                 zone "gusain.com" {                zone "gusain.com" {
4                       zone "." IN {                      zone "." IN {
5        zone "10.in-addr.arpa." IN {       zone "10.in-addr.arpa." IN {
6    zone "99.139.in-addr.arpa." IN {   zone "99.139.in-addr.arpa." IN {
7   zone "190.158.in-addr.arpa." IN {  zone "190.158.in-addr.arpa." IN {
8       zone "172.in-addr.arpa." IN {      zone "172.in-addr.arpa." IN {
9            zone "localdomain." IN {           zone "localdomain." IN {
10             zone "localhost." IN {             zone "localhost." IN {
11  zone "0.0.127.in-addr.arpa." IN {  zone "0.0.127.in-addr.arpa." IN {
12      zone "255.in-addr.arpa." IN {      zone "255.in-addr.arpa." IN {
13        zone "0.in-addr.arpa." IN {        zone "0.in-addr.arpa." IN {
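Answer 0 hard-codes two sections (cols1 and cols2) and needs them to be the same length. A minimal sketch of a more general variant, assuming the first non-blank line is a header and every header contains "-----" (true for the sample data); sections_to_columns is a hypothetical name. It marks the header lines, uses cumsum() to give each section its own group number, and turns each group into one column:

```python
import pandas as pd


def sections_to_columns(lines, marker="-----"):
    """Turn '----- name -----' delimited sections into side-by-side columns."""
    # Keep only non-blank lines.
    s = pd.Series([ln for ln in lines if ln.strip()], dtype=object)
    is_header = s.str.contains(marker, regex=False)
    # cumsum() increments at every header, so each section gets its own group id.
    group_id = is_header.cumsum()
    columns = {}
    for _, block in s.groupby(group_id):
        # The first row of each group is its header; the rest is the body.
        columns[block.iloc[0]] = block.iloc[1:].reset_index(drop=True)
    # Series of different lengths are aligned on the index and padded with NaN,
    # whereas a dict of plain lists of different lengths raises ValueError.
    return pd.DataFrame(columns)
```

For the question's file, calling it on the result of reading test_dns and applying splitlines() should give one column per dns host; hosts with fewer zone lines are padded with NaN.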

Answer 1 (score: 1)

Use split('\n\n') on the contents of the opened file, then a for loop and pandas.concat:

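import pandas as pd
from io import StringIO   # pandas.compat.StringIO was removed in newer pandas versions

with open("test_dns", 'r') as opened_file:
    text = opened_file.read()   # read once: a second read() returns ''

dfs = []
for i in text.split('\n\n'):
    if i.strip():               # skip empty chunks, which raise EmptyDataError
        dfs.append(pd.read_fwf(StringIO(i)))

# Or, as an alternative to the for loop (use one or the other, not both):
# dfs = [pd.read_fwf(StringIO(i)) for i in text.split('\n\n') if i.strip()]

df = pd.concat(dfs, axis=1)

print(df)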
      ---------- dns01-sh01 ---------    ---------- dns02-sh02 ---------
0             zone "celina.com." IN {            zone "celina.com." IN {
1         zone "global.celina.com." {        zone "global.celina.com." {
2        zone "storage.celina.com." {       zone "storage.celina.com." {
3                 zone "gusain.com" {                zone "gusain.com" {
4                       zone "." IN {                      zone "." IN {
5        zone "10.in-addr.arpa." IN {       zone "10.in-addr.arpa." IN {
6    zone "99.139.in-addr.arpa." IN {   zone "99.139.in-addr.arpa." IN {
7   zone "190.158.in-addr.arpa." IN {  zone "190.158.in-addr.arpa." IN {
8       zone "172.in-addr.arpa." IN {      zone "172.in-addr.arpa." IN {
9            zone "localdomain." IN {           zone "localdomain." IN {
10             zone "localhost." IN {             zone "localhost." IN {
11  zone "0.0.127.in-addr.arpa." IN {  zone "0.0.127.in-addr.arpa." IN {
12      zone "255.in-addr.arpa." IN {      zone "255.in-addr.arpa." IN {
13        zone "0.in-addr.arpa." IN {        zone "0.in-addr.arpa." IN {
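This answer relies on pd.concat(..., axis=1), which joins the per-section frames on their row index (an outer join by default). A small sketch with two hypothetical, unequal-length section frames shows what happens when one host has fewer zone lines than another:

```python
import pandas as pd

# Two hypothetical per-section frames of different lengths.
left = pd.DataFrame({"dns01-sh01": ["zone a", "zone b", "zone c"]})
right = pd.DataFrame({"dns02-sh02": ["zone a", "zone b"]})

# Rows are matched by index label, so the shorter section is padded
# with NaN instead of raising an error.
combined = pd.concat([left, right], axis=1)

# Optional: show missing cells as empty strings for display or CSV export.
printable = combined.fillna("")
print(printable)
```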

Answer 2 (score: 1)

Try this:

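# df is the single-column frame returned by pd.read_fwf("test_dns") in the question
df2 = df.copy()
df = pd.DataFrame()
# First section: the rows before the first row whose first character is '-'
df[df2.columns[0]] = df2.iloc[:, 0][:df2.iloc[:, 0].str[0].ne('-').idxmin()]
df[df2.iloc[len(df)].item()] = df2.drop(df.index.tolist()).iloc[1:].reset_index(drop=True)
print(df)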

Output:

      ---------- dns01-sh01 ---------    ---------- dns02-sh02 ---------
0             zone "celina.com." IN {    ---------- dns02-sh02 ---------
1         zone "global.celina.com." {            zone "celina.com." IN {
2        zone "storage.celina.com." {        zone "global.celina.com." {
3                 zone "gusain.com" {       zone "storage.celina.com." {
4                       zone "." IN {                zone "gusain.com" {
5        zone "10.in-addr.arpa." IN {                      zone "." IN {
6    zone "99.139.in-addr.arpa." IN {       zone "10.in-addr.arpa." IN {
7   zone "190.158.in-addr.arpa." IN {   zone "99.139.in-addr.arpa." IN {
8       zone "172.in-addr.arpa." IN {  zone "190.158.in-addr.arpa." IN {
9            zone "localdomain." IN {      zone "172.in-addr.arpa." IN {
10             zone "localhost." IN {           zone "localdomain." IN {
11  zone "0.0.127.in-addr.arpa." IN {             zone "localhost." IN {
12      zone "255.in-addr.arpa." IN {  zone "0.0.127.in-addr.arpa." IN {
13        zone "0.in-addr.arpa." IN {      zone "255.in-addr.arpa." IN {
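The output shown for this answer does not match the desired output: the second column starts with the "---------- dns02-sh02 ---------" header, and the last zone line (0.in-addr.arpa.) is missing. A minimal sketch that starts from the same single-column df returned by pd.read_fwf("test_dns"), assuming the first header became the column name and every later header is a row starting with "-"; split_fwf_frame is a hypothetical name. It locates the header rows by position and slices the rows between them, so it also handles more than two sections:

```python
import numpy as np
import pandas as pd


def split_fwf_frame(df):
    """Split a one-column frame whose later section headers appear as rows."""
    first = df.columns[0]
    s = df[first].astype(str).str.strip()
    # Row positions of the later header rows, e.g. "---------- dns02-sh02 ---------".
    starts = np.flatnonzero(s.str.startswith("-").to_numpy())
    names = [first] + [s.iloc[i] for i in starts]
    # Each section's body lies strictly between its header and the next header.
    bounds = [-1] + starts.tolist() + [len(s)]
    columns = {
        name: s.iloc[lo + 1:hi].reset_index(drop=True)
        for name, lo, hi in zip(names, bounds[:-1], bounds[1:])
    }
    # Sections of different lengths are padded with NaN.
    return pd.DataFrame(columns)
```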