Question

I have a string like this:

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

and I would like to get a DataFrame like this:

In [185]: pd.DataFrame({'col1':['A','AA'],'col2':['AGILENT TECH INC','ALCOA INC']})
Out[185]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC

This is what I have tried so far:

from StringIO import StringIO
import re

pd.DataFrame.from_csv(StringIO(re.sub(' +\n', ';', txt)), sep=';')

Out[204]:
Empty DataFrame
Columns: [AA     ALCOA INC                     ]
Index: []

But the result is not what I expected. I can't seem to get the options of from_csv or StringIO right.

This is definitely related to this question.

Answer 1

Use read_fwf and pass the column widths:

In [15]:
import io
import pandas as pd    
txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '
df = pd.read_fwf(io.StringIO(txt), header=None, widths=[7, 37], names=['col1', 'col2'])
df
Out[15]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC
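As a variation on the same idea, the column boundaries can also be given as explicit character ranges through the colspecs parameter of read_fwf. This is only a sketch: the offsets below are simply the cumulative form of widths=[7, 37], and the column names are the ones from the question.

```python
import io
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# Half-open (start, end) character ranges: the cumulative form of widths=[7, 37].
colspecs = [(0, 7), (7, 44)]

# read_fwf strips the padding around each field by default.
df = pd.read_fwf(io.StringIO(txt), header=None, colspecs=colspecs, names=['col1', 'col2'])
print(df)
```

Explicit ranges can be handy when some character positions should be skipped, since the ranges do not have to be contiguous.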

Answer 2

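import re

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# Split each line into fields on runs of two or more whitespace characters
rows = [re.split(r'\s{2,}', x.strip()) for x in txt.splitlines()]
# Transpose the rows into columns and name them col1, col2, ...
result = {'col{0}'.format(i + 1): list(col) for i, col in enumerate(zip(*rows))}

#{'col1':['A','AA'],'col2':['AGILENT TECH INC','ALCOA INC']}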
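To get from the split fields to the DataFrame the question asks for, one possible sketch is to hand the per-line field lists straight to the DataFrame constructor. It assumes every line splits into exactly two fields and that no field contains two consecutive spaces; the column names are taken from the question.

```python
import re
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# One list of fields per line; fields are separated by two or more whitespace characters.
rows = [re.split(r'\s{2,}', line.strip()) for line in txt.splitlines()]

# Assumption: each row has exactly two fields, matching the two column names.
df = pd.DataFrame(rows, columns=['col1', 'col2'])
print(df)
```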

Converting a string to a DataFrame
