Question

I have a .txt file in the following format:

10101011
00101010
11001100
00101101

Is there a way in pandas to read it directly into a DataFrame with n columns? That is:

   0  1  2  3  4  5  6  7
0  1  0  1  0  1  0  1  1
1  0  0  1  0  1  0  1  0
2  1  1  0  0  1  1  0  0
3  0  0  1  0  1  1  0  1

I know this can be done by loading the file as a single-column DataFrame and then splitting it, for example:

df = df[0].apply(lambda x: pd.Series(list(x)))

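But because the DataFrame is large, this takes a long time.

Is there a way for pandas to treat each character as a separate column while loading the DataFrame?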

Answer 1

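One possible solution is to use read_fwf, with the widths parameter giving the width of each column (here, 1 character for each of the 8 columns):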

from io import StringIO

import pandas as pd

temp = u"""10101011
00101010
11001100
00101101"""
# after testing, replace 'StringIO(temp)' with 'filename.txt'
# (pd.compat.StringIO was removed in newer pandas versions; io.StringIO works everywhere)
df = pd.read_fwf(StringIO(temp), header=None, widths=[1] * 8)

print(df)
   0  1  2  3  4  5  6  7
0  1  0  1  0  1  0  1  1
1  0  0  1  0  1  0  1  0
2  1  1  0  0  1  1  0  0
3  0  0  1  0  1  1  0  1
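
If n is not known in advance, one option is to peek at the first line of the file and build widths from its length. The following is only a minimal sketch under the assumption that every line has the same length; the helper name read_char_columns and the temporary file are hypothetical and exist only for illustration.

```python
import os
import tempfile

import pandas as pd


def read_char_columns(path):
    """Hypothetical helper: read a file of equal-length lines, one column per character."""
    # Peek at the first line to find n (assumes all lines share this length).
    with open(path) as fh:
        n = len(fh.readline().rstrip("\n"))
    # One fixed-width field of width 1 per character.
    return pd.read_fwf(path, header=None, widths=[1] * n)


# Demo with a throwaway file standing in for the real .txt file.
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
tmp.write("10101011\n00101010\n11001100\n00101101\n")
tmp.close()

df = read_char_columns(tmp.name)
os.remove(tmp.name)
print(df)
```

Note that read_fwf infers the column types, so the digits here come back as integers rather than as single-character strings.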

Answer 2

You can do this with a simple comprehension:

import pandas as pd

text = """10101011
00101010
11001100
00101101"""

df = pd.DataFrame(list(line) for line in text.split('\n'))

print(df)

   0  1  2  3  4  5  6  7
0  1  0  1  0  1  0  1  1
1  0  0  1  0  1  0  1  0
2  1  1  0  0  1  1  0  0
3  0  0  1  0  1  1  0  1
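
One thing to keep in mind with this approach is that every cell holds a one-character string, not a number, even though the printed output looks the same. If numeric columns are wanted, a conversion step along these lines should work; this is a sketch that assumes every character is a digit, and the names df_chars and df_ints are made up for the example.

```python
import pandas as pd

text = """10101011
00101010
11001100
00101101"""

# Each line becomes a row; each character becomes a cell holding a str.
df_chars = pd.DataFrame([list(line) for line in text.splitlines()])

# Convert every cell to an integer (fails if any character is not a digit).
df_ints = df_chars.astype(int)

print(type(df_chars.iloc[0, 0]))
print(df_ints.dtypes)
```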

Answer 3

Use from_records:

import pandas as pd

# temp is the multi-line string defined in Answer 1
df = pd.DataFrame.from_records(temp.split())

>>> df
   0  1  2  3  4  5  6  7
0  1  0  1  0  1  0  1  1
1  0  0  1  0  1  0  1  0
2  1  1  0  0  1  1  0  0
3  0  0  1  0  1  1  0  1

Reading strings of length n directly with pandas as n single-character columns