Source data:
20 7369 CLERK
30 7499 SALESMAN
30 7521 SALESMAN
20 7566 MANAGER
30 7654 SALESMAN
30 7698 MANAGER
10 7782 MANAGER
20 7788 ANALYST
10 7839 PRESIDENT
30 7844 SALESMAN
20 7876 CLERK
30 7900 CLERK
20 7902 ANALYST
Requirement:
012345678901234567890123456789
Hi all,
I am successfully reading this .dat file into Python pandas. Each row is 30 characters wide (012345678901234567890123456789). I need to derive 3 columns:
From left to right: positions 1 to 4 (length 4) as DEPTNO
From left to right: positions 5 to 13 (length 9) as EMPNO
From left to right: positions 14 to 30 (length 17) as JOB
I tried the following code:
import pandas as pd
with open('Emp.dat','r') as f:
    next(f) # skip first row
    df = pd.DataFrame(l.rstrip().split() for l in f)
Required output:
DEPTNO EMPNO JOB
20 7369 CLERK
30 7499 SALESMAN
30 7521 SALESMAN
20 7566 MANAGER
30 7654 SALESMAN
30 7698 MANAGER
10 7782 MANAGER
20 7788 ANALYST
10 7839 PRESIDENT
30 7844 SALESMAN
20 7876 CLERK
30 7900 CLERK
20 7902 ANALYST
Answer 0 (score: 0)
Perhaps use the columns parameter:
import pandas as pd
with open('Emp.dat','r') as f:
    next(f) # skip first row
    df = pd.DataFrame((l.rstrip().split() for l in f), columns=['DEPTNO', 'EMPNO', 'JOB'])
Output:
DEPTNO EMPNO JOB
0 20 7369 CLERK
1 30 7499 SALESMAN
2 30 7521 SALESMAN
3 20 7566 MANAGER
4 30 7654 SALESMAN
5 30 7698 MANAGER
6 10 7782 MANAGER
7 20 7788 ANALYST
8 10 7839 PRESIDENT
9 30 7844 SALESMAN
10 20 7876 CLERK
11 30 7900 CLERK
12 20 7902 ANALYST
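One thing to keep in mind with this approach: str.split() yields strings, so DEPTNO and EMPNO come out as text rather than numbers. A minimal sketch, assuming those two columns should be integers; `sample` is a hypothetical in-memory stand-in for Emp.dat with a first line that gets skipped:

```python
import io
import pandas as pd

# Hypothetical stand-in for Emp.dat: one line to skip, then whitespace-separated rows.
sample = io.StringIO(
    "DEPTNO EMPNO JOB\n"
    "20 7369 CLERK\n"
    "30 7499 SALESMAN\n"
    "10 7839 PRESIDENT\n"
)

next(sample)  # skip the first row, as in the code above
df = pd.DataFrame(
    (line.split() for line in sample),
    columns=["DEPTNO", "EMPNO", "JOB"],
)

# split() produces strings, so convert the numeric columns explicitly.
df = df.astype({"DEPTNO": int, "EMPNO": int})
print(df)
```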
Answer 1 (score: 0)
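Here are two ways.
1. Use df = pd.read_csv('Emp.dat', sep=r'\s+'), which splits each line on any run of whitespace characters (details in "How to make separator in pandas read_csv more flexible wrt whitespace?").
2. Use fixed-width fields: df = pd.read_fwf(io.StringIO(t), widths=[4, 9, 9]), where t is presumably a string holding the file contents.
In both cases the first line is used as the header row by default. Use pd.read...(..., header=None, skiprows=[0]) to ignore it completely.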
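Both options can be tried without a file on disk. A minimal sketch, assuming the 4/9/17 layout from the question (positions 1 to 4, 5 to 13, 14 to 30) and data with no header line; `raw` is a hypothetical in-memory sample built to that layout, and `names=` supplies the column labels:

```python
import io
import pandas as pd

rows = [(20, 7369, "CLERK"), (30, 7499, "SALESMAN"), (10, 7839, "PRESIDENT")]
# Hypothetical sample: DEPTNO padded to 4 characters, EMPNO to 9,
# JOB starting at position 14 (its trailing padding is omitted here).
raw = "".join(f"{d:<4}{e:<9}{j}\n" for d, e, j in rows)

names = ["DEPTNO", "EMPNO", "JOB"]

# Option 1: split each line on any run of whitespace.
df_ws = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=names)

# Option 2: slice each line at fixed character widths.
df_fwf = pd.read_fwf(io.StringIO(raw), widths=[4, 9, 17], header=None, names=names)

print(df_fwf)
```

For this data the two calls give the same table. The fixed-width reader is the safer choice if a field can be blank or contain spaces, since whitespace splitting would then shift the columns.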