Using an integer as the comment character with read_csv

Time: 2016-02-10 15:45:44

Tags: python csv pandas io

My data looks like this:

306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,

Now I want to skip, while reading, every row that starts with 66.

How should I set up my pd.read_csv call?

data = """
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,
"""

import pandas as pd
from io import StringIO

pd.read_csv(StringIO(data), header=None, dtype={1 : str },  comment='6',)

But this drops everything after the 30 in every row:

     0
0   30
1   30
2   30
3   30
4   30
5   30
6   30
7   30
8   30
9   30
10  30
11  30

1 answer:

Answer 0 (score: 1)

First, load the DataFrame (declaring the column names, and reading column 0 as strings):

df = pd.read_csv(StringIO(data), names=[0, 1, 2, 3], header=None, dtype={0: str})

Second, use a regular expression to drop the rows whose first column starts with 66:

df = df[~df[0].str.contains('^66')]
df
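
The `comment` parameter takes a single character and cuts a line off wherever that character appears, which is why `comment='6'` turns every 306 into 30. An alternative, sketched here under the assumption that the whole input fits in memory as a string, is to drop the unwanted lines before pandas parses them. The names `first_field_is_66` and `kept_lines` are made up for this illustration.

```python
from io import StringIO

import pandas as pd

data = """
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,
"""


def first_field_is_66(line):
    # Hypothetical helper: compare the whole first field,
    # so a row beginning with 660 would not be dropped.
    return line.split(",", 1)[0].strip() == "66"


# Keep non-blank lines whose first field is not 66.
kept_lines = [
    line
    for line in data.splitlines()
    if line.strip() and not first_field_is_66(line)
]

# Parse only the surviving lines; no comment character is needed.
df = pd.read_csv(StringIO("\n".join(kept_lines)), header=None)
print(df)
```

Because the 66 rows never reach the parser, the remaining rows all have the same number of fields. If any row whose text merely begins with 66 should go (660, 6601, and so on), `line.startswith("66")` could replace the helper.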