My data looks like this:
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,
Now I want to exclude/skip every row that starts with 66 while reading the data.
How should I set up my pd.read_csv call?
data = """
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
306,1970,
66,1970,1,100,
"""
import pandas as pd
from io import StringIO
pd.read_csv(StringIO(data), header=None, dtype={1: str}, comment='6')
But this skips everything after the 30 on each line:
0
0 30
1 30
2 30
3 30
4 30
5 30
6 30
7 30
8 30
9 30
10 30
11 30
Answer 0 (score: 1)
First, load the DataFrame (declaring the column names):
df = pd.read_csv(StringIO(data), names=[0, 1, 2, 3], header=None, dtype={0: str})
Second, use a regular expression to drop the rows whose first column starts with 66:
df = df[~df[0].str.contains('^66')]
df
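As a possible alternative, the unwanted lines could be filtered out before pandas ever sees them. The comment argument cuts a line at the first occurrence of the character wherever it appears, which is why 306 became 30 in the question. The sketch below assumes the data is small enough to hold in memory as a string; the names rows and kept are illustrative only, and the sample is rebuilt to match the data in the question.

```python
from io import StringIO

import pandas as pd

# Rebuild the sample from the question: six "306" rows followed by
# one "66" row, repeated twice.
rows = ["306,1970,"] * 6 + ["66,1970,1,100,"]
data = "\n" + "\n".join(rows * 2) + "\n"

# Keep non-empty lines whose first field is not exactly "66".
# Comparing the whole first field avoids dropping values such as "660".
kept = [
    line
    for line in data.splitlines()
    if line.strip() and line.split(",", 1)[0].strip() != "66"
]

# Parse only the surviving lines; column 0 is read as text, as in the answer.
df = pd.read_csv(StringIO("\n".join(kept)), header=None, dtype={0: str})
print(df)
```

Because of the trailing comma on each line, pandas reads a third, empty column here. For data stored in a file, the same list comprehension could iterate over the open file handle instead of data.splitlines().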