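I have 500k rows of data, and the formatting is somewhat inconsistent. I am using Spyder and pandas to clean it.
I have a column that contains either numbers or strings. If a particular cell in that column is a string,
I want to delete the entire row, as shown below. My code has been altered a little because of confidential information.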
import pandas as pd
import csv
# pandas >= 1.3 replaces error_bad_lines=False with on_bad_lines='skip'
mydataset = pd.read_csv('test.txt', error_bad_lines=False,
                        engine='python',
                        index_col=False, header=None, quoting=csv.QUOTE_NONE,
                        # raw string; inside [...] the '|' is a literal pipe, not "or"
                        sep=r"[\s|,|/]", names=["1","2","3","4","a","b","c",
                        "h","i","j","k","l","m","n","o","p","f","g",
                        "q","r","s","t","u","v","w","x","y","z",
                        "5","6","7","8","9","10","11","12","13","14"])
print (mydataset.shape)
columns =['3','4','h','a','b','c','i','j','k','l','m','n','f','g']
mydataset.drop(columns,inplace=True,axis=1)
print (mydataset.shape)
# column "2" needs bracket indexing; mydataset.2 is a syntax error
mydataset = mydataset[(mydataset.q.notnull()) & (mydataset.r.notnull()) &
                      (mydataset.s.notnull()) & (mydataset["2"].notnull()) & (mydataset["2"] != "@")]
Please excuse the naming convention of the headers.
example of data:
1 2 3 4 <--header
abc 123 123 bcd <--Data
123 123 123 bcd <--Data
I want to detect "abc" and delete the entire row.
Please advise!
Answer 0 (score: -1)
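Use a row-wise function with DataFrame.apply (DataFrame.map works element by element and cannot drop rows). It might look like the following (I am not sure all of the syntax is correct):
def has_abc(row):
    # True when any cell in this row is exactly the string 'abc'
    return (row == 'abc').any()

# keep only the rows that do not contain 'abc'
mydataset = mydataset[~mydataset.apply(has_abc, axis=1)]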
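For a large file, a vectorized check is likely to be faster than a per-row function. The following is only a minimal sketch of that idea, not the asker's actual pipeline: it uses pd.to_numeric with errors="coerce" to mark non-numeric cells, and the sample frame and the names df, as_number and cleaned are made up for illustration, based on the example data in the question.

```python
import pandas as pd

# Hypothetical sample mirroring the example data in the question.
df = pd.DataFrame(
    {
        "1": ["abc", "123"],
        "2": [123, 123],
        "3": [123, 123],
        "4": ["bcd", "bcd"],
    }
)

# Try to convert column "1" to numbers; values that cannot be parsed become NaN.
as_number = pd.to_numeric(df["1"], errors="coerce")

# Keep only the rows where the conversion succeeded.
cleaned = df[as_number.notna()]
print(cleaned)
```

Note that cells which were already empty (NaN) in that column are dropped as well, since they also fail the notna() check. If those rows should be kept, the mask would need an extra condition such as `| df["1"].isna()`.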