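I have a bunch of txt files that I need to compile into a single master file. I use read_csv to extract the information from them. Some rows need to be dropped, and I was wondering whether it is possible to use the skiprows feature without specifying the index numbers of the rows to drop, and instead decide which rows to drop based on their content/value. To illustrate my point, the data looks like this: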
Index Column 1 Column 2
0 Rows to drop Rows to drop
1 Rows to drop Rows to drop
2 Rows to drop Rows to drop
3 Rows to keep Rows to keep
4 Rows to keep Rows to keep
5 Rows to keep Rows to keep
6 Rows to keep Rows to keep
7 Rows to drop Rows to drop
8 Rows to drop Rows to drop
9 Rows to keep Rows to keep
10 Rows to drop Rows to drop
11 Rows to keep Rows to keep
12 Rows to keep Rows to keep
13 Rows to drop Rows to drop
14 Rows to drop Rows to drop
15 Rows to drop Rows to drop
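What is the most efficient way to do this?
Answer 0: (score: 1)
No. skiprows will not let you drop rows based on their content/value. From the documentation:
skiprows : list-like, int or callable, optional
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].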
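To make the quoted description concrete, here is a minimal sketch showing that a skiprows callable only ever receives the 0-based line number, never the text of the line. The sample data and the function name skip_by_position are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for one of the txt files.
# Line 0 is the header; lines 1-4 are data.
sample = "A,B\n1,drop\n2,keep\n3,drop\n4,keep\n"


def skip_by_position(line_number):
    # pandas passes the line number only, so the decision
    # cannot depend on what the line contains.
    return line_number in (1, 3)


df = pd.read_csv(io.StringIO(sample), skiprows=skip_by_position)
print(df)
```

The rows that survive here are chosen purely by position; the fact that the skipped lines happen to say "drop" is a coincidence of the sample.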
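Answer 1: (score: 1)
Since you cannot do this with skiprows, I think the following approach is efficient: read the whole file, then filter on the column value.
import pandas as pd

# filePath is the path to one of your txt files
df = pd.read_csv(filePath)
# Keep only the rows whose value matches; replace 'column1' with your actual column name
df = df.loc[df['column1'] == "Rows to keep"]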
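If the unwanted lines need to be removed before parsing (for example because they would confuse the parser), one possible workaround is to filter the raw text first and hand only the surviving lines to read_csv. This is a sketch under assumptions: the helper name read_csv_dropping, the comma-separated sample, and the substring test are all illustrative and not part of pandas.

```python
import io

import pandas as pd

# Hypothetical file contents; in practice this would come from open(path).read().
raw = (
    "Column 1,Column 2\n"
    "Rows to drop,Rows to drop\n"
    "Rows to keep,Rows to keep\n"
    "Rows to drop,Rows to drop\n"
    "Rows to keep,Rows to keep\n"
)


def read_csv_dropping(text, unwanted):
    """Parse CSV text after removing data lines that contain `unwanted`."""
    lines = text.splitlines(keepends=True)
    # Always keep the header (first line); filter the rest by content.
    kept = lines[:1] + [line for line in lines[1:] if unwanted not in line]
    return pd.read_csv(io.StringIO("".join(kept)))


df = read_csv_dropping(raw, "Rows to drop")
print(df)
```

This reads the text twice (once to filter, once to parse), so for files that parse cleanly the boolean filter in the answer above is likely the simpler choice.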
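Answer 2: (score: 1)
Is this what you are trying to achieve?
import pandas as pd

df = pd.DataFrame({'A': ['row 1', 'row 2', 'drop row', 'row 4', 'row 5',
                         'drop row', 'row 6', 'row 7', 'drop row', 'row 9']})
# Keep every row whose value in column 'A' is not 'drop row'
df1 = df[df['A'] != 'drop row']
print(df)
print(df1)
Original DataFrame: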
A
0 row 1
1 row 2
2 drop row
3 row 4
4 row 5
5 drop row
6 row 6
7 row 7
8 drop row
9 row 9
New DataFrame with the rows dropped:
A
0 row 1
1 row 2
3 row 4
4 row 5
6 row 6
7 row 7
9 row 9
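Note that the filtered frame above keeps the original index labels (0, 1, 3, 4, ...). The following sketch extends the same idea to several unwanted values using isin and renumbers the result with reset_index; the sample values and the variable name unwanted are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["row 1", "drop row", "row 2", "skip me", "row 3"]})

# Hypothetical list of values that mark a row for removal.
unwanted = ["drop row", "skip me"]

# ~ negates the boolean mask, so only rows NOT in `unwanted` remain.
kept = df[~df["A"].isin(unwanted)].reset_index(drop=True)
print(kept)
```

reset_index(drop=True) discards the old labels instead of adding them as a new column, which is usually what you want before concatenating several files.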
Although you cannot skip rows based on their content, you can skip rows based on their index. Here are some of your options:
# Skip the first 2 lines of the file
df = pd.read_csv('xyz.csv', skiprows=2)

# Skip the 1st, 3rd, and 6th lines of the file
# (line numbers are 0-indexed, so line 0 is the 1st line)
df = pd.read_csv('xyz.csv', skiprows=[0, 2, 5])

# You can also skip by position using a callable.
# The example below skips line 0 and every 5th line after it.
# Note that line 0 is normally the header line.
def check_row(a):
    return a % 5 == 0

df = pd.read_csv('xyz.txt', skiprows=check_row)
For more details, see the documentation on skiprows.
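Putting the pieces together for the original goal of compiling many txt files into one master file, a minimal sketch might look like the following. The function name compile_files, the file names, and the column names are all hypothetical; the temporary directory is only there so the example is self-contained.

```python
import glob
import os
import tempfile

import pandas as pd


def compile_files(pattern, column, keep_value):
    """Read every file matching `pattern`, keep matching rows, and stack them."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path)
        # Content-based filtering happens after parsing, not via skiprows.
        frames.append(df[df[column] == keep_value])
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    # Create two small hypothetical input files.
    samples = {
        "a.txt": "A,B\nkeep,1\ndrop,2\n",
        "b.txt": "A,B\ndrop,3\nkeep,4\n",
    }
    for name, body in samples.items():
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(body)

    master = compile_files(os.path.join(tmp, "*.txt"), "A", "keep")
    # Write the combined result as a single master file.
    master.to_csv(os.path.join(tmp, "master.csv"), index=False)

print(master)
```

Filtering each file before concatenating keeps memory use lower than concatenating everything and filtering once, which may matter if the dropped rows are a large share of the data.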