Question

I have a text file like this:

col0 col1 
g1   text
g2   text,text
g3   text,text,text
g4   text
g5   text,text,text,text,text

I need to use pandas to modify it so that every row with more than one text value is removed, like this:

col0 col1 
g1   text
g4   text

The only difference is that my real file has about 300,000 rows in total.
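A minimal sketch of loading a file shaped like the one above into pandas. It assumes that whitespace separates the two columns and that the values inside col1 are joined by commas with no spaces. An in-memory buffer stands in for the real file so the example runs on its own; the file name mentioned in the comment is hypothetical.

```python
import io

import pandas as pd

# Stand-in for the real ~300,000-row file (assumption: whitespace between
# columns, commas with no spaces between the values inside col1).
raw = """col0 col1
g1   text
g2   text,text
g3   text,text,text
g4   text
g5   text,text,text,text,text
"""

# sep=r"\s+" splits each line on runs of whitespace. For the real file,
# pass its path (e.g. a hypothetical "input.txt") instead of the buffer.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```

Because the commas inside col1 are not separators here, each col1 cell is read as a single string such as "text,text".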

Answer 1

If col1 contains flat strings:

In [94]: df
Out[94]:
  col0                      col1
0   g1                      text
1   g2                 text,text
2   g3            text,text,text
3   g4                      text
4   g5  text,text,text,text,text

In [95]: df = df.loc[~df.col1.str.contains(',')]

In [96]: df
Out[96]:
  col0  col1
0   g1  text
3   g4  text
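The same flat-string filter as a self-contained script, built from the question's sample data. Passing regex=False makes "," a literal substring match. Writing the result back as space-separated text is an assumption about the desired output format.

```python
import pandas as pd

# The question's sample data, with col1 holding plain comma-joined strings.
df = pd.DataFrame({
    "col0": ["g1", "g2", "g3", "g4", "g5"],
    "col1": ["text", "text,text", "text,text,text", "text",
             "text,text,text,text,text"],
})

# regex=False treats "," as a literal substring; ~ inverts the boolean
# mask so that only rows WITHOUT a comma are kept.
single = df.loc[~df["col1"].str.contains(",", regex=False)]
print(single)

# Write the filtered rows back out as space-separated text without the index
# (the output format is an assumption; adjust sep to match the input file).
print(single.to_csv(sep=" ", index=False))
```

If col1 can contain missing values, str.contains returns NaN for them and the ~ inversion fails. Passing na=False treats those rows as having no comma, so they are kept.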

If col1 contains lists:

In [105]: df
Out[105]:
  col0                            col1
0   g1                          [text]
1   g2                    [text, text]
2   g3              [text, text, text]
3   g4                          [text]
4   g5  [text, text, text, text, text]

In [106]: df.col1.str.len() < 2
Out[106]:
0     True
1    False
2    False
3     True
4    False
Name: col1, dtype: bool

In [107]: df[df.col1.str.len() < 2]
Out[107]:
  col0    col1
0   g1  [text]
3   g4  [text]
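A runnable sketch of the list case. It assumes the lists are produced by splitting the original comma-joined strings with str.split(","). Series.str.len() also works element-wise on list values and returns each list's length.

```python
import pandas as pd

df = pd.DataFrame({
    "col0": ["g1", "g2", "g3", "g4", "g5"],
    "col1": ["text", "text,text", "text,text,text", "text",
             "text,text,text,text,text"],
})

# Assumption: col1 becomes lists by splitting the comma-joined strings.
df["col1"] = df["col1"].str.split(",")

# .str.len() gives the number of elements in each list.
lengths = df["col1"].str.len()

# Keep rows whose list holds fewer than 2 elements, i.e. exactly one value.
single = df[lengths < 2]
print(single)
```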

Answer 2

This answer builds on @MaxU's idea but adds a layer of generalization, letting you change the condition on how many text values are allowed.

df[df.col1.str.count(',') < 1]

  col0  col1
0   g1  text
3   g4  text
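One way to make that threshold explicit is to wrap it in a small helper. A value with n commas holds n + 1 items, so "at most k items" becomes count(",") < k. The helper name keep_at_most is hypothetical and not part of pandas.

```python
import pandas as pd


def keep_at_most(frame: pd.DataFrame, column: str, max_values: int) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose comma-separated `column`
    holds at most `max_values` items (n commas means n + 1 items)."""
    return frame[frame[column].str.count(",") < max_values]


df = pd.DataFrame({
    "col0": ["g1", "g2", "g3", "g4", "g5"],
    "col1": ["text", "text,text", "text,text,text", "text",
             "text,text,text,text,text"],
})

# max_values=1 reproduces df[df.col1.str.count(',') < 1].
print(keep_at_most(df, "col1", 1))

# Allow up to three values per row instead.
print(keep_at_most(df, "col1", 3))
```

Note that str.count interprets its pattern as a regular expression. A plain comma has no special meaning there, but other separators such as "." or "|" would need escaping with re.escape.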
