I have a pandas DataFrame as follows:
df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])
df
              A  B
0             1  2
1           NaN  1
2  test string1  5
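I am using pandas 0.20. What is the most efficient way to remove every row in which the value in any column has a length greater than 10?
len('test string1')  # 12
So for the example above, I expect the following output: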
df
     A  B
0    1  2
1  NaN  1
Answer 0 (score: 8)
If the filter is based on column A only:
In [865]: df[~(df.A.str.len() > 10)]
Out[865]:
     A  B
0    1  2
1  NaN  1
If the filter is based on all columns:
In [866]: df[~df.applymap(lambda x: len(str(x)) > 10).any(axis=1)]
Out[866]:
     A  B
0    1  2
1  NaN  1
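For anyone who wants to run both variants end to end, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question. The names mask_a, mask_all, by_column_a and by_all_columns are illustrative and not part of the answer above. As an assumption on my side, the all-columns variant uses astype(str) with a column-wise .str.len() rather than applymap, because applymap is deprecated in newer pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ["test string1", 5]], columns=["A", "B"])

# Column A only: .str.len() yields NaN for non-string cells, and NaN > 10 is False.
mask_a = df["A"].str.len() > 10
by_column_a = df[~mask_a]

# All columns: convert every column to strings, then measure lengths column by column.
mask_all = df.astype(str).apply(lambda col: col.str.len() > 10).any(axis=1)
by_all_columns = df[~mask_all]

print(by_column_a)
print(by_all_columns)
```

Both masks mark only row 2, so the two filtered frames keep rows 0 and 1.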
Answer 1 (score: 3)
In [42]: df
Out[42]:
              A  B                         C          D
0             1  2                         2 2017-01-01
1           NaN  1                       NaN 2017-01-02
2  test string1  5  test string1test string1 2017-01-03
In [43]: df.dtypes
Out[43]:
A object
B int64
C object
D datetime64[ns]
dtype: object
In [44]: df.loc[~df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)]
Out[44]:
     A  B    C          D
0    1  2    2 2017-01-01
1  NaN  1  NaN 2017-01-02
Explanation:
df.select_dtypes(['object']) selects only the columns of object (str) dtype:
In [45]: df.select_dtypes(['object'])
Out[45]:
              A                         C
0             1                         2
1           NaN                       NaN
2  test string1  test string1test string1
In [46]: df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10))
Out[46]:
       A      C
0  False  False
1  False  False
2   True   True
Now we can "aggregate" it row-wise as follows:
In [47]: df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)
Out[47]:
0    False
1    False
2     True
dtype: bool
Finally, we select only those rows where the value is False:
In [48]: df.loc[~df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)]
Out[48]:
     A  B    C          D
0    1  2    2 2017-01-01
1  NaN  1  NaN 2017-01-02
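The steps above can be wrapped into a small helper. This is only a sketch: the function name drop_long_rows and its max_len parameter are my own, hypothetical names, and it assumes the text lives in object-dtype columns, as in the example frame.

```python
import numpy as np
import pandas as pd


def drop_long_rows(frame, max_len=10):
    """Keep rows where no object-dtype cell is a string longer than max_len."""
    # Restrict the check to object columns; numeric and datetime columns are skipped.
    text_cols = frame.select_dtypes(include=["object"])
    # Per-cell test; non-string cells give NaN lengths, and NaN.gt(...) is False.
    too_long = text_cols.apply(lambda col: col.str.len().gt(max_len)).any(axis=1)
    return frame.loc[~too_long]


df = pd.DataFrame({
    "A": [1, np.nan, "test string1"],
    "B": [2, 1, 5],
    "C": [2, np.nan, "test string1test string1"],
    "D": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"]),
})

result = drop_long_rows(df)
print(result)
```

Passing a larger threshold, for example drop_long_rows(df, max_len=30), keeps all three rows because the longest string here has 24 characters.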
Answer 2 (score: 3)
I had to cast to a string for Diego's answer to work:
private int GetCount(IDictionary<string, int> counts, string item)
{
    int count;
    if (!counts.TryGetValue(item, out count))
        count = 0;
    count++;
    counts[item] = count;
    return count;
}

private IEnumerable<string> GetItems(IEnumerable<string> items)
{
    // Initialize dict for counts with appropriate comparison
    var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    foreach (var item in items)
        yield return string.Format("{0}[{1}]", item, GetCount(counts, item));
}
Answer 3 (score: 1)
Use the Series apply function to keep only the rows whose value in A has length 10 or less:
df = df[df['A'].apply(lambda x: len(x) <= 10)]
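On the question's sample data, column A also holds an integer and a NaN, and len() raises a TypeError for those. A minimal sketch of the string cast mentioned in an earlier answer, assuming that cast simply means wrapping each value in str(); the variable name kept is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ["test string1", 5]], columns=["A", "B"])

# len(1) raises TypeError, so convert each cell to str before measuring it.
# Note that NaN becomes the 3-character string 'nan' and is therefore kept.
kept = df[df["A"].apply(lambda x: len(str(x)) <= 10)]
print(kept)
```

One consequence of this design is that numbers are measured by their printed form, so a very long integer or float would also be dropped.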