This is what my pandas DataFrame looks like:
id text country datetime
0 1 hello,bye USA 3/20/2016
1 0 good morning UK 3/21/2016
2 x wrong USA 3/21/2016
I want to convert only the id column to boolean, and drop any row whose value is not boolean.
I tried
df=df[df['id'].bool()]
but got ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer 0 (score: 1)
IIUC, you can try converting the id column with to_numeric and then comparing with 1:
print(pd.to_numeric(df.id, errors='coerce') == 1)
0 True
1 False
2 False
Name: id, dtype: bool
print(df[pd.to_numeric(df.id, errors='coerce') == 1])
id text country datetime
0 1 hello bye USA 3/20/2016
If you need to drop rows where the id column is not 0 or 1, use isin:
print(df.id.isin(['0','1']))
0 True
1 True
2 False
Name: id, dtype: bool
print(df[df.id.isin(['0','1'])])
id text country datetime
0 1 hello bye USA 3/20/2016
1 0 good morning UK 3/21/2016
print(pd.to_numeric(df.id, errors='coerce').notnull())
0 True
1 True
2 False
Name: id, dtype: bool
print(df[pd.to_numeric(df.id, errors='coerce').notnull()])
id text country datetime
0 1 hello bye USA 3/20/2016
1 0 good morning UK 3/21/2016
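The snippets above assume df already exists with id stored as strings. Here is a minimal, self-contained sketch (Python 3) that rebuilds the frame from the question's sample. The extra row with id '2' is a hypothetical addition, not part of the original data, included only to show where the two filters disagree: isin keeps only '0' and '1', while to_numeric(...).notnull() keeps anything that parses as a number.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row (id '2')
# added only to show where the two filters disagree.
df = pd.DataFrame({
    "id": ["1", "0", "x", "2"],
    "text": ["hello,bye", "good morning", "wrong", "extra"],
    "country": ["USA", "UK", "USA", "UK"],
    "datetime": ["3/20/2016", "3/21/2016", "3/21/2016", "3/22/2016"],
})

# Strict filter: keep only rows whose id is exactly '0' or '1'.
strict_mask = df["id"].isin(["0", "1"])

# Looser filter: keep every row whose id parses as a number.
numeric_mask = pd.to_numeric(df["id"], errors="coerce").notnull()

strict_ids = df.loc[strict_mask, "id"].tolist()
numeric_ids = df.loc[numeric_mask, "id"].tolist()

print(strict_ids)   # ['1', '0']
print(numeric_ids)  # ['1', '0', '2']
```

If the column may contain numbers other than 0 and 1, the isin version is probably the safer choice for a boolean column.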
Finally, you can convert the id column to bool with replace or a double astype:
print(df.loc[df.id.isin(['0','1']),'id'].replace({'0': False, '1': True}))
0 True
1 False
Name: id, dtype: bool
print(df.loc[df.id.isin(['0','1']),'id'].astype(int).astype(bool))
0 True
1 False
Name: id, dtype: bool
print(df.loc[pd.to_numeric(df.id, errors='coerce').notnull(),'id'].astype(int).astype(bool))
0 True
1 False
Name: id, dtype: bool
EDIT:
Timings, if the values to convert to bool are only 0 and 1.
The fastest is map with numpy.in1d:
#len(df) = 30k
df = pd.concat([df]*10000).reset_index(drop=True)
In [628]: %timeit df.loc[np.in1d(df['id'], ['0','1']),'id'].map({'0': False, '1': True})
100 loops, best of 3: 2.19 ms per loop
In [629]: %timeit df.loc[np.in1d(df['id'], ['0','1']),'id'].replace({'0': False, '1': True})
The slowest run took 4.46 times longer than the fastest. This could mean that an intermediate result is being cached
100 loops, best of 3: 4.72 ms per loop
In [630]: %timeit df.loc[df['id'].isin(['0','1']),'id'].map({'0': False, '1': True})
100 loops, best of 3: 2.78 ms per loop
In [631]: %timeit df.loc[df['id'].str.contains('0|1'),'id'].map({'0': False, '1': True})
10 loops, best of 3: 20 ms per loop
In [632]: %timeit df.loc[df['id'].isin(['0','1']),'id'].astype(int).astype(bool)
100 loops, best of 3: 9.5 ms per loop
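Putting the pieces together, here is a minimal sketch of the whole task: drop the rows whose id is not '0' or '1', then store real booleans back in the column. It assumes id holds strings, as in the question. The names cleaned and naive are only illustrative. The last lines show why a single astype(bool) is not enough on strings, which is presumably the reason for the double astype above.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "0", "x"],
    "text": ["hello,bye", "good morning", "wrong"],
    "country": ["USA", "UK", "USA"],
    "datetime": ["3/20/2016", "3/21/2016", "3/21/2016"],
})

# Drop rows whose id is not '0' or '1'; copy so the assignment below
# works on an independent frame rather than a slice of df.
cleaned = df[df["id"].isin(["0", "1"])].copy()

# Map the remaining strings to real booleans.
cleaned["id"] = cleaned["id"].map({"0": False, "1": True}).astype(bool)

print(cleaned["id"].tolist())  # [True, False]
print(cleaned["id"].dtype)     # bool

# Pitfall: casting object strings straight to bool uses Python truthiness,
# so every non-empty string, including '0', becomes True.
naive = pd.Series(["0", "1"], dtype=object).astype(bool)
print(naive.tolist())          # [True, True]
```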
Answer 1 (score: 0)
You can use str.isdigit to check whether your id column contains only digits, then convert to numeric and then to boolean:
In [14]: df['id'].str.isdigit()
Out[14]:
0 True
1 True
2 False
Name: id, dtype: bool
Subset with digits only:
In [15]: df.loc[df['id'].str.isdigit(), 'id']
Out[15]:
0 1
1 0
Name: id, dtype: object
Convert to bool:
In [17]: df.loc[df['id'].str.isdigit(), 'id'].astype(int).astype(bool)
Out[17]:
0 True
1 False
Name: id, dtype: bool
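One caveat with this approach: str.isdigit accepts any digit string, so an id such as '2' would pass the check and end up as True after astype(int).astype(bool). A minimal sketch of that behaviour follows. The '2' entry is a hypothetical value that does not appear in the question's data.

```python
import pandas as pd

# '2' is a hypothetical extra value, used only to illustrate the caveat.
ids = pd.Series(["1", "0", "x", "2"], name="id")

# str.isdigit keeps every all-digit string, not just '0' and '1'.
digit_mask = ids.str.isdigit()
as_bool = ids[digit_mask].astype(int).astype(bool)
print(as_bool.tolist())  # [True, False, True]

# Restricting to '0' and '1' explicitly avoids treating '2' as True.
strict = ids[ids.isin(["0", "1"])].astype(int).astype(bool)
print(strict.tolist())   # [True, False]
```

If the column is known to contain only '0', '1' and non-numeric junk, the two give the same result.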
Comparison with pd.to_numeric:
In [18]: %timeit pd.to_numeric(df.id, errors='coerce').notnull()
10000 loops, best of 3: 178 us per loop
In [19]: %timeit df['id'].str.isdigit()
10000 loops, best of 3: 128 us per loop
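Absolute numbers like these depend on the machine, the pandas version and the frame size, so they are worth re-measuring. Below is a rough sketch for doing so with the standard timeit module outside IPython. The 30,000-row frame mirrors the setup used in the other answer, and the function names are only illustrative.

```python
import timeit

import pandas as pd

# Repeat the three sample ids 10,000 times to get a 30,000-row frame.
base = pd.DataFrame({"id": ["1", "0", "x"]})
df = pd.concat([base] * 10000).reset_index(drop=True)


def with_to_numeric():
    # Mask of rows whose id parses as a number.
    return pd.to_numeric(df["id"], errors="coerce").notnull()


def with_isdigit():
    # Mask of rows whose id consists only of digits.
    return df["id"].str.isdigit()


n = 20  # number of runs per measurement
t_numeric = timeit.timeit(with_to_numeric, number=n) / n
t_isdigit = timeit.timeit(with_isdigit, number=n) / n

print("to_numeric: %.3f ms per call" % (t_numeric * 1e3))
print("isdigit:    %.3f ms per call" % (t_isdigit * 1e3))
```

On this sample data both functions return the same mask, so the comparison is like for like.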