How to remove three-letter or four-letter words from a DataFrame column

Time: 2019-05-28 14:06:39

Tags: python-3.x pandas text nlp series

I have a DataFrame like this:

import pandas as pd
import dask.dataframe as dd
a = {'b':['category','categorical','cater pillar','coming and going','bat','No Data','calling','cal'],
     'c':['strd1','strd2','strd3', 'strd4','strd5','strd6','strd7', 'strd8']
    }
df11 = pd.DataFrame(a,index=['x1','x2','x3','x4','x5','x6','x7','x8'])

I want to remove every value in column b whose length is three. I expect the result to look like this:

b                          c
category                   strd1
categorical                strd2
cater pillar               strd3
coming and going           strd4
NaN                        strd5
No Data                    strd6
calling                    strd7
NaN                        strd8

5 answers:

Answer 0 (score: 4)

Use series.str.len() to get the length of each string in the series, compare it with series.eq(), and then use df.loc[] to assign np.nan to b wherever the condition holds:

df11.loc[df11.b.str.len().eq(3),'b']=np.nan
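
For reference, here is a self-contained sketch of the same approach that can be run as-is. It rebuilds the sample frame from the question and adds the numpy import that the one-liner relies on; the intermediate name is_len3 is only an illustrative helper, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df11 = pd.DataFrame(
    {
        'b': ['category', 'categorical', 'cater pillar', 'coming and going',
              'bat', 'No Data', 'calling', 'cal'],
        'c': ['strd1', 'strd2', 'strd3', 'strd4',
              'strd5', 'strd6', 'strd7', 'strd8'],
    },
    index=['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8'],
)

# Boolean mask: True where the whole string in column b has exactly 3 characters
is_len3 = df11['b'].str.len().eq(3)

# Overwrite only the matching rows of column b with NaN
df11.loc[is_len3, 'b'] = np.nan
```

Note that the length check applies to the whole cell, so 'coming and going' is kept even though it contains the three-letter word 'and', which matches the expected output in the question.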

Answer 1 (score: 3)

Use str.len to get the length of each string, then use np.where to conditionally replace the values whose length equals 3 with NaN.
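
The snippet for this answer did not survive, so the following is only a minimal sketch of what an np.where based version could look like, not necessarily the answerer's exact code. The small frame df is a hypothetical stand-in for df11.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df11, shortened for illustration
df = pd.DataFrame({'b': ['category', 'bat', 'No Data', 'cal']})

# np.where(condition, value_if_true, value_if_false):
# NaN where the string length is 3, the original value otherwise
df['b'] = np.where(df['b'].str.len().eq(3), np.nan, df['b'])
```

np.where returns a plain NumPy array, so the result is assigned back to the column rather than modified in place.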

Answer 2 (score: 2)

Maybe check …

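Answer 3 (score: 0)

Something like this:

# Iterate over (index label, value) pairs; .loc avoids chained assignment
for idx, ele in df11['b'].items():
    if isinstance(ele, str) and len(ele) == 3:
        df11.loc[idx, 'b'] = np.nan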

Answer 4 (score: 0)

You can use a where condition:

    df11['b'] = df11['b'].where(df11.b.map(len) != 3, np.nan)
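
Since the title also mentions four-letter words, the same idea might be extended to several lengths at once. The sketch below uses Series.mask, which is the inverse of where (it replaces values where the condition is True). The sample series s and the list lengths_to_drop are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample values; 'cats' is added to show the 4-letter case
s = pd.Series(['category', 'bat', 'cats', 'No Data', 'cal'])

# Lengths to blank out (assumption: 3 and 4, taken from the question title)
lengths_to_drop = [3, 4]

# mask() replaces entries with NaN where the condition is True
cleaned = s.mask(s.str.len().isin(lengths_to_drop))
```

Using str.len() instead of map(len) also tolerates values that are already missing, because str.len() returns NaN for them instead of raising.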