Searching a text column in a pandas DataFrame without looping

Time: 2017-12-20 19:42:53

Tags: python pandas nlp

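I have a pandas DataFrame in which one column holds text description strings. I need to create a new column indicating whether any string from a list appears in the text description.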

import pandas as pd

df = pd.DataFrame({'Description': [
    '2 Bedroom/1.5 Bathroom end unit Townhouse.  Available now!',
    'Very spacious studio apartment available',
    ' Two bedroom, 1 bathroom condominium, superbly located in downtown']})

list_ = ['unit', 'apartment']

The result should then be:

                                        Description    in list
0  2 Bedroom/1.5 Bathroom end unit Townhouse.  Av...    True
1           Very spacious studio apartment available    True
2   Two bedroom, 1 bathroom condominium, superbly...   False

I can do it like this:

for i in df.index.values:
    df.loc[i,'in list'] = any(w in df.loc[i,'Description'] for w in list_)

But on large datasets this takes longer than I would like.

2 answers:

Answer 0 (score: 2)

Use str.contains, with the list joined into a single regular-expression alternation:

list_ = ['unit', 'apartment']
df.Description.str.contains('|'.join(list_))
Out[724]: 
0     True
1     True
2    False
Name: Description, dtype: bool
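Because str.contains treats its pattern as a regular expression by default, a search term containing characters such as `.` or `+` would be interpreted as regex syntax. The following is a minimal sketch, not part of the original answer, that escapes each term with re.escape and stores the result in the 'in list' column the question asks for. The na=False argument is an added assumption: it makes missing descriptions count as non-matches.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Description': [
        '2 Bedroom/1.5 Bathroom end unit Townhouse.  Available now!',
        'Very spacious studio apartment available',
        ' Two bedroom, 1 bathroom condominium, superbly located in downtown',
    ]
})
list_ = ['unit', 'apartment']

# re.escape keeps any regex metacharacters in the search terms literal.
pattern = '|'.join(re.escape(w) for w in list_)

# na=False treats a missing description as "no match" instead of NaN.
df['in list'] = df['Description'].str.contains(pattern, na=False)

result = df['in list'].tolist()
print(result)  # [True, True, False]
```

For the two plain words in the question the escaping changes nothing, but it is worth having if the list ever comes from user input.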

Answer 1 (score: 1)

Use np.char.find:

import numpy as np

v = df.Description.values.astype('U')[:, None]
# find returns -1 for no match and 0 for a match at the start, so test >= 0
df['in list'] = (np.char.find(v, list_) >= 0).any(axis=1)
df

                                         Description  in list
0  2 Bedroom/1.5 Bathroom end unit Townhouse.  Av...     True
1           Very spacious studio apartment available     True
2   Two bedroom, 1 bathroom condominium, superbly...    False
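One detail of np.char.find is easy to get wrong: it returns the index of the first occurrence, which is 0 when the term sits at the very start of the string, and -1 when the term is absent. A comparison with `> 0` therefore misses matches at position 0. The sketch below uses made-up strings (not the question's data) and hypothetical variable names to show the difference.

```python
import numpy as np

# Column vector of strings, so it broadcasts against the list of terms.
v = np.array(['unit for rent', 'studio apartment', 'condominium'])[:, None]
terms = ['unit', 'apartment']

# Shape (3, 2): one row per string, one column per search term.
positions = np.char.find(v, terms)
print(positions.tolist())  # [[0, -1], [-1, 7], [-1, -1]]

found = (positions >= 0).any(axis=1)   # counts a match at index 0
missed = (positions > 0).any(axis=1)   # silently drops a match at index 0

print(found.tolist())   # [True, True, False]
print(missed.tolist())  # [False, True, False]
```

In the question's sample data no term starts a description, so both comparisons happen to give the same answer there.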
