Pandas: replace values in a string

Time: 2016-08-10 09:28:54

Tags: python string pandas dataframe duplicates

I have a DataFrame and I am trying to replace its values using another df.

I use:


df['term_code'] = df.search_term.map(rep_term.set_index('search_term')['code_action'])

But I get an error:

  File "C:/Users/����� �����������/Desktop/projects/find_time_before_buy/graph (2).py", line 36, in <module>
    df['term_code'] = df.search_term.map(rep_term.set_index('search_term')['code_action'])
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 2101, in map
    indexer = arg.index.get_indexer(values)
  File "C:\Python27\lib\site-packages\pandas\indexes\base.py", line 2082, in get_indexer
    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

What should I change?

Here search_term looks like:

729948                               None
729949                               None
729950                               None
729951  пансионат джемете отдых 2016 цены
729952                               None
729953                               None
729954                     купить телефон
729955                               None
729956                                 вк
729957                               None
729958                             яндекс

1 answer:

Answer 0 (score: 4)

The problem is that the column search_term in DataFrame rep_term contains duplicates.

I reproduce it with sample data:

import pandas as pd

df = pd.DataFrame({'search_term':[1,2,3]})

print (df)
   search_term
0            1
1            2
2            3

For the value 1 in search_term, you have 2 values in code_action:

rep_term = pd.DataFrame({'search_term':[1,2,1], 'code_action':['ss','dd','gg']})
print (rep_term)
  code_action  search_term
0          ss            1
1          dd            2
2          gg            1


df['term_code'] = df.search_term.map(rep_term.set_index('search_term')['code_action'])
print (df)
#InvalidIndexError: Reindexing only valid with uniquely valued Index objects
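A quick way to confirm this cause before mapping is to check whether the lookup index is unique. This is a minimal sketch on the same toy data; the names lookup and error are only illustrative, and the broad except is used because the exact exception class path differs between pandas versions:

```python
import pandas as pd

df = pd.DataFrame({'search_term': [1, 2, 3]})
rep_term = pd.DataFrame({'search_term': [1, 2, 1],
                         'code_action': ['ss', 'dd', 'gg']})

# Series.map with a Series argument looks each value up in that Series' index,
# so the index must be unique for the lookup to work.
lookup = rep_term.set_index('search_term')['code_action']
print(lookup.index.is_unique)  # False: search_term 1 appears twice

error = None
try:
    df['term_code'] = df.search_term.map(lookup)
except Exception as exc:  # an InvalidIndexError in pandas
    error = exc
    print(type(exc).__name__, exc)
```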

First, find the rows with duplicated values using duplicated:

print (rep_term[rep_term.duplicated(subset=['search_term'], keep=False)])
  code_action  search_term
0          ss            1
2          gg            1
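Before choosing which duplicate to keep, it may help to check whether the duplicated keys actually map to different codes. This is an extra diagnostic step, not part of the answer above, sketched on the same toy data with groupby and nunique (n_codes and conflicting are illustrative names):

```python
import pandas as pd

rep_term = pd.DataFrame({'search_term': [1, 2, 1],
                         'code_action': ['ss', 'dd', 'gg']})

# Number of distinct code_action values per search_term.
n_codes = rep_term.groupby('search_term')['code_action'].nunique()
print(n_codes)

# Keys with more than one distinct code: for these, keep='first' and
# keep='last' give different mapped results.
conflicting = n_codes[n_codes > 1].index.tolist()
print(conflicting)  # [1]
```

If conflicting is empty, the duplicates are exact repeats and either keep option gives the same mapping.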

Then you can remove the duplicates with drop_duplicates, keeping either the first or the last value:

rep_term1 = rep_term.drop_duplicates(subset=['search_term'], keep='first')
print (rep_term1)
  code_action  search_term
0          ss            1
1          dd            2

rep_term2 = rep_term.drop_duplicates(subset=['search_term'], keep='last')
print (rep_term2)
  code_action  search_term
1          dd            2
2          gg            1
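With a deduplicated table, the original map call from the question should work. A minimal sketch on the toy data, assuming keep='first' is the wanted choice:

```python
import pandas as pd

df = pd.DataFrame({'search_term': [1, 2, 3]})
rep_term = pd.DataFrame({'search_term': [1, 2, 1],
                         'code_action': ['ss', 'dd', 'gg']})

# Drop duplicated keys first so the index used by map is unique.
rep_term1 = rep_term.drop_duplicates(subset=['search_term'], keep='first')
lookup = rep_term1.set_index('search_term')['code_action']

df['term_code'] = df.search_term.map(lookup)
print(df)
# search_term 3 has no entry in rep_term, so its term_code is NaN.
```

With keep='last' instead, search_term 1 would map to gg. Rows whose search_term has no matching key in the lookup, such as the None rows in the question's data, would also be expected to come out as NaN.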