How to check whether a value is in a DataFrame

Time: 2018-06-15 09:18:28

Tags: python pandas csv

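I have two CSV files named master_registry.csv and master_reference.csv. From these I created two data frames named 'df' and 'df2', and by joining them I created a new data frame named 'new_df'. I want to look up a value in 'new_df', but when I try to get the result, it gives me an error.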

Here is the code:

    # reading csv from the directory
    df = pd.read_csv('application/master_registry.csv')
    df2 = pd.read_csv('application/master_reference.csv')

    # filtering some selected columns from the csv
    df = df.filter(items=['Master_ID', 'Provider First Name', 'Provider Middle Name', 'Provider Last Name (Legal Name)', 'Provider Credential Text', 'Provider Gender Code','Provider License Number State Code_1',
                    'Provider Business Practice Location Address City Name'])

    # creating new data frame with "full name" column
    df['Full_Name'] = df[['Provider First Name', 'Provider Last Name (Legal Name)']].apply(lambda x: ' '.join(x), axis=1)

    new_df = df.set_index('Master_ID').join(df2.set_index('Master_ID'))

    # selecting rows according to the external values
    main = new_df[(new_df['Master_ID']==master_id)]
    print(main.values.tolist())

When I run the code above, it gives me this error:

  

    C:\Users\ChampsoftWK26\Envs\jerich_core\lib\site-packages\pandas\core\ops.py:1164: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
      result = method(y)
    [2018-06-15 14:36:07,148] ERROR in app: Exception on /search/manual/results/by_npi [POST]
    Traceback (most recent call last):
      File "C:\Users\ChampsoftWK26\Envs\jerich_core\lib\site-packages\flask\app.py", line 2292, in wsgi_app
        response = self.full_dispatch_request()
      File "C:\Users\ChampsoftWK26\Envs\jerich_core\lib\site-packages\flask\app.py", line 1815, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "C:\Users\ChampsoftWK26\Envs\jerich_core\lib\site-packages\flask\app.py", line 1718, in handle_user_exception
        reraise(exc_type, exc_value, tb)
      File "C:\Users\ChampsoftWK26\Envs\jerich_core\lib\site-packages\flask\_compat.py", line 35, in reraise
        raise value
      File "C:\Users\ChampsoftWK26\Envs\jerich_core\lib\site-packages\flask\app.py", line 1813, in full_dispatch_request
        rv = self.dispatch_request()
      File "C:\Users\ChampsoftWK26\Envs\jerich_core\lib\site-packages\flask\app.py", line 1799, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "C:\Users\ChampsoftWK26\Desktop\Jericho_v0.0.7\application\routes.py", line 41, in search_manual_results_by_npi
        info = hub.process_search_by_npi(npi)
      File "C:\Users\ChampsoftWK26\Desktop\Jericho_v0.0.7\application\hub.py", line 152, in process_search_by_npi
        print(new_df['Client_Reference_ID'] == npi)
      File "C:\Users\ChampsoftWK26\Envs\jerich_core\lib\site-packages\pandas\core\ops.py", line 1253, in wrapper
        res = na_op(values, other)
      File "C:\Users\ChampsoftWK26\Envs\jerich_core\lib\site-packages\pandas\core\ops.py", line 1166, in na_op
        raise TypeError("invalid type comparison")
    TypeError: invalid type comparison
    127.0.0.1 - - [15/Jun/2018 14:36:07] "POST /search/manual/results/by_npi HTTP/1.1" 500 -

new_df looks like this:

             Provider First Name         ...         Client_Reference_ID
Master_ID                             ...                            
1                     WILLIAM         ...                  1588667638
2                     RICHARD         ...                  1114920261
3                   FRANCISCO         ...                  1861495814
4                        ERIC         ...                  1306849336
5                     RICHARD         ...                  1326041476
6                      GHAITH         ...                  1770586828
7                      TREVOR         ...                  1124021274

2 Answers:

Answer 0 (score: 1)

You need to select the row by its index label. Try this:

 main = new_df.loc[[master_id]]

For example,

new_df.loc[[2]]

returns

             Provider First Name     Client_Reference_ID
Master_ID                                                      
2                 RICHARD                 1114920261
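As a minimal sketch of the `.loc` lookup above: the small frame below is a hypothetical stand-in for `new_df`, built from three of the rows shown in the question, and `master_id = 2` is an assumed example value. It shows that a list of labels keeps the result a DataFrame, that a bare label gives a Series, and that a label missing from the index raises `KeyError`.

```python
import pandas as pd

# Hypothetical stand-in for new_df, using rows shown in the question
new_df = pd.DataFrame(
    {
        "Master_ID": [1, 2, 3],
        "Provider First Name": ["WILLIAM", "RICHARD", "FRANCISCO"],
        "Client_Reference_ID": [1588667638, 1114920261, 1861495814],
    }
).set_index("Master_ID")

master_id = 2  # assumed example value

# A list of labels keeps the result a DataFrame (a single row here)
main = new_df.loc[[master_id]]
rows = main.values.tolist()
print(rows)

# A bare label returns that row as a Series instead
single = new_df.loc[master_id]

# .loc raises KeyError when the label is not in the index
try:
    new_df.loc[[99]]
    missing_raises = False
except KeyError:
    missing_raises = True
```

If the ID may be absent, it is probably safer to test membership in `new_df.index` first, or to catch the `KeyError` as shown.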

Answer 1 (score: 0)

It looks like Master_ID is your index. In that case, this should work:

'your id' in new_df.index

This evaluates to True if the value is in the index and False otherwise.
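A small sketch of this membership test, using a hypothetical stand-in for `new_df`. It assumes the index holds integers, as the printed frame in the question suggests, and that the lookup value might arrive as a string (for instance from a web form). In that case the string would not match, and converting it with `int()` first is one option. The sketch also shows that `in` on a column checks index labels rather than values, so `.isin()` or `.values` is needed to search a column such as `Client_Reference_ID`.

```python
import pandas as pd

# Hypothetical stand-in for new_df, using rows shown in the question
new_df = pd.DataFrame(
    {
        "Master_ID": [1, 2, 3],
        "Client_Reference_ID": [1588667638, 1114920261, 1861495814],
    }
).set_index("Master_ID")

# Membership in the index is type-sensitive
in_index_int = 2 in new_df.index        # integer label
in_index_str = "2" in new_df.index      # string, e.g. taken from a form
in_index_cast = int("2") in new_df.index

# `in` on a Series tests the index labels, not the column values
npi = 1114920261  # assumed example value
in_series = npi in new_df["Client_Reference_ID"]
in_values = npi in new_df["Client_Reference_ID"].values

# isin() builds a boolean mask over the values, usable for filtering
matches = new_df[new_df["Client_Reference_ID"].isin([npi])]
print(in_index_int, in_index_str, in_index_cast, in_series, in_values)
```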

Alternatively, turn the index into a column and then continue with your code:

new_df['Master_ID'] = new_df.index
main = new_df[(new_df['Master_ID']==master_id)]
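Another way to get the same effect, sketched here on a hypothetical stand-in for `new_df`, is `reset_index()`, which moves the index back into an ordinary `Master_ID` column. Comparing against `new_df.index` directly avoids adding a column at all. `master_id = 2` is an assumed example value.

```python
import pandas as pd

# Hypothetical stand-in for new_df, using rows shown in the question
new_df = pd.DataFrame(
    {
        "Master_ID": [1, 2, 3],
        "Provider First Name": ["WILLIAM", "RICHARD", "FRANCISCO"],
        "Client_Reference_ID": [1588667638, 1114920261, 1861495814],
    }
).set_index("Master_ID")

master_id = 2  # assumed example value

# reset_index() turns the Master_ID index back into a regular column
flat = new_df.reset_index()
main = flat[flat["Master_ID"] == master_id]

# Or build the boolean mask from the index, leaving new_df unchanged
main_by_index = new_df[new_df.index == master_id]

print(main.values.tolist())
print(main_by_index.values.tolist())
```

The first form includes `Master_ID` in the resulting rows; the second keeps it as the index.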