How to filter a pandas Series while writing it to a list

Time: 2018-02-20 22:38:00

Tags: python pandas

For a given column 'A' in a pandas DataFrame 'tbl', I have been trying to write the column out with .tolist() while filtering out the '.' items. This works, but it doesn't seem very readable:

list_of_A = tbl['A'][~tbl['A'].isin(['.'])].tolist()

Also, checking against a one-element list seems needlessly slow, though str.contains('.') seems like it would be slower still because of the pattern matching. Am I missing a better way?
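One caveat on the str.contains('.') idea, beyond speed: pandas treats the pattern as a regular expression by default, and '.' matches any character, so it would not isolate the '.' placeholder. A minimal sketch on a made-up four-row column (the sample data is an assumption, not taken from the question):

```python
import pandas as pd

# Hypothetical sample data: two real values and two '.' placeholders.
tbl = pd.DataFrame({"A": ["1.5", ".", "abc", "."]})

# Default regex=True: '.' is a wildcard, so every non-empty string matches.
regex_mask = tbl["A"].str.contains(".").tolist()

# regex=False does a literal substring search, which still catches '1.5'.
literal_mask = tbl["A"].str.contains(".", regex=False).tolist()

# An exact comparison flags only the placeholder rows.
exact_mask = (tbl["A"] == ".").tolist()

print(regex_mask, literal_mask, exact_mask)
```

Only the exact comparison singles out the rows that should be dropped, which is the behavior the isin version in the question already has.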

Update: @jpp, @piRSquared and @Scott-Boston all had good approaches, so it came down to a test:

>>> tbl = pd.DataFrame(np.random.randn(50000, 3), columns=list('ABC'))
>>> tbl.loc[tbl.sample(10000).index, 'A'] = '.'
>>> min(timeit.repeat("list_of_A = tbl.loc[tbl['A'].ne('.'), 'A'].tolist()", repeat=1000, number=100, globals=globals()))
0.37328900000102294
>>> min(timeit.repeat("list_of_A = tbl.A.values[tbl.A.values != '.'].tolist()", repeat=1000, number=100, globals=globals()))
0.1470019999997021
>>> min(timeit.repeat("tbl.query('A != \".\"')['A'].tolist()", repeat=1000, number=100, globals=globals()))
0.45748099999946135

Working through these has opened up a world of possibilities for me, but for a quick crunch-and-grab of a filtered column list, converting to an ndarray looks fastest: about 0.147 s per 100 calls, versus 0.373 s for .loc with .ne and 0.457 s for query.
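For anyone who wants to repeat the comparison as a script, here is a rough sketch under a few assumptions of mine: a smaller frame, a fixed seed, far fewer repetitions, and an explicit cast of column 'A' to object before the '.' values are written (recent pandas versions complain when a string is assigned into a float column). It also checks that the three approaches return the same list before timing them.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 50000-row frame above so the sketch runs quickly.
rng = np.random.default_rng(0)
tbl = pd.DataFrame(rng.standard_normal((5000, 3)), columns=list("ABC"))

# Cast to object first so the column can hold floats and '.' side by side.
tbl["A"] = tbl["A"].astype(object)
tbl.loc[tbl.sample(1000, random_state=0).index, "A"] = "."

candidates = {
    "loc + ne": lambda: tbl.loc[tbl["A"].ne("."), "A"].tolist(),
    "ndarray mask": lambda: tbl["A"].values[tbl["A"].values != "."].tolist(),
    "query": lambda: tbl.query('A != "."')["A"].tolist(),
}

# Confirm the candidates agree before comparing their speed.
results = {name: fn() for name, fn in candidates.items()}
reference = results["loc + ne"]
all_agree = all(result == reference for result in results.values())

# Best of 5 runs of 20 calls each (far fewer than the session above).
timings = {
    name: min(timeit.repeat(fn, repeat=5, number=20))
    for name, fn in candidates.items()
}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>12}: {seconds:.4f} s")
```

Absolute numbers will differ by machine and pandas version, so only the relative ordering is worth reading from the output.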

4 answers:

Answer 0: (score: 2)

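Two points to note:

  • Chained indexing is explicitly discouraged; avoid it, since it is never necessary. Use the .loc accessor instead.
  • isin is only recommended when you are comparing against a list / set / pd.Series. For a single value, the == operator is enough.

Try this instead:

list_of_A = tbl.loc[~(tbl['A'] == '.'), 'A'].tolist()

As @BradSolomon points out, there is also an alternative:

list_of_A = tbl.loc[tbl['A'].ne('.'), 'A'].tolist()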
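As a quick check that the == and .ne spellings select the same rows, here is a minimal sketch on a small made-up frame (the data and the extra column 'B' are assumptions for illustration):

```python
import pandas as pd

# Hypothetical mixed column: numbers plus '.' placeholders.
tbl = pd.DataFrame({"A": [1.0, ".", 2.5, ".", 4.0], "B": range(5)})

# Boolean mask and column label in a single .loc call (no chained indexing).
with_eq = tbl.loc[~(tbl["A"] == "."), "A"].tolist()
with_ne = tbl.loc[tbl["A"].ne("."), "A"].tolist()

print(with_eq)
print(with_ne)
```

Both calls pass the row mask and the column label to .loc together, so the selection happens in one step instead of two chained lookups.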

Answer 1: (score: 2)

Use a list comprehension:

list_of_A = [x for x in tbl['A'].values.tolist() if x != '.']

This may be more performant:

tbl.A.values[tbl.A.values != '.'].tolist()

Faster still:


Answer 2: (score: 1)

You can use query for better readability:

tbl.query('A != "."')['A'].tolist()
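If the nested quotes get in the way, query can also read a local Python variable through the @ prefix. A minimal sketch, where the variable name placeholder and the sample data are my own assumptions:

```python
import pandas as pd

# Hypothetical mixed column: numbers plus '.' placeholders.
tbl = pd.DataFrame({"A": [1.0, ".", 2.5, "."]})

placeholder = "."  # hypothetical name for the value to drop

# '@placeholder' refers to the local Python variable inside the expression.
list_of_A = tbl.query("A != @placeholder")["A"].tolist()
print(list_of_A)
```

This keeps the expression string free of escaped quotes, at the cost of the expression-parsing overhead that query always carries.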

Answer 3: (score: 0)

Use a temporary variable and two lines of code to make it more readable:

exclude   = tbl['A'].isin(['.'])
list_of_A = tbl['A'][~exclude].tolist()
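The same named-mask idea can be wrapped in a small helper when several placeholder values need to be dropped, which is the case where isin is a natural fit. This is only a sketch; the function name column_to_list and its parameters are hypothetical, and it uses .loc instead of chained indexing:

```python
import pandas as pd


def column_to_list(frame, column, placeholders=(".",)):
    """Return frame[column] as a list, skipping any placeholder values."""
    exclude = frame[column].isin(list(placeholders))
    return frame.loc[~exclude, column].tolist()


# Hypothetical data with two different placeholder markers.
tbl = pd.DataFrame({"A": [1.0, ".", 2.5, "-", 4.0]})
print(column_to_list(tbl, "A"))
print(column_to_list(tbl, "A", placeholders=(".", "-")))
```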