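How to create a new DataFrame based on conditions from another DataFrame

Time: 2016-11-06 22:49:55

Tags: python pandas dataframe

I'm new to Python, so I hope I'm not asking a silly question here...

I have a pandas DataFrame named "df_complete" with, say, 100 rows, containing columns named "type", "writer", "status", "col a", and "col c". I want to create/update a new DataFrame named "temp_df" and fill it with values from "df_complete" based on a condition.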

Here is the code I tried:

temp_df = pandas.DataFrame()

if ((df_complete['type'] == 'NDD') & (df_complete['writer'] == 'Mary') & (df_complete['status'] != '7')):
    temp_df['col A'] = df_complete['col a']
    temp_df['col B'] = 'good'
    temp_df['col C'] = df_complete['col c']

However, when I run this, I get an error message.

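I read this thread and changed my "and" to "&": Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I also read this thread and put everything in parentheses: comparing dtyped [float64] array with a scalar of type [bool] in Pandas DataFrame

But the error persists. What is causing it, and how do I fix it?

**Follow-up question:** Also, how do I get the index values of the rows that satisfy the condition?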

2 answers:

Answer 0: (score: 3)

I think you need boolean indexing with ix, which also lets you select only the columns col a and col c:

temp_df = df_complete.ix[(df_complete['type'] == 'NDD') & 
                         (df_complete['writer'] == 'Mary') & 
                         (df_complete['status'] != '7'), ['col a','col c']]
#rename columns
temp_df = temp_df.rename(columns={'col a':'col A','col c':'col C'})
#add new column 
temp_df['col B'] = 'good'
#reorder columns
temp_df = temp_df[['col A','col B','col C']]

Sample:

import pandas as pd

df_complete = pd.DataFrame({'type':  ['NDD','NDD','NT'],
                            'writer':['Mary','Mary','John'],
                            'status':['4','5','6'],
                            'col a': [1,3,5],
                            'col b': [5,3,6],
                            'col c': [7,4,3]}, index=[3,4,5])

print (df_complete)
   col a  col b  col c status type writer
3      1      5      7      4  NDD   Mary
4      3      3      4      5  NDD   Mary
5      5      6      3      6   NT   John

temp_df = df_complete.ix[(df_complete['type'] == 'NDD') & 
                         (df_complete['writer'] == 'Mary') & 
                         (df_complete['status'] != '7'), ['col a','col c']]

print (temp_df)  
   col a  col c
3      1      7
4      3      4

temp_df = temp_df.rename(columns={'col a':'col A','col c':'col C'})
#add new column 
temp_df['col B'] = 'good'
#reorder columns
temp_df = temp_df[['col A','col B','col C']]
print (temp_df)  
   col A col B  col C
3      1  good      7
4      3  good      4
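
As for why the original `if` fails: the combined comparison is a boolean Series with one value per row, and `if` needs a single True/False, so pandas refuses to guess. A minimal sketch, reusing the sample data above (the names `mask` and `raised` are my own, not from the question):

```python
import pandas as pd

df_complete = pd.DataFrame({'type':  ['NDD', 'NDD', 'NT'],
                            'writer': ['Mary', 'Mary', 'John'],
                            'status': ['4', '5', '6'],
                            'col a': [1, 3, 5],
                            'col c': [7, 4, 3]}, index=[3, 4, 5])

# Element-wise comparison: one True/False per row, not a single bool
mask = ((df_complete['type'] == 'NDD') &
        (df_complete['writer'] == 'Mary') &
        (df_complete['status'] != '7'))
print(mask.tolist())

# Using the Series directly as an `if` condition raises ValueError
try:
    if mask:
        pass
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Reduce explicitly when a single yes/no answer is really what you want
print(mask.any())  # is at least one row a match?
print(mask.all())  # is every row a match?
```

So the mask is meant to be used as a row selector (as in the answer above), not as an `if` condition.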

Answer 1: (score: 2)

In current versions of pandas, .ix is deprecated (and was removed entirely in pandas 1.0); use .loc instead:

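temp_df = df_complete.loc[(df_complete['type'] == 'NDD') &
                          (df_complete['writer'] == 'Mary') &
                          (df_complete['status'] != '7'), ['col a','col c']]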
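
A fuller sketch of the .loc approach, which probably also covers the follow-up question about index values. It assumes the same sample data as in the first answer; the names `mask` and `matching_index` and the `.copy()` call are my own additions rather than anything from the original posts:

```python
import pandas as pd

df_complete = pd.DataFrame({'type':  ['NDD', 'NDD', 'NT'],
                            'writer': ['Mary', 'Mary', 'John'],
                            'status': ['4', '5', '6'],
                            'col a': [1, 3, 5],
                            'col b': [5, 3, 6],
                            'col c': [7, 4, 3]}, index=[3, 4, 5])

# Build the row filter once so it can be reused
mask = ((df_complete['type'] == 'NDD') &
        (df_complete['writer'] == 'Mary') &
        (df_complete['status'] != '7'))

# .loc takes the boolean mask for rows and a list of labels for columns;
# .copy() makes temp_df independent of df_complete before it is modified
temp_df = df_complete.loc[mask, ['col a', 'col c']].copy()
temp_df = temp_df.rename(columns={'col a': 'col A', 'col c': 'col C'})
temp_df['col B'] = 'good'
temp_df = temp_df[['col A', 'col B', 'col C']]
print(temp_df)

# Index labels of the rows that satisfy the condition
matching_index = df_complete.index[mask]
print(matching_index.tolist())
```

Since the filtered frame keeps the original index labels, `temp_df.index` should give the same labels as `matching_index`.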