Combine pandas rows based on a column condition (vectorized)

Time: 2019-12-22 03:04:06

Tags: python pandas text vectorization

I have this DataFrame:

import pandas as pd
df = pd.DataFrame({"a":[None, None, "hello1","hello2", None,"hello4","hello5","hello6", None, "hello8", None,"hello10",None ] , "b": ["we", "are the world", "we", "love", "the", "world", "so", "much", "and", "dance", "every", "day", "yeah"]})

    a   b
0   None    we
1   None    are the world
2   hello1  we
3   hello2  love
4   None    the
5   hello4  world
6   hello5  so
7   hello6  much
8   None    and
9   hello8  dance
10  None    every
11  hello10 day
12  None    yeah

The desired output is:


    a       b       new_text
0   Intro   we      we are the world
2   hello1  we      we
3   hello2  love    love the
5   hello4  world   world
6   hello5  so      so
7   hello6  much    much and
9   hello8  dance   dance every
11  hello10 day     day yeah

I have a function that does this, but it is probably not the best solution when working with pandas.

def connect_rows_on_condition(df, new_col_name, text, condition):
    # Assumes the default RangeIndex (0, 1, 2, ...), as in the example above
    df = df.copy()  # work on a copy so the caller's frame is not modified
    if pd.isna(df.loc[0, condition]):
        df.loc[0, condition] = "Intro"
    # Every row starts with its own text; None rows are appended below
    df[new_col_name] = df[text]
    last_non_none = 0
    for index in range(1, len(df)):
        if pd.notna(df.loc[index, condition]):
            last_non_none = index
        else:
            # Append this row's text to the most recent labelled row
            df.loc[last_non_none, new_col_name] += " " + df.loc[index, text]

    # Keep only the labelled rows
    return df[df[condition].notna()]

The main logic is: if column "a" is None, append the text in b to the preceding labelled row. Is there a solution that is not loop-based?

2 Answers:

Answer 0: (score: 2)

First, create a Series that describes the groups:

grouping = df.a.notnull().cumsum()
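
To see why this works: `notnull()` is True on every labelled row, and the cumulative sum goes up by one at each of those rows, so each None row inherits the group number of the labelled row above it. A small sketch on a shortened version of the frame (the five sample rows are picked here only for illustration):

```python
import pandas as pd

# Shortened sample, chosen for illustration only
df = pd.DataFrame({
    "a": [None, None, "hello1", "hello2", None],
    "b": ["we", "are the world", "we", "love", "the"],
})

# True where column a has a label; cumsum turns that into a running group id
grouping = df["a"].notnull().cumsum()
print(grouping.tolist())  # [0, 0, 1, 2, 2]
```

Rows before the first label all land in group 0, which is why the leading None rows end up merged together.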

Then, for column a we can take the first element, and for column b we want to join all elements:

df.groupby(grouping).agg({'a': 'first', 'b': ' '.join})

This gives:

         a                 b
a                           
0     None  we are the world
1   hello1                we
2   hello2          love the
3   hello4             world
4   hello5                so
5   hello6          much and
6   hello8       dance every
7  hello10          day yeah

As a special case, you can replace None with "Intro" yourself, since that text does not appear in the input.
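
Putting the pieces together, here is one possible sketch that reproduces the layout asked for in the question (original index, columns a and b untouched, plus a `new_text` column). The variable names `joined` and `result` are made up for this example, and mapping the joined text back onto the first row of each group is just one way to do it:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [None, None, "hello1", "hello2", None, "hello4", "hello5",
          "hello6", None, "hello8", None, "hello10", None],
    "b": ["we", "are the world", "we", "love", "the", "world", "so",
          "much", "and", "dance", "every", "day", "yeah"],
})

# Running group id: increases at every row where column a has a label
grouping = df["a"].notnull().cumsum()

# One joined string per group, indexed by group id
joined = df.groupby(grouping)["b"].agg(" ".join)

# Keep the first row of each group, which preserves the original index
result = df[~grouping.duplicated()].copy()
result["a"] = result["a"].fillna("Intro")
result["new_text"] = grouping.loc[result.index].map(joined)

print(result)
```

The only None left in column a after keeping the first row of each group is the leading one, so `fillna("Intro")` covers the special case mentioned above.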

Answer 1: (score: 0)

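You can also group by the words themselves, in case there are no nulls to group on. So as not to repeat John's solution, I'll leave this here for anyone interested in how to do it when the column has no nulls.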
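
One way this idea could look, as a sketch under assumptions rather than the answerer's own code: suppose the continuation rows carry a placeholder string instead of None (the "-" below is hypothetical), and rows that start a new group are recognised by their text beginning with "hello". The group id is then built from a string match instead of `notnull()`:

```python
import pandas as pd

# Hypothetical variant of the data: continuation rows hold "-" instead of
# None, so notnull() cannot tell them apart from labelled rows.
df = pd.DataFrame({
    "a": ["-", "-", "hello1", "hello2", "-", "hello4"],
    "b": ["we", "are the world", "we", "love", "the", "world"],
})

# Assumption: a row starts a new group when its label begins with "hello"
starts_group = df["a"].str.startswith("hello")
grouping = starts_group.cumsum()

result = df.groupby(grouping).agg({"a": "first", "b": " ".join})
print(result)
```

Any boolean Series that is True on the rows starting a group can take the place of `starts_group`; the `cumsum()` and `groupby` steps stay the same.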
