I have this DataFrame:
import pandas as pd
df = pd.DataFrame({"a":[None, None, "hello1","hello2", None,"hello4","hello5","hello6", None, "hello8", None,"hello10",None ] , "b": ["we", "are the world", "we", "love", "the", "world", "so", "much", "and", "dance", "every", "day", "yeah"]})
          a              b
0      None             we
1      None  are the world
2    hello1             we
3    hello2           love
4      None            the
5    hello4          world
6    hello5             so
7    hello6           much
8      None            and
9    hello8          dance
10     None          every
11  hello10            day
12     None           yeah
The desired output is:
          a      b          new_text
0     Intro     we  we are the world
2    hello1     we                we
3    hello2   love          love the
5    hello4  world             world
6    hello5     so                so
7    hello6   much          much and
9    hello8  dance       dance every
11  hello10    day          day yeah
I have a function that does this, but it is probably not the best solution for pandas.
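def connect_rows_on_condition(df, new_col_name, text, condition):
    # Work on a copy so the caller's frame is not modified.
    # Assumes the default 0..n-1 index, as in the example above.
    df = df.copy()
    if pd.isna(df.loc[0, condition]):
        df.loc[0, condition] = "Intro"
    df[new_col_name] = ""
    df.loc[0, new_col_name] = df.loc[0, text]
    last_non_none = 0
    for index in range(1, len(df)):
        if pd.notna(df.loc[index, condition]):
            # A non-null value starts a new block of text.
            last_non_none = index
            df.loc[index, new_col_name] = df.loc[index, text]
        else:
            # Append to the text collected so far, so several
            # consecutive None rows are all kept.
            collected = df.loc[last_non_none, new_col_name]
            df.loc[last_non_none, new_col_name] = collected + " " + df.loc[index, text]
    # Keep only the rows that started a block.
    return df[df[condition].notna()]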
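The main logic is: if column "a" is None, append the text in b to the preceding row that has a value in "a". Is there a solution that does not rely on a loop?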
Answer 0 (score: 2)
First, create a Series that describes the groups:
grouping = df.a.notnull().cumsum()
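To see why this works, it can help to print the group labels. The small sketch below rebuilds only column a from the question (the standalone Series is my own setup, not part of the answer) and shows that every non-null value starts a new label, while null rows inherit the label of the row above them:

```python
import pandas as pd

a = pd.Series([None, None, "hello1", "hello2", None, "hello4", "hello5",
               "hello6", None, "hello8", None, "hello10", None])

# notnull() gives True/False; cumsum() counts the True values seen so far,
# so the label goes up by one at every non-null row.
grouping = a.notnull().cumsum()
print(grouping.tolist())
# [0, 0, 1, 2, 2, 3, 4, 5, 5, 6, 6, 7, 7]
```

The two leading None rows get label 0 because no non-null value has been seen yet.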
Then, for column a we can take the first element, and for column b we want to join all the elements:
df.groupby(grouping).agg({'a': 'first', 'b': ' '.join})
This gives:
         a                 b
a
0     None  we are the world
1   hello1                we
2   hello2          love the
3   hello4             world
4   hello5                so
5   hello6          much and
6   hello8       dance every
7  hello10          day yeah
For the special case of the leading rows, you can replace None with "Intro" yourself, since that text does not appear in the input.
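If you also want the original row labels and a separate new_text column, as in the desired output, one possible sketch is the following. It swaps agg for transform plus duplicated, which is my own variation and not part of the answer, and it assumes the data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [None, None, "hello1", "hello2", None, "hello4", "hello5",
          "hello6", None, "hello8", None, "hello10", None],
    "b": ["we", "are the world", "we", "love", "the", "world", "so",
          "much", "and", "dance", "every", "day", "yeah"],
})

grouping = df["a"].notnull().cumsum()

# Join the text of every row in a group and broadcast it back to all rows.
new_text = df.groupby(grouping)["b"].transform(" ".join)

# Keep only the first row of each group; this preserves the original index.
is_first = ~grouping.duplicated()
result = df.loc[is_first].assign(new_text=new_text[is_first])

# Label the leading group, whose "a" value is null.
result["a"] = result["a"].fillna("Intro")
print(result)
```

Using transform keeps the result aligned with the original rows, so filtering to the first row of each group is enough to get back labels 0, 2, 3, 5 and so on.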
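Answer 1 (score: 0)
You can also group on a word, in case there are no nulls to group on. So as not to repeat John's solution, I'll leave this here for anyone interested in how to do it when the column has no nulls.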
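As a rough illustration of grouping on a word rather than on nulls, here is a minimal sketch. The data is hypothetical: it assumes continuation rows carry an empty string instead of None, and that rows starting a group can be recognised by the prefix "hello". Both are assumptions for the example, not something stated in the question.

```python
import pandas as pd

# Hypothetical data: continuation rows hold "" instead of None.
df = pd.DataFrame({
    "a": ["hello1", "", "hello2", "hello3", ""],
    "b": ["we", "are", "the", "world", "yeah"],
})

# Any boolean condition can mark the start of a group; here it is a word prefix.
starts_group = df["a"].str.startswith("hello")
grouping = starts_group.cumsum()

result = df.groupby(grouping).agg({"a": "first", "b": " ".join})
print(result)
```

The only change from the null-based version is how the boolean mask is built; the cumsum and groupby steps stay the same.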