Part of the CSV file ('data.csv') that I have to process looks like this:
parent_id,parent_name,Type,Companyname,Custsupid,Streetaddress
3,Customer,,,C0010,
3,Customer,A,,,
3,Customer,,ACE SYSTEMS,,
3,Customer,,,,Straat 10
7,Customer,,,Q8484,
7,Customer,B,,,
7,Customer,,XYZ AUTOMAT,,
7,Customer,,,,Laan 99
To import this file into a dataframe, I do:
import pandas as pd
df = pd.read_csv('data.csv').fillna('')
This results in:
------------------------------------------------------------------
| |parent_id|parent_name|Type|Companyname|Custsupid|Streetaddress|
------------------------------------------------------------------
|0|3        |Customer   |    |           |C0010    |             |
|1|3        |Customer   |A   |           |         |             |
|2|3        |Customer   |    |ACE SYSTEMS|         |             |
|3|3        |Customer   |    |           |         |Straat 10    |
|4|7        |Customer   |    |           |Q8484    |             |
|5|7        |Customer   |B   |           |         |             |
|6|7        |Customer   |    |XYZ AUTOMAT|         |             |
|7|7        |Customer   |    |           |         |Laan 99      |
------------------------------------------------------------------
However, what I want to end up with is a dataframe that looks like this:
------------------------------------------------------------------
| |parent_id|parent_name|Type|Companyname|Custsupid|Streetaddress|
------------------------------------------------------------------
|0|3        |Customer   |A   |ACE SYSTEMS|C0010    |Straat 10    |
|1|7        |Customer   |B   |XYZ AUTOMAT|Q8484    |Laan 99      |
------------------------------------------------------------------
I have tried df.groupby and similar approaches, but I could not produce the desired result.
Is there a way to do this with a pandas dataframe?
Answer 0 (score: 2)
In [37]: df.groupby(['parent_id', 'parent_name']).sum()
Out[37]:
                       Type  Companyname  Custsupid  Streetaddress
parent_id parent_name
3         Customer        A  ACE SYSTEMS      C0010      Straat 10
7         Customer        B  XYZ AUTOMAT      Q8484        Laan 99
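sum concatenates the strings within each group, so this relies on the fact that adding an empty string to a non-empty string returns the non-empty string. It works here because each column holds exactly one non-empty value per group.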
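A possible alternative, not part of the answer above, is to leave the empty cells as NaN (skip the fillna('') step) and use GroupBy.first(), which returns the first non-null value of each column within each group. The sketch below is only an illustration under that assumption: the sample rows are copied from the question, io.StringIO stands in for 'data.csv', and the names CSV_TEXT, raw and result are made up for this example.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for 'data.csv'.
CSV_TEXT = """parent_id,parent_name,Type,Companyname,Custsupid,Streetaddress
3,Customer,,,C0010,
3,Customer,A,,,
3,Customer,,ACE SYSTEMS,,
3,Customer,,,,Straat 10
7,Customer,,,Q8484,
7,Customer,B,,,
7,Customer,,XYZ AUTOMAT,,
7,Customer,,,,Laan 99
"""

# No fillna('') here: empty fields stay NaN so that first() can skip them.
raw = pd.read_csv(io.StringIO(CSV_TEXT))

# first() picks the first non-null value of each column within each group;
# as_index=False keeps the grouping keys as ordinary columns.
result = raw.groupby(["parent_id", "parent_name"], as_index=False).first()

print(result)
```

Unlike sum, this does not concatenate anything, so if a group ever contained two non-empty values in the same column, only the first one would be kept rather than both being glued together.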