I have a DataFrame like this:
+----------+---------------+---------------+-------------+
| Old_City | New_City_Code | New_City_Name |Old_City_Code|
+----------+---------------+---------------+-------------+
| a | 101 | A | 001 |
+----------+---------------+---------------+-------------+
| b | 101 | A | 002 |
+----------+---------------+---------------+-------------+
| c | 102 | B | 003 |
+----------+---------------+---------------+-------------+
| d | 103 | C | 004 |
+----------+---------------+---------------+-------------+
| e | 103 | C | 005 |
+----------+---------------+---------------+-------------+
| f | 103 | C | 006 |
+----------+---------------+---------------+-------------+
I want to reshape this with pandas. The reshaped table should be:
+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+
| New_City_Code | New_City_Name | Old_City1 | Old_City2 | Old_City3 | Old_Code1 | Old_Code2 | Old_Code3 |
+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 101 | A | a | b | | 001 | 002 | |
+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 102 | B | c | | | 003 | | |
+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 103 | C | d | e | f | 004 | 005 | 006 |
+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+
Is there a way to do this transformation in pandas (or, if not, in R)? I tried pivot, but it didn't work (I got the error ValueError: cannot label index with a null key).
Answer (score: 1)
You can use groupby with cumcount to create the helper column cols, then pivot_table with aggfunc='first', and finally fillna to replace NaN with '' and reset_index:
import pandas as pd

print(df)
Old_City New_City_Code New_City_Name Old_City_Code
0 a 101 A 001
1 b 101 A 002
2 c 102 B 003
3 d 103 C 004
4 e 103 C 005
5 f 103 C 006
# create the column names used for pivoting: 1, 2, 3, ... within each new city
df['cols'] = (df.groupby(['New_City_Name', 'New_City_Code']).cumcount() + 1).astype(str)
print(df)
Old_City New_City_Code New_City_Name Old_City_Code cols
0 a 101 A 001 1
1 b 101 A 002 2
2 c 102 B 003 1
3 d 103 C 004 1
4 e 103 C 005 2
5 f 103 C 006 3
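With the counter in place, each (New_City_Name, New_City_Code, cols) combination identifies exactly one row, which is what the naive pivot was missing. As a swapped-in alternative to the pivot_table step (not part of the original answer), the same wide frame can then be built with set_index plus unstack, without any aggregation. The sample data is reconstructed from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Old_City": ["a", "b", "c", "d", "e", "f"],
    "New_City_Code": [101, 101, 102, 103, 103, 103],
    "New_City_Name": ["A", "A", "B", "C", "C", "C"],
    "Old_City_Code": ["001", "002", "003", "004", "005", "006"],
})
keys = ["New_City_Name", "New_City_Code"]

# Same counter as in the answer: "1", "2", "3", ... within each new city.
df["cols"] = (df.groupby(keys).cumcount() + 1).astype(str)

# Each (name, code, cols) triple is unique, so no aggregation is needed:
# move the counter into the index, then rotate that level into the columns.
wide = df.set_index(keys + ["cols"])[["Old_City", "Old_City_Code"]].unstack("cols")
print(wide)
```

Here both routes give the same values; aggfunc='first' only matters when a combination repeats, which the counter rules out.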
df = pd.pivot_table(df,
index=['New_City_Name', 'New_City_Code'],
columns=['cols'],
values=['Old_City','Old_City_Code'],
aggfunc='first')
# flatten the MultiIndex columns, e.g. ('Old_City', '1') -> 'Old_City1'
df.columns = [''.join(col) for col in df.columns.values]
# replace NaN with '' and turn the index levels back into columns
df = df.fillna('').reset_index()
print(df)
  New_City_Name New_City_Code Old_City1 Old_City2 Old_City3 Old_City_Code1 \
0             A           101         a         b                      001
1             B           102         c                                003
2             C           103         d         e         f            004

  Old_City_Code2 Old_City_Code3
0            002
1
2            005            006
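The question's target table puts New_City_Code first and names the code columns Old_Code1 to Old_Code3. Below is a condensed Python 3 sketch of the answer that also produces those headers; the function name reshape_old_cities and the prefix mapping are hypothetical. In the answer, astype(str) is what lets ''.join(col) work, since str.join rejects integers; the f-string used here would also accept integer labels.

```python
import pandas as pd


def reshape_old_cities(df):
    """Hypothetical helper: one row per new city, old cities spread into numbered columns."""
    # Key order chosen so New_City_Code comes first, as in the question's target table.
    keys = ["New_City_Code", "New_City_Name"]
    out = df.copy()  # leave the caller's frame untouched
    # 1-based position of each old city within its new city, as a string.
    out["cols"] = (out.groupby(keys).cumcount() + 1).astype(str)
    wide = pd.pivot_table(out, index=keys, columns="cols",
                          values=["Old_City", "Old_City_Code"], aggfunc="first")
    # Assumed renaming so the headers read Old_City1..3 and Old_Code1..3.
    prefix = {"Old_City": "Old_City", "Old_City_Code": "Old_Code"}
    wide.columns = [f"{prefix[name]}{pos}" for name, pos in wide.columns]
    # Empty slots become '' and the two key levels become ordinary columns again.
    return wide.fillna("").reset_index()


df = pd.DataFrame({
    "Old_City": ["a", "b", "c", "d", "e", "f"],
    "New_City_Code": [101, 101, 102, 103, 103, 103],
    "New_City_Name": ["A", "A", "B", "C", "C", "C"],
    "Old_City_Code": ["001", "002", "003", "004", "005", "006"],
})
result = reshape_old_cities(df)
print(result)
```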