I have a DataFrame named df:
      season seed  team
1609    2010  W01  1246
1610    2010  W02  1452
1611    2010  W03  1307
1612    2010  W04  1458
1613    2010  W05  1396
I need a new DataFrame in the following format:
team frequency
1246 01
1452 02
1307 03
1458 04
1396 05
The frequency values come from the values in the seed column of df:
W01 -> 01
W02 -> 02
W03 -> 03
How can I do this in pandas?
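Answer 0 (score: 2)
The solution below uses a lambda function that applies a regular expression to remove the non-digit characters. You can try the pattern on a sample seed here: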
http://pythex.org/?regex=%5CD&test_string=L16a&ignorecase=0&multiline=0&dotall=0&verbose=0
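import re
import pandas as pd

index = [1609, 1610, 1611, 1612, 1613, 1700]
data = {'season': [2010, 2010, 2010, 2010, 2010, 2010],
        'seed': ['W01', 'W02', 'W03', 'W04', 'W05', 'L16a'],
        'team': [1246, 1452, 1307, 1458, 1396, 0]}
df = pd.DataFrame(data, index=index)

# Remove every non-digit character (\D) from the seed, then convert to int.
# Note: int() drops the leading zero, so 'W01' becomes 1 rather than '01'.
df['frequency'] = df['seed'].apply(lambda x: int(re.sub(r'\D', '', x)))

# Keep only team and frequency, with team as the index.
df2 = df[['team', 'frequency']].set_index('team')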
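If the zero-padded form from the question ('01', '02', ...) is what you want, one option is to skip the int() conversion and keep the values as strings. The following is a minimal sketch, assuming pandas' vectorized Series.str.replace with regex=True; the sample data is the same as above, and using reset_index(drop=True) instead of set_index is only an illustrative choice.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame(
    {
        "season": [2010] * 6,
        "seed": ["W01", "W02", "W03", "W04", "W05", "L16a"],
        "team": [1246, 1452, 1307, 1458, 1396, 0],
    },
    index=[1609, 1610, 1611, 1612, 1613, 1700],
)

# Strip every non-digit character; the result stays a string,
# so "W01" becomes "01" and the leading zero is preserved.
df["frequency"] = df["seed"].str.replace(r"\D", "", regex=True)

# Keep only the two requested columns and discard the old index.
df2 = df[["team", "frequency"]].reset_index(drop=True)
print(df2)
```

The frequency column has dtype object here, so convert it with astype(int) later if you need numeric values.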
Answer 1 (score: 1)
import pandas as pd

# Set up your DataFrame
df = pd.DataFrame({'season': [2010] * 5, 'seed': ['W0' + str(i) for i in range(1, 6)], 'team': [1246, 1452, 1307, 1458, 1396]}, index=range(1609, 1614))

# Drop the leading letter of each seed and index the result by team
s = pd.Series(df['seed'].str[1:].values, index=df['team'], name='frequency')
print(s)
Output:
team
1246 01
1452 02
1307 03
1458 04
1396 05
Name: frequency, dtype: object
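The result above is a Series, while the question asks for a DataFrame with team and frequency columns. A small sketch of one way to get there, assuming Series.reset_index() is acceptable for your case; the name result is only illustrative. Note that slicing with str[1:] assumes every seed has exactly one leading letter and nothing after the digits, so a seed such as 'L16a' would give '16a'.

```python
import pandas as pd

# Same setup as in the answer above.
df = pd.DataFrame(
    {
        "season": [2010] * 5,
        "seed": ["W0" + str(i) for i in range(1, 6)],
        "team": [1246, 1452, 1307, 1458, 1396],
    },
    index=range(1609, 1614),
)

# Drop the first character of each seed and index the values by team.
s = pd.Series(df["seed"].str[1:].values, index=df["team"], name="frequency")

# Move the team index back into a regular column to get a two-column DataFrame.
result = s.reset_index()
print(result)
```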