My DataFrame is as follows:
I want to create separate columns by extracting substrings from the Name column of this data:
Instru,Name
16834306,INFOSYS18SEP640.50PE
16834306,INFOSYS18SEP640.50PE
16834306,BHEL18SEP52.80CE
16834306,BHEL18SEP52.80CE
16834306,IOCL18SEP640PE
16834306,IOCL18SEP640PE
Note: in the SP column, decimal values should be shown as decimals and whole numbers as integers.
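For anyone who wants to reproduce the problem, the sample rows can be loaded into a DataFrame as sketched below. This is only a minimal setup, assuming the data is available as CSV text; the variable name `csv_text` is made up for illustration.

```python
import io

import pandas as pd

# The sample rows from the question, in the same CSV layout.
csv_text = """Instru,Name
16834306,INFOSYS18SEP640.50PE
16834306,INFOSYS18SEP640.50PE
16834306,BHEL18SEP52.80CE
16834306,BHEL18SEP52.80CE
16834306,IOCL18SEP640PE
16834306,IOCL18SEP640PE
"""

# read_csv accepts any file-like object, so StringIO stands in for a file.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```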
Answer 0 (score: 5)
Use pandas.Series.str.extract with named groups in the regular expression pattern:
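# A raw string keeps backslashes such as \d from being treated as string escapes.
pat = r'(?P<Symbol>.*?)(?P<Month>\d{1,2}\w{3})(?P<SP>[\d\.]+)(?P<Type>.*)'
# Extract one column per named group and join them onto the original frame.
df.join(df.Name.str.extract(pat))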
     Instru                  Name   Symbol  Month      SP  Type
0  16834306  INFOSYS18SEP640.50PE  INFOSYS  18SEP  640.50    PE
1  16834306  INFOSYS18SEP640.50PE  INFOSYS  18SEP  640.50    PE
2  16834306      BHEL18SEP52.80CE     BHEL  18SEP   52.80    CE
3  16834306      BHEL18SEP52.80CE     BHEL  18SEP   52.80    CE
4  16834306        IOCL18SEP640PE     IOCL  18SEP     640    PE
5  16834306        IOCL18SEP640PE     IOCL  18SEP     640    PE
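The pattern can also be tried on individual strings with Python's built-in re module, which uses the same named-group syntax. The following is a minimal sketch rather than part of the original answer; the variable names `m` and `parts` are chosen here for illustration.

```python
import re

# Same pattern as in the answer, written as a raw string and compiled once.
pat = re.compile(
    r'(?P<Symbol>.*?)(?P<Month>\d{1,2}\w{3})(?P<SP>[\d\.]+)(?P<Type>.*)'
)

for name in ['INFOSYS18SEP640.50PE', 'BHEL18SEP52.80CE', 'IOCL18SEP640PE']:
    m = pat.match(name)
    # groupdict() maps each group name to the text that group captured.
    print(m.groupdict())

parts = pat.match('INFOSYS18SEP640.50PE').groupdict()
```

Each dictionary key corresponds to one of the columns that str.extract produces.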
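'(?P<group_name>pattern)' is how you create a capture group and name it 'group_name'.
'(?P<Symbol>.*?)' grabs all characters up to the next capture group; the '?' tells it not to be greedy about it.
'(?P<Month>\d{1,2}\w{3})' grabs 1 or 2 digits followed by 3 word characters (here, the month letters). The ambiguity between 1 and 2 digits is why I made the previous group non-greedy.
'(?P<SP>[\d\.]+)' grabs one or more digits or periods. Admittedly, this is not very elegant, because it would also grab '4.2.4.5', but it should get the job done.
'(?P<Type>.*)' cleans up and grabs the rest.
Answer 1 (score: 4)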
You can use str.extract and apply .astype to the result to get the desired columns, with the numeric column converted to float:
separated = df.Name.str.extract(r"""(?ix)
(?P<Symbol>[a-z]+) # all letters up to a date that matches
(?P<Month>\d{2}\w{3}) # the date (2 numbers then 3 letters)
(?P<SP>.*?) # everything until the "type"
(?P<Type>\w{2}$) # Last two characters of string is the type
""").astype({'SP': 'float'})
which gives you:
    Symbol  Month     SP  Type
0  INFOSYS  18SEP  640.5    PE
1  INFOSYS  18SEP  640.5    PE
2     BHEL  18SEP   52.8    CE
3     BHEL  18SEP   52.8    CE
4     IOCL  18SEP  640.0    PE
5     IOCL  18SEP  640.0    PE
Then apply df.join(separated) to get the final DataFrame:
     Instru                  Name   Symbol  Month     SP  Type
0  16834306  INFOSYS18SEP640.50PE  INFOSYS  18SEP  640.5    PE
1  16834306  INFOSYS18SEP640.50PE  INFOSYS  18SEP  640.5    PE
2  16834306      BHEL18SEP52.80CE     BHEL  18SEP   52.8    CE
3  16834306      BHEL18SEP52.80CE     BHEL  18SEP   52.8    CE
4  16834306        IOCL18SEP640PE     IOCL  18SEP  640.0    PE
5  16834306        IOCL18SEP640PE     IOCL  18SEP  640.0    PE
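Note that casting SP to float renders whole numbers as 640.0, whereas the question asks for integers to stay integers. One possible workaround is to convert each value individually. This is a minimal sketch under that assumption, not part of the original answer; the helper name `parse_sp` is hypothetical.

```python
def parse_sp(text):
    """Return an int for whole-number strings and a float otherwise."""
    return float(text) if '.' in text else int(text)


# SP values as str.extract returns them (still strings at that point).
sp_strings = ['640.50', '52.80', '640']
values = [parse_sp(s) for s in sp_strings]
print(values)  # [640.5, 52.8, 640]
```

To store such mixed values in a DataFrame, the column has to use object dtype, for example pd.Series(values, dtype=object); a numeric dtype would turn 640 back into 640.0. The trailing zero of 640.50 is also lost once the text is parsed, so keeping SP as a string is the only way to preserve the original formatting exactly.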
Answer 2 (score: 2)
You can define a splitting function and use it to create the desired output:
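import pandas as pd

def f(x):
    # Find the index of the first digit, which marks the end of the symbol.
    for i, c in enumerate(x):
        if c.isdigit():
            break
    # Assumes a 5-character date (2 digits + 3 letters) and a 2-character type,
    # so symbols of any length work (the original x[i:9] needed 4 letters).
    return [x[:i], x[i:i + 5], x[i + 5:-2], x[-2:]]

df[['Symbol', 'Month', 'SP', 'Type']] = pd.DataFrame(df.Name.apply(f).tolist(), index=df.index)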
     Instru               Name  Symbol  Month      SP  Type
0  16834306  INFY18SEP640.50PE    INFY  18SEP  640.50    PE
1  16834306  INFY18SEP640.50PE    INFY  18SEP  640.50    PE
2  16834306   BHEL18SEP52.80CE    BHEL  18SEP   52.80    CE
3  16834306   BHEL18SEP52.80CE    BHEL  18SEP   52.80    CE
4  16834306     IOCL18SEP640PE    IOCL  18SEP     640    PE
5  16834306     IOCL18SEP640PE    IOCL  18SEP     640    PE