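I have a pandas Series of strings, and the bracket suffix on each string tells me how to group them. For example, a string ending in [...] should be grouped with the strings ending in (...) that follow it: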
s = pd.Series(['September[jk]', 'firember hfh(start)','secmber(end)','Last day(hjh)',
'October[jk]','firober fhfh (start)','thber(marg)','lasber(sth)',
'December[jk]','anober(start)','secber(start)','Another(hkjl)'])
I can clean the data easily, but in the end these suffixes should help me build a result DataFrame like this:
0 September firember hfh
1 September secmber
2 September Last day
3 October firober fhfh
4 October thber
5 October lasber
6 December anober
7 December secber
8 December Another
Answer:
I don't think there is any magic here, so I suggest parsing the list yourself before creating the DataFrame:
import re
import pandas as pd

l = ['September[jk]', 'firember hfh(start)','secmber(end)','Last day(hjh)',
     'October[jk]','firober fhfh (start)','thber(marg)','lasber(sth)',
     'December[jk]','anober(start)','secber(start)','Another(hkjl)']

month = None
mylist = []
for el in l:
    # Strings ending in "[...]" are month headers.
    m = re.match(r'(.*?)\[.*?\]', el)
    if m:
        month = m.group(1)
    else:
        # Strings ending in "(...)" are values belonging to the current month.
        m = re.match(r'(.*?)\(.*?\)', el)
        if m:
            # strip() removes the space before "(" in "firober fhfh (start)".
            mylist.append({'Month': month, 'Value': m.group(1).strip()})
        else:
            print("Cannot find a match for {}".format(el))

df = pd.DataFrame(mylist)
print(df)
Output:
Month Value
0 September firember hfh
1 September secmber
2 September Last day
3 October firober fhfh
4 October thber
5 October lasber
6 December anober
7 December secber
8 December Another
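A vectorized alternative, not part of the original answer and only a sketch: pandas' Series.str.extract can pull the header text and the value text into two separate columns, and ffill then carries each month down to the value rows beneath it. It assumes every header ends in [...] and every value ends in (...), as in the sample data.

```python
import pandas as pd

s = pd.Series(['September[jk]', 'firember hfh(start)', 'secmber(end)', 'Last day(hjh)',
               'October[jk]', 'firober fhfh (start)', 'thber(marg)', 'lasber(sth)',
               'December[jk]', 'anober(start)', 'secber(start)', 'Another(hkjl)'])

# Text before a trailing "[...]" is a month header; NaN for every other row.
month = s.str.extract(r'^(.*?)\[[^\]]*\]$', expand=False)
# Text before a trailing "(...)" (minus any space before "(") is a value; NaN for headers.
value = s.str.extract(r'^(.*?)\s*\([^)]*\)$', expand=False)

# Forward-fill so each value row inherits the most recent month header.
df = pd.DataFrame({'Month': month.ffill(), 'Value': value})
# Drop the header rows (they have no Value) and renumber from 0.
df = df.dropna(subset=['Value']).reset_index(drop=True)
print(df)
```

This keeps everything inside pandas, at the cost of slightly less readable regular expressions than the loop above.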
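Side note: I used the re library for regular expressions because it adapts to more complex cases, but for your data the built-in in operator and str.split would be enough.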
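A minimal sketch of that simpler version, assuming "in and split" means testing for an opening bracket with in and cutting the string at that bracket with str.split. This is an illustration, not the answer's original code:

```python
import pandas as pd

l = ['September[jk]', 'firember hfh(start)', 'secmber(end)', 'Last day(hjh)',
     'October[jk]', 'firober fhfh (start)', 'thber(marg)', 'lasber(sth)',
     'December[jk]', 'anober(start)', 'secber(start)', 'Another(hkjl)']

month = None
rows = []
for el in l:
    if '[' in el:
        # A "[...]" suffix starts a new month; keep the text before "[".
        month = el.split('[')[0]
    elif '(' in el:
        # A "(...)" suffix marks a value under the current month;
        # strip() drops the space in "firober fhfh (start)".
        rows.append({'Month': month, 'Value': el.split('(')[0].strip()})
    else:
        print("Cannot find a match for {}".format(el))

df = pd.DataFrame(rows)
print(df)
```

Unlike the regex version, this does not check that the bracket is actually closed at the end of the string, so it is only safe for data shaped like the sample.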