How to convert a Series object to a DataFrame using string cleaning

Posted: 2016-12-08 21:58:11

Tags: python-3.x pandas

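I have a Series of strings, each ending in a specific pattern of characters that I can rely on. A string ending in [] should be paired with each of the strings ending in () that follow it, up to the next string ending in [].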
import pandas as pd

s = pd.Series(['September[jk]', 'firember hfh(start)', 'secmber(end)', 'Last day(hjh)',
               'October[jk]', 'firober fhfh (start)', 'thber(marg)', 'lasber(sth)',
               'December[jk]', 'anober(start)', 'secber(start)', 'Another(hkjl)'])

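Cleaning the strings themselves is easy, but in the end those trailing characters should also tell me how to build a result DataFrame like this: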

0   September   firember hfh
1   September   secmber
2   September  Last day
3    October   firober fhfh
4    October     thber
5    October    lasber
6   December    anober
7   December    secber
8   December   Another

1 answer:

Answer 0 (score: 0)

I don't think there is any built-in shortcut for this, so I suggest parsing the list yourself before creating the DataFrame:

import re
import pandas as pd

l = ['September[jk]', 'firember hfh(start)', 'secmber(end)', 'Last day(hjh)',
     'October[jk]', 'firober fhfh (start)', 'thber(marg)', 'lasber(sth)',
     'December[jk]', 'anober(start)', 'secber(start)', 'Another(hkjl)']

month = None   # most recent "[...]" header seen so far
mylist = []
for el in l:
    # A string ending in "[...]" starts a new month group
    # (raw strings avoid invalid-escape warnings for "\[" and "\(")
    m = re.match(r'(.*?)\[.*?\]', el)
    if m:
        month = m.group(1)
    else:
        # A string ending in "(...)" is a value belonging to the current month
        m = re.match(r'(.*?)\(.*?\)', el)
        if m:
            # strip() drops the space before "(" in 'firober fhfh (start)'
            mylist.append({'Month': month, 'Value': m.group(1).strip()})
        else:
            print("Cannot find a match for {}".format(el))

df = pd.DataFrame(mylist)
print(df)

Output:

       Month         Value
0  September  firember hfh
1  September       secmber
2  September      Last day
3    October  firober fhfh
4    October         thber
5    October        lasber
6   December        anober
7   December        secber
8   December       Another
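A vectorized alternative, not part of the original answer: pandas' `Series.str.extract` can pull the header text and the value text into two separate Series, and `ffill` then carries each month down to the value rows beneath it. This is a sketch assuming every header ends in `[...]`, every value ends in `(...)`, and the Series starts with a header.

```python
import pandas as pd

s = pd.Series(['September[jk]', 'firember hfh(start)', 'secmber(end)', 'Last day(hjh)',
               'October[jk]', 'firober fhfh (start)', 'thber(marg)', 'lasber(sth)',
               'December[jk]', 'anober(start)', 'secber(start)', 'Another(hkjl)'])

# Text before a trailing "[...]" is a month header; NaN for every other row
month = s.str.extract(r'^(.*?)\[.*\]$', expand=False)
# Text before a trailing "(...)" is a value (surrounding space dropped); NaN for header rows
value = s.str.extract(r'^(.*?)\s*\(.*\)$', expand=False)

# Forward-fill each header down to the value rows that follow it
df = pd.DataFrame({'Month': month.ffill(), 'Value': value})
# Keep only the value rows and renumber the index from 0
df = df.dropna(subset=['Value']).reset_index(drop=True)
print(df)
```

It produces the same Month/Value pairs shown in the question. Unlike the loop, strings that match neither pattern are silently dropped rather than reported.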

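Side note: I used the re library for regular expressions because it adapts to more complicated cases, but in your case you could simply use the built-in in operator and the split method.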
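Without regular expressions, the same loop can be written with the `in` operator and `str.split`. A minimal sketch, assuming (as in the sample data) that each string contains at most one `[` or `(` and that the bracketed part always comes last; it does not check that the bracket is actually closed.

```python
import pandas as pd

l = ['September[jk]', 'firember hfh(start)', 'secmber(end)', 'Last day(hjh)',
     'October[jk]', 'firober fhfh (start)', 'thber(marg)', 'lasber(sth)',
     'December[jk]', 'anober(start)', 'secber(start)', 'Another(hkjl)']

month = None   # most recent header seen so far
rows = []
for el in l:
    if '[' in el:
        # Header: "September[jk]" -> "September"
        month = el.split('[')[0]
    elif '(' in el:
        # Value: "firober fhfh (start)" -> "firober fhfh"
        rows.append({'Month': month, 'Value': el.split('(')[0].strip()})
    else:
        print("Cannot find a match for {}".format(el))

df = pd.DataFrame(rows)
print(df)
```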