How do I split a column into multiple columns in Pandas using a regular expression?

Time: 2017-05-02 05:04:32

Tags: python pandas

For example, say I have a home address like this:

71 Pilgrim Avenue, Chevy Chase, MD

in a column named "address". I would like to split it into separate "street", "city", and "state" columns.

What is the best way to achieve this with Pandas?

I tried

df[['street', 'city', 'state']] = df['address'].findall(r"myregex")

but I got an error.

Thanks for your help :)

2 answers:

Answer 0: (score: 16)

You can use split with the regular expression ,\s+ (a comma followed by one or more whitespace characters):

# borrowing the sample from `Allen`
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)
                              address id             street         city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue  Chevy Chase   
1         72 Main St, Chevy Chase, MD  b         72 Main St  Chevy Chase   

  state  
0    MD  
1    MD  

If you need to remove the address column, add drop:

df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
df = df.drop('address', axis=1)
print(df)
  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD
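Since the question asks for a regex, here is a minimal sketch of an alternative that captures the three parts with one pattern through str.extract instead of splitting. The pattern itself is an assumption (exactly three comma-separated parts, ending in a two-letter uppercase state code) and would need adjusting for other address formats.

```python
import pandas as pd

df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Chevy Chase, MD'],
    'id': ['a', 'b'],
})

# Assumed format: "street, city, ST" where ST is a two-letter state code.
pattern = r'^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})$'

# str.extract returns one column per named group;
# rows that do not match the pattern come back as NaN.
parts = df['address'].str.extract(pattern)
df = df.join(parts)
print(df)
```

Compared with split, this makes the expected shape of each part explicit, so malformed rows show up as NaN rather than causing a column-count mismatch.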


Answer 1: (score: 3)

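import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# If your address format is consistent, you can simply use a split function.
df2 = df.join(pd.DataFrame(df.address.str.split(',').tolist(), columns=['street', 'city', 'state']))
# Strip the leading whitespace left over from splitting on ',' alone
# (column-wise .str.strip() avoids applymap, which newer pandas versions deprecate).
df2 = df2.apply(lambda col: col.str.strip())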
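The split above only works when every address has the same number of parts. As a rough sketch, a check like the following could flag rows that do not fit before assigning the three columns. The second address is a hypothetical example added for illustration, not data from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                # hypothetical row with an extra comma-separated part
                '72 Main St, Apt 4, Chevy Chase, MD'],
})

# Count the comma-separated parts in each address.
n_parts = df['address'].str.split(r',\s*').str.len()

# Rows without exactly three parts would not fit street/city/state.
bad_rows = df[n_parts != 3]
print(bad_rows)
```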