Question

For example, if I have a home address like this:

71 Pilgrim Avenue, Chevy Chase, MD

in a column named "address". I would like to split it into "street", "city", and "state" columns, respectively.

What is the best way to achieve this using Pandas?

I tried:

df[['street', 'city', 'state']] = df['address'].findall(r"myregex")

but I got an error.

Thank you for your help :)

Answer 1

You can use split with the regular expression ,\s+ (a comma followed by one or more whitespace characters):

# borrowing sample from `Allen`
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)

                              address id             street         city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue  Chevy Chase
1         72 Main St, Chevy Chase, MD  b         72 Main St  Chevy Chase

  state
0    MD
1    MD

If you need to remove the address column, add drop:

df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
df = df.drop('address', axis=1)
print(df)

  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD

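Since the question asks about a regex with one capture per column, a rough sketch using str.extract with named groups may also be useful. The pattern below is only an assumption about the address layout (street, city, state separated by commas), not something taken from the original posts. Note that str.findall returns a list per row rather than separate columns, which is why str.extract or str.split tends to fit this task better.

```python
import pandas as pd

df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Chevy Chase, MD'],
    'id': ['a', 'b'],
})

# Hypothetical pattern: text up to the first comma is the street,
# text up to the second comma is the city, and the rest is the state.
pattern = r'^(?P<street>[^,]+),\s+(?P<city>[^,]+),\s+(?P<state>.+)$'

# str.extract returns one column per named group;
# rows that do not match the pattern get NaN in every column.
parts = df['address'].str.extract(pattern)
df = df.join(parts)
print(df)
```

The named groups become the column names directly, so no separate column list is needed.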

Answer 2

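import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# If your address format is consistent, you can simply use a split function.
df2 = df.join(pd.DataFrame(df.address.str.split(',').tolist(), columns=['street', 'city', 'state']))
# Splitting on ',' leaves a leading space on city and state, so strip every column.
# (All columns hold strings here; DataFrame.applymap is deprecated in recent pandas.)
df2 = df2.apply(lambda col: col.str.strip())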
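The approach above relies on every address having exactly three comma-separated parts. A minimal sketch of checking that assumption before splitting is shown below; the third row is a made-up malformed address added purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Chevy Chase, MD',
                'PO Box 12, MD'],  # hypothetical malformed row with only two parts
    'id': ['a', 'b', 'c'],
})

# A well-formed address has exactly two commas, i.e. three parts.
well_formed = df['address'].str.count(',') == 2

# Split only the rows that match the expected layout.
good = df[well_formed].copy()
good[['street', 'city', 'state']] = good['address'].str.split(r',\s*', expand=True)

# Keep the remaining rows aside for manual inspection.
bad = df[~well_formed]
print(good)
print(bad)
```

Filtering first keeps the column assignment from failing or silently filling missing parts when a row does not follow the expected layout.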

How to split a column into multiple columns in Pandas using regex?

2 answers: