For example, if I have a home address like this:

71 Pilgrim Avenue, Chevy Chase, MD

in a column named "address". I would like to split it into the columns "street", "city", and "state", respectively.
What is the best way to achieve this using Pandas?
I tried

df[['street', 'city', 'state']] = df['address'].findall(r"myregex")

but I got an error.
Thanks for your help :)
Answer 0 (score: 16)
You can use split with the regex ,\s+ (a comma followed by one or more whitespace characters):

#borrowing sample from `Allen`
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)
                              address id             street         city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue  Chevy Chase
1         72 Main St, Chevy Chase, MD  b         72 Main St  Chevy Chase

  state
0    MD
1    MD

If you also need to remove the address column, add drop:

df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
df = df.drop('address', axis=1)
print(df)
  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD
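Since the question started from a `findall` regex attempt, the same split can also be sketched as one regular expression with named groups passed to `Series.str.extract`. This is only a minimal sketch, assuming every address has exactly the form "street, city, state"; the pattern and the sample frame are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data in the same shape as the question (assumed for illustration).
df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Chevy Chase, MD'],
    'id': ['a', 'b'],
})

# Hypothetical pattern: three comma-separated fields, each free of commas.
# The named groups become the column names of the extracted frame.
pattern = r'^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[^,]+)$'

# Rows that do not match the pattern get NaN in all three columns.
parts = df['address'].str.extract(pattern)
df = df.join(parts)
print(df)
```

Compared with `str.split`, this makes the expected shape of the address explicit, at the cost of a longer pattern.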
Answer 1 (score: 3)
import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# If your address format is consistent, you can simply use a split function.
parts = df['address'].str.split(',', expand=True)
parts.columns = ['street', 'city', 'state']
# Strip the leading spaces left over after splitting on the bare comma.
df2 = df.join(parts.apply(lambda col: col.str.strip()))
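The split above relies on the address format being consistent. One way to check that first, sketched here under the assumption that a well-formed address contains exactly two commas, is to count the commas and split only the rows that pass. The second sample row and the names `is_consistent`, `good`, and `bad` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Apt 4, Chevy Chase, MD',  # hypothetical row with an extra comma
                '73 Oak Rd, Bethesda, MD'],            # hypothetical row
})

# A "street, city, state" address contains exactly two commas.
is_consistent = df['address'].str.count(',').eq(2)

# Split only the well-formed rows; keep the others aside for manual review.
good = df[is_consistent].copy()
good[['street', 'city', 'state']] = good['address'].str.split(r',\s*', expand=True)
bad = df[~is_consistent]

print(good)
print(bad)
```

Rows that fail the check end up in `bad`, so they do not break the three-column assignment.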