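I have a dataset with the following fields:
an ID-like string (abcd-efgh-5678-1234, ...), a street address (1256 Grant St, 500 wall st, etc.), and a dollar amount ($5000, $10000, etc.).
Based on this, I want to add two new columns to the DataFrame object in pandas:
the street name (e.g. wall st) and the street address number (e.g. 500).
So far I have been able to select the rows for one specific street, wall st, as follows: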
str_street = 'Wall St'
wall_st = dataset.loc[dataset['street_address'].str.lower().str.endswith(str_street.lower()), :]
wall_st['street_name'] = ???
wall_st['street_address_number'] = ???
How can I do this?
Answer 0 (score: 1)
import pandas as pd
df = pd.DataFrame({'street address': ['500 wall street', '123 blafoo']})
# Split each address once on whitespace: the number first, the rest second.
df['street address'].apply(lambda x: pd.Series(x.split(None, 1)))
This results in:
     0            1
0  500  wall street
1  123       blafoo
Then you only need to rename the columns and pd.concat the result to the original DataFrame.
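A minimal sketch of that last step. The target names `street_address_number` and `street_name` are borrowed from the question; the answer itself does not name the columns, so treat them as an assumption.

```python
import pandas as pd

df = pd.DataFrame({'street address': ['500 wall street', '123 blafoo']})

# Split each address once on whitespace: the number first, the rest second.
parts = df['street address'].apply(lambda x: pd.Series(x.split(None, 1)))

# Rename the positional columns 0 and 1 (names assumed from the question).
parts = parts.rename(columns={0: 'street_address_number', 1: 'street_name'})

# Attach the new columns to the original frame, aligned on the index.
result = pd.concat([df, parts], axis=1)
print(result)
```

Note that both new columns hold strings; converting the number column to an integer type would be a separate step.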
Answer 1 (score: 1)
I think you need extract:
import pandas as pd
df = pd.DataFrame({'street address': ['500 wall street', '123 blafoo']})
print (df)
    street address
0  500 wall street
1       123 blafoo
# Raw string avoids an invalid-escape warning; \s* keeps the separating space out of the name.
df1 = df['street address'].str.extract(r'(?P<number>\d+)\s*(?P<name>.*)', expand=True)
print (df1)
  number         name
0    500  wall street
1    123       blafoo
A solution with split:
df[['number','name']] = df['street address'].str.split(n=1, expand=True)
print (df)
    street address number         name
0  500 wall street    500  wall street
1       123 blafoo    123       blafoo
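Applied to the question's own setup, the same idea might look like the sketch below. The sample rows are made up for illustration; only the column names `street_address`, `street_name`, and `street_address_number` come from the question. It assumes every address starts with a house number followed by whitespace; rows that do not match get NaN in both new columns.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
dataset = pd.DataFrame({'street_address': ['1256 Grant St', '500 wall st']})

# Leading digits become the number; everything after the whitespace is the name.
parts = dataset['street_address'].str.extract(
    r'^(?P<street_address_number>\d+)\s+(?P<street_name>.*)$', expand=True)

# Add both columns to the full dataset first, aligned on the index.
dataset = dataset.join(parts)

# Filter afterwards, e.g. every row on Wall St (case-insensitive).
wall_st = dataset[dataset['street_name'].str.lower() == 'wall st']
print(wall_st)
```

Building the columns on the full dataset before filtering avoids assigning into a filtered slice, which is where pandas typically raises SettingWithCopyWarning.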