I have a pandas DataFrame:
df
id Description
1 POS Transfer A&W MONTREAL QC
2 MKLI QC Montreal DOLLARAMA
3 PC - PAYMENT FROM - *****11*22
I want to extract the address from the description. Here is what I did:
provinces = ["qc", "on"]
b = []
for index, row in df.iterrows():
    c = 0
    for i in provinces:
        if i in row["Description"].lower().split():
            a = [row["id"], row["Description"]]
            c = 1
            break
    if c == 1:
        b.append(a)
b
[[1, 'POS Transfer A&W MONTREAL QC'],
 [2, 'MKLI QC Montreal DOLLARAMA']]
Here I am collecting the matches in a list. Can I capture them directly in another pandas column instead?
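A minimal sketch of one way to keep the loop's result inside the DataFrame, as a boolean column, instead of a separate list. The sample `df` is rebuilt from the table above, and `has_address` and `with_address` are hypothetical names chosen for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "id": [1, 2, 3],
    "Description": [
        "POS Transfer A&W MONTREAL QC",
        "MKLI QC Montreal DOLLARAMA",
        "PC - PAYMENT FROM - *****11*22",
    ],
})

provinces = {"qc", "on"}

# Split each description into lowercase words and flag the rows in which
# any word is a province code; the result is a boolean column.
df["has_address"] = (
    df["Description"]
    .str.lower()
    .str.split()
    .apply(lambda words: any(w in provinces for w in words))
)

# Rows that mention a province, kept as a DataFrame rather than a list
with_address = df[df["has_address"]]
print(with_address)
```

Because the flag is an ordinary column, the filtered rows keep their `id` and `Description` together without the `c`/`a` bookkeeping of the loop.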
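This captures every description that contains an address. To send these to the Google API, I only want to select 1 word to the left and right of the province. For example, from:
POS Transfer A&W MONTREAL QC
I want to capture:
A&W MONTREAL QC
In the case of MKLI QC Montreal DOLLARAMA, I want to capture: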
QC Montreal DOLLARAMA
How can I do this?
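The two examples above use different windows: two words to the left of the province in the first, two words to the right in the second. The question does not state a rule that picks the side, so this minimal sketch assumes the caller passes the window sizes in. `window_around_province`, `before` and `after` are hypothetical names, not part of pandas:

```python
def window_around_province(text, provinces=("qc", "on"), before=1, after=1):
    """Return the first province code in `text` with up to `before` words
    to its left and `after` words to its right, or '' if none is found."""
    words = text.split()
    for i, word in enumerate(words):
        if word.lower() in provinces:
            # max(0, ...) stops the start index from going negative,
            # which would otherwise wrap around to the end of the list
            start = max(0, i - before)
            # slicing past the end of a list is safe, so no upper clamp is needed
            return " ".join(words[start:i + after + 1])
    return ""


print(window_around_province("POS Transfer A&W MONTREAL QC", before=2, after=0))
print(window_around_province("MKLI QC Montreal DOLLARAMA", before=0, after=2))
```

With the defaults `before=1, after=1` it returns one word on each side of the province, and it can be applied to a column with `df["Description"].apply(window_around_province)`.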
Answer 0 (score: 1)
I think you need to select 1 word before and after the province:
provinces = ["qc", "on"]

def f(x):
    # lowercase word list for matching, original-case word list for output
    x1 = x.lower().split()
    x2 = x.split()
    # join each matching word with one word before and one word after;
    # max() keeps the start index from going negative (it would wrap around),
    # min() caps the end index at the number of words
    out = [' '.join(x2[max(0, i - 1): min(len(x2), i + 2)]) for i, y in enumerate(x1)
           if y in provinces]
    # return the first match if there is one, otherwise an empty string
    return out[0] if len(out) > 0 else ''

df['new'] = df['Description'].apply(f)
print(df)
id Description new
0 1 POS Transfer A&W MONTREAL QC MONTREAL QC
1 2 MKLI QC Montreal DOLLARAMA MKLI QC Montreal
2 3 PC - PAYMENT FROM - *****11*22
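As an alternative to `apply`, the same one-word-each-side window can be pulled out with a regular expression through `Series.str.extract`. This is a sketch under the assumption that province codes only need to match as whole words, case-insensitively; `pattern` and `codes` are illustrative names:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "Description": [
        "POS Transfer A&W MONTREAL QC",
        "MKLI QC Montreal DOLLARAMA",
        "PC - PAYMENT FROM - *****11*22",
    ],
})

provinces = ["qc", "on"]
codes = "|".join(re.escape(p) for p in provinces)

# Optional word and whitespace, a province code as a whole word,
# then optional whitespace and word; str.extract returns the single group.
pattern = rf"((?:\S+\s+)?\b(?:{codes})\b(?:\s+\S+)?)"

df["new"] = (
    df["Description"]
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .fillna("")  # rows without a province give NaN; use '' like f() does
)
print(df)
```

One difference from the `split()` version: `\b` also treats a code followed by punctuation as a match (for example `QC,` or the `ON` in `ON-LINE`), which may or may not be wanted.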