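Sorry if the title is confusing. Here is what I am trying to do:
I am trying to merge my parcel dataframe with my municipal code lookup table. The parcel dataframe: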
df1.head()
   PARID           OWNER1
0  B10 2 1 0131    WILSON ROBERT JR
1  B10 2 18B 0131  COMUNALE MICHAEL J & MARY ANN
2  B10 2 18D 0131  COMUNALE MICHAEL J & MARY ANN
3  B10 2 19F 0131  MONROE & JEFFERSON HOLDINGS LLC
4  B10 4 11 0131   NOEL JAMES H
The municipal code dataframe:
df_LU.head()
PARID Municipality
0 01 Allen Twp.
1 02 Bangor
2 03 Bath
3 04 Bethlehem
4 05 Bethlehem Twp.
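The last two digits in the first column of df1 (the "31" in "B10 2 1 0131") are the municipal code that I need to merge with the municipal code dataframe. However, of my roughly 30,000 records, about 200 end with a letter, like these: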
      PARID            OWNER1
299   D11 10 10 0131F  HOWARD THEODORE P & CLAUDIA S
1007  F10 4 3 0134F    KNEEBONE JUDY ANN
1011  F10 5 2 0134F    KNEEBONE JUDY ANN
1114  F8 18 10 0626F   KNITTER WILBERT D JR & AMY J
1115  F8 18 8 0626F    KNITTER DONALD
For these rows, the two digits before the trailing letter are the code I need to extract (for example, the '31' in 'D11 10 10 0131F').
If I just use pd.DataFrame(df1['PARID'].str[-2:]), this gives me:
PARID
...
299 1F
...
What I need is:
PARID
...
299 31
...
The code I wrote to accomplish this is very long and rather clumsy.
Here is the code:
#Do the extraction and merge for the rows that end with numbers
df_2015= df1[['PARID','OWNER1']]
df_2015['PARID'] = df_2015['PARID'].str[-2:]
df_15r =pd.merge(df_2015, df_LU, how = 'left', on = 'PARID')
df_15r
#The pivot result for rows generated from above.
Result15_First = df_15r.groupby('Municipality').count()
Result15_First.to_clipboard()
#Check the ID field for rows that end with letters
check15 = df_2015['PARID'].unique()
check15
C = pd.DataFrame({'ID':check15})
NC = C.dropna()
LNC = NC[NC['ID'].str.endswith('F')]
MNC = NC[NC['ID'].str.endswith('A')]
F = [LNC, MNC]
NNC = pd.concat(F, axis = 0)
s = NNC['ID'].tolist()
s
# Identify the records in s
df_p15 = df_2015.loc[df_2015['PARID'].isin(s)]
df_p15
# Separate out a dataframe with just the rows that end with a letter
df15= df1[['PARID','OWNER1']]
df15c = df15[df15.index.isin(df_p15.index)]
df15c
#This step is to create the look up field from the new data frame, the two numbers before the ending letter.
df15c['PARID1'] = df15c['PARID'].str[-3:-1]
df15c
#Then I will join the look up table
df_15t =df15c.merge(df_LU.set_index('PARID'), left_on = 'PARID1', right_index = True)
df_15b = df_15t.groupby('Municipality').count()
df_15b
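Only after I finished did I realize how long my code is for such a seemingly simple task. If there is a better way, and I am sure there is, please let me know. Thanks.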
Answer 0 (score: 2)
You can use str.replace to remove all non-digits. After that, you should be able to use .str[-2:].
import pandas as pd
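df1 = pd.DataFrame({
    'PARID': ["M3N6V2 B7 13A 0131", "M3N6V2 B7 13B 0131", "Y2 7 B13 0213", "Y2 7 B14 0213", "M5 N4 12 0231A"],
    'Owner': ["Tom", "Jerry", "Jack", "Chris", "Alex"]})
# Strip every non-digit character, then keep the last two digits.
# regex=True is needed in pandas >= 2.0, where the default became False.
df1['PARID'].str.replace(r'\D+', '', regex=True).str[-2:]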
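As a quick check of this approach against the PARID values from the question, here is a minimal sketch. The column name MUNI_CODE is a hypothetical name chosen for illustration, and only four of the question's rows are reproduced.

```python
import pandas as pd

# PARID values copied from the question; the last two end in a letter.
parcels = pd.DataFrame({
    "PARID": ["B10 2 1 0131", "B10 4 11 0131", "D11 10 10 0131F", "F8 18 10 0626F"],
})

# Remove every non-digit character, then keep the last two digits.
# regex=True is passed explicitly so the pattern is treated as a regular expression.
parcels["MUNI_CODE"] = (
    parcels["PARID"].str.replace(r"\D+", "", regex=True).str[-2:]
)
print(parcels)
```

The result stays a string column, so a code such as '01' keeps its leading zero and can be matched directly against a lookup table that stores codes as strings.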
Answer 1 (score: 2)
You can use the pandas string methods to extract the last two digits:
df1['PARID'].str.extract(r'.*(\d{2})', expand=False)
This gives:
0 31
1 31
2 13
3 13
4 31
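To show how the extracted code could feed into the merge and per-municipality count the question asks for, here is a hedged sketch. It swaps in a slightly different pattern that is anchored at the end of the string and allows optional trailing letters. The lookup rows are illustrative stand-ins rather than the real table, and MUNI_CODE is a hypothetical column name.

```python
import pandas as pd

df1 = pd.DataFrame({
    "PARID": ["M3N6V2 B7 13A 0131", "Y2 7 B13 0213", "M5 N4 12 0231A"],
    "Owner": ["Tom", "Jack", "Alex"],
})

# Stand-in lookup table (assumption); codes are kept as strings.
df_LU = pd.DataFrame({
    "PARID": ["13", "31"],
    "Municipality": ["Allentown", "Tatamy"],
})

# Two digits, optionally followed by letters, anchored at the end of the string.
df1["MUNI_CODE"] = df1["PARID"].str.extract(r"(\d{2})[A-Za-z]*$", expand=False)

# Left merge keeps every parcel, even if its code has no match in the lookup.
merged = df1.merge(
    df_LU.rename(columns={"PARID": "MUNI_CODE"}),
    on="MUNI_CODE",
    how="left",
)

# Number of parcels per municipality.
counts = merged.groupby("Municipality").size()
print(counts)
```

Anchoring with `$` makes the intent explicit: the code is read from the end of the identifier, and digits earlier in the string (such as the "13" in "13A") are ignored.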
Answer 2 (score: 1)
import pandas as pd
df = pd.DataFrame([['M3N6V2 B7 13A 0131','M3N6V2 B7 13B 0131','Y2 7 B13 0213', 'Y2 7 B14 0213', 'M5 N4 12 0231A' ], ['Tom', 'Jerry', 'Jack', 'Chris', 'Alex']])
df = df.T
df.columns = ['PARID', 'Owner']
print(df)
This prints your left DataFrame:
PARID Owner
0 M3N6V2 B7 13A 0131 Tom
1 M3N6V2 B7 13B 0131 Jerry
2 Y2 7 B13 0213 Jack
3 Y2 7 B14 0213 Chris
4 M5 N4 12 0231A Alex
Now add the two-digit lookup code as a new column:
df['IDPART'] = None
for row in df.index:
    if df.at[row, 'PARID'][-1].isalpha():
        # Trailing letter: take the two digits just before it
        df.at[row, 'IDPART'] = df.at[row, 'PARID'][-3:-1]
    else:
        df.at[row, 'IDPART'] = df.at[row, 'PARID'][-2:]
df['IDPART'] = df['IDPART'].astype(int)  # Convert the join column to integers
print(df)
This gives:
PARID Owner IDPART
0 M3N6V2 B7 13A 0131 Tom 31
1 M3N6V2 B7 13B 0131 Jerry 31
2 Y2 7 B13 0213 Jack 13
3 Y2 7 B14 0213 Chris 13
4 M5 N4 12 0231A Alex 31
Then merge:
# otherdf is the lookup DataFrame, assumed here to hold integer codes in its PARID column
merged = pd.merge(df, otherdf, how='left', left_on='IDPART', right_on='PARID')
print(merged)
This gives:
PARID_x Owner IDPART PARID_y Municipality
0 M3N6V2 B7 13A 0131 Tom 31 31 Tatamy
1 M3N6V2 B7 13B 0131 Jerry 31 31 Tatamy
2 Y2 7 B13 0213 Jack 13 13 Allentown
3 Y2 7 B14 0213 Chris 13 13 Allentown
4 M5 N4 12 0231A Alex 31 31 Tatamy
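The row-by-row loop above can probably be replaced by a vectorized expression. The following is a minimal sketch under the assumption that every PARID ends in two digits, optionally followed by ASCII letters. It uses str.rstrip, which is a different technique from the loop in this answer.

```python
import string

import pandas as pd

df = pd.DataFrame({
    "PARID": ["M3N6V2 B7 13A 0131", "M3N6V2 B7 13B 0131", "Y2 7 B13 0213",
              "Y2 7 B14 0213", "M5 N4 12 0231A"],
    "Owner": ["Tom", "Jerry", "Jack", "Chris", "Alex"],
})

# Strip any trailing letters, take the last two characters, convert to int.
df["IDPART"] = (
    df["PARID"].str.rstrip(string.ascii_letters).str[-2:].astype(int)
)
print(df)
```

Note that converting to int drops leading zeros, so a code such as '01' becomes 1. If the lookup table stores codes as strings like '01', as df_LU in the question appears to, it may be simpler to skip the astype(int) step and join on strings.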