I'm parsing out data from one dataframe column into multiple dataframe columns. Specifically, I want to parse out all the phone numbers from a column full of emails. After I parse out the phone numbers, I want to remove those phone numbers from the original email column.
I start with a column in a dataframe, called "email", full of emails.
I am able to successfully parse out the first occurrence of a phone number, using regex, with the following line:
df['phone_num_1'] = df['email'].str.extract(r'(\(?\d\d\d\)?-? ?\.?\d\d\d-?\.?\d\d\d\d?)')
Running this line again, but with a new column name, captures the original phone number again...
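This is expected behaviour: `Series.str.extract` only returns the first match in each row, so calling it again with the same pattern repeats that first number. The minimal sketch below contrasts it with `Series.str.findall`, which returns every non-overlapping match as a list. It assumes the question's pattern and uses a short, hypothetical sample string in place of the real data.

```python
import pandas as pd

# The question's pattern, written as a raw string so the backslashes reach the regex engine intact
PHONE_RE = r'(\(?\d\d\d\)?-? ?\.?\d\d\d-?\.?\d\d\d\d?)'

# Hypothetical sample row containing two phone numbers
df = pd.DataFrame({'email': ['Phone: (123) 456-7890, or call 234.567.8901.']})

# str.extract only ever returns the FIRST match, so re-running it repeats the same number
first = df['email'].str.extract(PHONE_RE)[0]

# str.findall returns every non-overlapping match as a list per row
all_matches = df['email'].str.findall(PHONE_RE)

print(first[0])        # (123) 456-7890
print(all_matches[0])  # ['(123) 456-7890', '234.567.8901']
```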
I am able to remove all occurrences of phone numbers using the following line:
df['email'] = df['email'].replace(r'(\(?\d\d\d\)?-? ?\.?\d\d\d-?\.?\d\d\d\d?)', '', regex=True)
Now all the phone numbers are gone and I lost the second phone number.
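The second number is lost because the removal runs before anything has captured it. One way around this, sketched below, is to capture every match first and only then strip the numbers from the text. The `phones` column name and the sample row are hypothetical; the pattern is the question's own.

```python
import pandas as pd

PHONE_RE = r'(\(?\d\d\d\)?-? ?\.?\d\d\d-?\.?\d\d\d\d?)'

# Hypothetical sample row containing two phone numbers
df = pd.DataFrame({'email': ['Phone: (123) 456-7890, or call 234.567.8901.']})

# 1) Capture every number first, while it is still in the text
df['phones'] = df['email'].str.findall(PHONE_RE)

# 2) Only then strip the numbers out of the original column
df['email'] = df['email'].str.replace(PHONE_RE, '', regex=True)

print(df.loc[0, 'phones'])  # ['(123) 456-7890', '234.567.8901']
print(df.loc[0, 'email'])   # Phone: , or call .
```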
If there are two occurrences of a phone number in my original email column, how do I capture the second occurrence? Ideally, I would like for that second occurrence of a phone number to be parsed out into its own column.
In the end, I would have 3 columns: email, phone_num_1, phone_num_2
The email column will no longer have any phone numbers.
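A possible way to get exactly those three columns is `Series.str.extractall`, which returns one row per match indexed by (original row, match number); unstacking the match number turns the first and second hits into separate columns. This is a sketch under assumptions: the three sample rows are hypothetical, and collapsing leftover double spaces after removal is an optional extra step not asked for in the question.

```python
import pandas as pd

PHONE_RE = r'(\(?\d\d\d\)?-? ?\.?\d\d\d-?\.?\d\d\d\d?)'

# Hypothetical rows with two, one, and zero phone numbers
df = pd.DataFrame({'email': [
    'Phone: (123) 456-7890, or call 234.567.8901.',
    'Only one number here: 345-678-9012',
    'No phone number at all',
]})

# extractall returns one row per match, indexed by (original row, match number)
matches = df['email'].str.extractall(PHONE_RE)[0]

# unstack moves the match number into the columns: 0 -> first hit, 1 -> second hit
phones = matches.unstack()
phones.columns = [f'phone_num_{n + 1}' for n in phones.columns]

# Join on the index; rows with fewer matches get NaN in the missing columns
df = df.join(phones)

# Remove the numbers from the text and collapse any leftover double spaces
df['email'] = (df['email']
               .str.replace(PHONE_RE, '', regex=True)
               .str.replace(r'\s{2,}', ' ', regex=True)
               .str.strip())
print(df)
```

Rows with more than two numbers would simply produce a `phone_num_3` column and so on, since the column count follows the largest number of matches in any row.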
I appreciate the help in advance!
The email column might contain a cell with the following string:
Installed new heat pump. System is up and running with no leaks. Gave tenant orientation on new heat pump. installed new aqua cal heat pump Email: example@domain.com | Phone: (123) 456-7890 pool heater is not working. Please contact resident at 234.567.8901. Vendor Paid Pool/Spa Heater Equipment Pool/Spa 10088
Note the two distinct phone numbers.
I need each phone number extracted from that string and placed into columns of their own.
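As a quick check, the question's pattern can be run against this exact cell with the standard library's `re.findall`, which returns the captured group of every non-overlapping match from left to right. The sketch below suggests the pattern already finds both formats and ignores the trailing `10088`. Note that `\d\d\d\d?` makes the final digit optional, so the pattern would also accept a number ending in only three digits.

```python
import re

PHONE_RE = r'(\(?\d\d\d\)?-? ?\.?\d\d\d-?\.?\d\d\d\d?)'

# The sample cell quoted in the question
text = ("Installed new heat pump. System is up and running with no leaks. "
        "Gave tenant orientation on new heat pump. installed new aqua cal heat pump "
        "Email: example@domain.com | Phone: (123) 456-7890 pool heater is not working. "
        "Please contact resident at 234.567.8901. Vendor Paid Pool/Spa Heater Equipment "
        "Pool/Spa 10088")

# findall returns the captured group of every match, left to right
print(re.findall(PHONE_RE, text))  # ['(123) 456-7890', '234.567.8901']
```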
Answer (score: 0)
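Sorry, without more information about your dataframe I don't fully understand what you're after. However, since you're having trouble capturing the second phone number, I can help you pin down the regex. I got it to recognize the email, phone 1, and phone 2.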
import re
import pandas as pd

data = {"Email": ["Installed new heat pump. System is up and running with no leaks. Gave tenant orientation on new heat pump. installed new aqua cal heat pump Email: example@domain.com | Phone: (123) 456-7890 pool heater is not working. Please contact resident at 234.567.8901. Vendor Paid Pool/Spa Heater Equipment Pool/Spa 10088"]}
df = pd.DataFrame(data)
for item in df['Email']:
    reg = re.search(r"(?P<email>\S+@\S+)\D+(?P<ph1>\d{3}\D+\d{3}\D+\d{4})?.*(?P<ph2>\d{3}\D+\d{3}\D+\d{4})", item)
    if reg:  # re.search returns None when nothing matches
        print(list(reg.groups()))
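Instead of looping over rows, the same named-group pattern can be passed to `Series.str.extract`, which applies it to every row and uses the group names (`email`, `ph1`, `ph2`) as column names. The sketch below assumes a shortened version of the sample cell. One quirk worth knowing: the greedy `\D+` before `ph1` also consumes the opening parenthesis, so `ph1` comes back as `123) 456-7890` rather than `(123) 456-7890`.

```python
import pandas as pd

# The answer's regex, split across lines; \D+ is equivalent to the original [\D]+
PATTERN = (r"(?P<email>\S+@\S+)\D+"
           r"(?P<ph1>\d{3}\D+\d{3}\D+\d{4})?.*"
           r"(?P<ph2>\d{3}\D+\d{3}\D+\d{4})")

# Shortened version of the sample cell from the question
df = pd.DataFrame({"Email": [
    "Email: example@domain.com | Phone: (123) 456-7890 pool heater is not working. "
    "Please contact resident at 234.567.8901. Vendor Paid Pool/Spa 10088"
]})

# str.extract applies the pattern to every row at once; named groups become column names
parts = df["Email"].str.extract(PATTERN)
df = df.join(parts)

print(df[["email", "ph1", "ph2"]])
```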