How do I compare two string columns and replace the letter case of strings in one column with the case used in the other?

Time: 2019-10-15 23:55:19

Tags: python regex pandas pattern-matching

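I have two columns, "Sentences" and "Updates". For each word that appears at the end of a URL in the Updates column, I want to find the matching word in Sentences (ignoring case) and replace it with the casing used in Sentences.

I don't know how to do this comparison. The real data has 43k rows with different URLs.

Sample code: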

import pandas as pd

dict1 = {'Updates': ['The new abc.com/Line','Its a abc.com/bright and abc.com/Sunny Day','abc.com/smartphone have taken our the abc.com/WORLD','abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'],
     'Sentences': ['The new line','Its a bright and sunny day','Smartphone have taken our the World','GLOBAL Warming is reaching its Peak ']
        }

df = pd.DataFrame(dict1)

Current O/P:

Sentences                              Updates
The new line                           The new abc.com/Line
Its a bright and sunny day             Its a abc.com/bright and abc.com/Sunny Day
Smartphone have taken our the World    abc.com/smartphone have taken our the abc.com/WORLD
GLOBAL Warming is reaching its Peak    abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak

Expected O/P:

Sentences                              Updates
The new line                           The new abc.com/line
Its a bright and sunny day             Its a abc.com/bright and abc.com/sunny day
Smartphone have taken our the World    abc.com/Smartphone have taken our the abc.com/World
GLOBAL Warming is reaching its Peak    abc.com/GLOBAL Warming is abc.com/reaching its abc.com/Peak

1 answer:

Answer 0 (score: 0)

Use the re module.

Code:

import re

dict1 = {
    'Sentences': [
        'The new line',
        'Its a bright and sunny day',
        'Smartphone have taken our the World',
        'GLOBAL Warming is reaching its Peak '
    ],
    'Updates': [
        'The new abc.com/Line',
        'Its a abc.com/bright and abc.com/Sunny Day',
        'abc.com/smartphone have taken our the abc.com/WORLD',
        'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'
    ]
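}
for sentence, update in zip(dict1['Sentences'], dict1['Updates']):
    # Words that follow the last "/" of each URL-like token in the update
    urls = [x.split("/")[-1] for x in update.split() if "/" in x]
    for url in urls:
        # Escape the word so any regex metacharacters in it are matched literally
        pattern = re.escape(url)
        match = re.search(pattern, sentence, re.IGNORECASE)
        if match:  # if the sentence has no counterpart, leave the word unchanged
            update = re.sub(pattern, lambda _: match.group(), update, flags=re.IGNORECASE)

    print(f"{sentence}\t{update}")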

Output:

The new line    The new abc.com/line
Its a bright and sunny day  Its a abc.com/bright and abc.com/sunny Day
Smartphone have taken our the World abc.com/Smartphone have taken our the abc.com/World
GLOBAL Warming is reaching its Peak     abc.com/GLOBAL Warming is abc.com/reaching its abc.com/Peak
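
Note that "Day" in the second row keeps its capital letter, because it does not follow a URL. The loop above also substitutes every case-insensitive occurrence of a word anywhere in the update, not only the one after the slash. A possible variant, offered only as a sketch, does the substitution in a single pass and touches nothing but the word after the last "/" of each URL-like token. The helper name match_case and the pattern URL_WORD are hypothetical, and the sketch assumes the word after the slash consists of word characters only.

```python
import re

# Hypothetical pattern: a run of non-space characters ending in "/",
# followed by the word whose casing should be adjusted.
URL_WORD = re.compile(r"(\S*/)(\w+)")


def match_case(update, sentence):
    """Recase the word after each URL prefix to match its spelling in sentence."""
    # Map each lower-cased word of the sentence to its original spelling.
    # If a word occurs more than once, the first spelling wins.
    casing = {}
    for word in re.findall(r"\w+", sentence):
        casing.setdefault(word.lower(), word)

    def repl(m):
        prefix, word = m.group(1), m.group(2)
        # Fall back to the original word when the sentence has no counterpart.
        return prefix + casing.get(word.lower(), word)

    return URL_WORD.sub(repl, update)


sentences = [
    'The new line',
    'Its a bright and sunny day',
    'Smartphone have taken our the World',
    'GLOBAL Warming is reaching its Peak ',
]
updates = [
    'The new abc.com/Line',
    'Its a abc.com/bright and abc.com/Sunny Day',
    'abc.com/smartphone have taken our the abc.com/WORLD',
    'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak',
]

recased = [match_case(u, s) for u, s in zip(updates, sentences)]
for line in recased:
    print(line)
```

With the DataFrame from the question, the same helper could presumably be applied row by row, for example df['Updates'] = [match_case(u, s) for u, s in zip(df['Updates'], df['Sentences'])]. Building the lookup dictionary once per row avoids running a separate regex search for every URL, which may matter at 43k rows, though that has not been measured here.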