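I have two columns, `Sentences` and `Updates`. For each word that appears at the end of a URL in the `Updates` column, I want to find the corresponding word in `Sentences` (matching case-insensitively) and replace the URL word with the casing used in `Sentences`.
I'm not sure how to do this comparison. The real data has 43k rows with different URLs.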
Sample code:
```python
import pandas as pd

dict1 = {'Updates': ['The new abc.com/Line',
                     'Its a abc.com/bright and abc.com/Sunny Day',
                     'abc.com/smartphone have taken our the abc.com/WORLD',
                     'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'],
         'Sentences': ['The new line',
                       'Its a bright and sunny day',
                       'Smartphone have taken our the World',
                       'GLOBAL Warming is reaching its Peak ']
         }
df = pd.DataFrame(dict1)
```
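The comparison can start by pulling out the word that ends each URL. A minimal sketch using pandas `Series.str.findall`, assuming every URL word is a run of `\w` characters directly after a `/` (the name `url_words` is illustrative, not from the question):

```python
import pandas as pd

dict1 = {'Updates': ['The new abc.com/Line',
                     'Its a abc.com/bright and abc.com/Sunny Day',
                     'abc.com/smartphone have taken our the abc.com/WORLD',
                     'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'],
         'Sentences': ['The new line',
                       'Its a bright and sunny day',
                       'Smartphone have taken our the World',
                       'GLOBAL Warming is reaching its Peak ']}
df = pd.DataFrame(dict1)

# With one capture group, findall returns only the captured word after each '/'
url_words = df['Updates'].str.findall(r'/(\w+)')
print(url_words.tolist())
```

Each entry is the list of URL words for that row, which can then be looked up case-insensitively in the matching `Sentences` value.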
Current output:
```
Sentences                              Updates
The new line                           The new abc.com/Line
Its a bright and sunny day             Its a abc.com/bright and abc.com/Sunny Day
Smartphone have taken our the World    abc.com/smartphone have taken our the abc.com/WORLD
GLOBAL Warming is reaching its Peak    abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak
```
Expected output:
```
Sentences                              Updates
The new line                           The new abc.com/line
Its a bright and sunny day             Its a abc.com/bright and abc.com/sunny day
Smartphone have taken our the World    abc.com/Smartphone have taken our the abc.com/World
GLOBAL Warming is reaching its Peak    abc.com/GLOBAL Warming is abc.com/reaching its abc.com/Peak
```
Answer (score: 0)
Using the `re` module.
Code:
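```python
import re

dict1 = {
    'Sentences': [
        'The new line',
        'Its a bright and sunny day',
        'Smartphone have taken our the World',
        'GLOBAL Warming is reaching its Peak '
    ],
    'Updates': [
        'The new abc.com/Line',
        'Its a abc.com/bright and abc.com/Sunny Day',
        'abc.com/smartphone have taken our the abc.com/WORLD',
        'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'
    ]
}

for sentence, update in zip(dict1['Sentences'], dict1['Updates']):
    # Word at the end of each URL token, e.g. 'Line' from 'abc.com/Line'
    urls = [x.split("/")[-1] for x in update.split() if "/" in x]
    for url in urls:
        # Find the same whole word in the sentence, ignoring case
        match = re.search(r"\b" + re.escape(url) + r"\b", sentence, re.IGNORECASE)
        if match is None:
            continue  # word not in the sentence: leave the URL unchanged
        cased = match.group()
        # Replace only the occurrence right after a '/', using the sentence's casing
        update = re.sub("/" + re.escape(url) + r"\b", lambda m: "/" + cased,
                        update, flags=re.IGNORECASE)
    print(f"{sentence}\t{update}")
```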
Output:
```
The new line                           The new abc.com/line
Its a bright and sunny day             Its a abc.com/bright and abc.com/sunny Day
Smartphone have taken our the World    abc.com/Smartphone have taken our the abc.com/World
GLOBAL Warming is reaching its Peak    abc.com/GLOBAL Warming is abc.com/reaching its abc.com/Peak
```
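For the 43k-row DataFrame in the question, a single-pass variant may be simpler than re-searching the sentence once per URL word. This is an alternative sketch, not the answer's own method: `match_case` is a hypothetical helper name, and it assumes a URL word is the run of `\w` characters right after a `/`, and that each word appears with only one casing in its sentence (if it appears with several, the last one wins).

```python
import re

import pandas as pd


def match_case(sentence, update):
    # Hypothetical helper: map lowercase word -> word as written in the sentence
    lookup = {w.lower(): w for w in re.findall(r"\w+", sentence)}
    # Re-case every \w+ run that directly follows a '/'; unknown words are kept
    return re.sub(r"(?<=/)\w+",
                  lambda m: lookup.get(m.group().lower(), m.group()),
                  update)


df = pd.DataFrame({
    'Updates': ['The new abc.com/Line',
                'Its a abc.com/bright and abc.com/Sunny Day',
                'abc.com/smartphone have taken our the abc.com/WORLD',
                'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'],
    'Sentences': ['The new line',
                  'Its a bright and sunny day',
                  'Smartphone have taken our the World',
                  'GLOBAL Warming is reaching its Peak '],
})

# A list comprehension over zip() is usually faster than a row-wise df.apply
df['Updates'] = [match_case(s, u) for s, u in zip(df['Sentences'], df['Updates'])]
print(df['Updates'].tolist())
```

On the sample data this gives the same `Updates` values as the output above; like the loop, it leaves `Day` as-is because that word is not part of a URL. Because the lookbehind only requires a `/`, it does not depend on the domain being `abc.com`.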