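I have various brand URLs in a DataFrame, collected through automated Google searches. I have split these URLs into words, and I am trying to compare the brand name and the manufacturer name against those words to check whether each URL is correct (as most companies have URLs based on their brand name or manufacturing company name).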
try:
    from googlesearch import search
except ImportError:
    print("No module named 'google' found")
for i in search(Brand.get_attribute("innerHTML"), tld="com", num=15, stop=1, pause=2):
    webaddresses.append(i)
for i in search(Manufacturer.get_attribute("innerHTML"), tld="com", num=15, stop=1, pause=2):
    webaddresses.append(i)
# Join the terms into a single query string: `a and b` on two non-empty
# strings evaluates to b alone, so only the last term would be searched.
for i in search(Brand.get_attribute("innerHTML") + " " + Manufacturer.get_attribute("innerHTML"), tld="com", num=15, stop=1, pause=2):
    webaddresses.append(i)
for i in search(Brand.get_attribute("innerHTML") + " " + Manufacturer.get_attribute("innerHTML") + " Beverage", tld="com", num=15, stop=1, pause=2):
    webaddresses.append(i)
webaddresses = pd.DataFrame(webaddresses)
webaddresses.rename(columns = {list(webaddresses)[0]:'URL'}, inplace=True)
splitting_gurl = webaddresses['URL'].str.split(r'[.\:/?=\-&]+', expand = True)
for i in range(len(splitting_gurl.index)):
    row = splitting_gurl.loc[[i]]
    for j in range(0, 5):
        if row[[j]] == str(Brand_check) or row[[j]] == str(Manufacturer_check):
            a = webaddresses.loc[[i]]
            print(a)
This is the error:
  File "<ipython-input-124-0b002229b2b7>", line 4, in <module>
    if row[[j]] == str(Brand_check) or row[[j]] == str(Manufacturer_check):
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1576, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I just want my for loop and if statement to run and compare these words.
Answer 0 (score: 0)
We can use the fuzzywuzzy package in Python. It compares words based on Levenshtein distance, penalizing each insertion, deletion, or substitution of a letter.
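To make the idea concrete, here is a minimal sketch in plain Python, without pandas or fuzzywuzzy. It works on scalar strings, which also sidesteps the ValueError in the question: there, `row[[j]]` selects a one-cell DataFrame, so `==` produces another DataFrame that `if` cannot reduce to a single True or False. The helper names (`levenshtein`, `similarity`, `matching_urls`), the 0 to 100 scaling, the threshold of 80, and the sample URLs are all assumptions made for illustration; fuzzywuzzy computes its own ratio differently.

```python
import re


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Case-insensitive score from 0 to 100 (hypothetical scaling,
    not fuzzywuzzy's formula)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))


def matching_urls(urls, names, threshold=80):
    """Return the URLs that contain at least one word close to any name."""
    hits = []
    for url in urls:
        # Same separators as the str.split pattern in the question.
        tokens = [t for t in re.split(r'[.:/?=\-&]+', url) if t]
        if any(similarity(t, n) >= threshold for t in tokens for n in names):
            hits.append(url)
    return hits


# Made-up sample data, standing in for the search results and the
# Brand_check / Manufacturer_check values.
urls = ["https://www.acmecola.com/products", "https://example.org/news"]
print(matching_urls(urls, ["AcmeCola", "Acme Beverages"]))
```

With a DataFrame, the same check could be applied per row after pulling each cell out as a scalar (for example with `.iat[i, j]`) instead of comparing one-cell DataFrames. The threshold would need tuning on real data, since short words can score high against unrelated names.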