Trying to compare text using a for loop

Time: 2019-03-27 22:09:20

Tags: python for-loop python-requests

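I have various brand URLs in a data frame, collected through automated Google searches. I have split these URLs into words, and I am trying to compare the brand name and the manufacturer name against each URL to check whether it is the right one (as most companies have a URL based on their brand name or manufacturing company name).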

try:
    from googlesearch import search
except ImportError:
    print("No module named 'google' found")


for i in search(Brand.get_attribute("innerHTML"), tld="com", num=15, stop=1, pause=2): 
    webaddresses.append(i)

for i in search(Manufacturer.get_attribute("innerHTML"), tld="com", num=15, stop=1, pause=2): 
    webaddresses.append(i)

for i in search(Brand.get_attribute("innerHTML") and Manufacturer.get_attribute("innerHTML"), tld="com", num=15, stop=1, pause=2): 
    webaddresses.append(i)

for i in search(Brand.get_attribute("innerHTML") and Manufacturer.get_attribute("innerHTML") and "Beverage", tld="com", num=15, stop=1, pause=2): 
    webaddresses.append(i)

webaddresses = pd.DataFrame(webaddresses)
webaddresses.rename(columns = {list(webaddresses)[0]:'URL'}, inplace=True)

splitting_gurl = webaddresses['URL'].str.split(r'[.\:/?=\-&]+', expand = True)




for i in range(len(splitting_gurl.index)):
    row = splitting_gurl.loc[[i]]    
    for j in range (0,5):
        if row[[j]] == str(Brand_check) or row[[j]] == str(Manufacturer_check):
            a=webaddresses.loc[[i]]
            print(a)

Here is the error:

 File "<ipython-input-124-0b002229b2b7>", line 4, in <module>
if row[[j]] == str(Brand_check) or row[[j]] == str(Manufacturer_check):

File "C:\Users\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1576, in __nonzero__
.format(self.__class__.__name__))

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I just want my for loop and if statement to run and compare these words.

1 answer:

Answer 0: (score: 0)

We can use the FuzzyWuzzy package in Python. It compares words based on Levenshtein distance, penalizing every insertion, deletion, or substitution of a letter.
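
To make the idea concrete, here is a minimal sketch in plain Python. It does not call FuzzyWuzzy and does not claim to reproduce its scoring; it only illustrates edit-distance matching applied to URL tokens. The helper names (levenshtein, similarity, url_matches), the 0.8 threshold, and the sample brand and URLs are all assumptions made for illustration. Because it compares plain strings rather than one-cell DataFrames (which is what row[[j]] produces in the question), the if statement gets a single True or False and the ValueError does not arise.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions or
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the letter
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive score in [0, 1]; 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def url_matches(url: str, names, threshold: float = 0.8) -> bool:
    """True if any word in the URL is close enough to any given name.
    The threshold is an arbitrary illustrative value."""
    # Same separators as the split pattern used in the question.
    tokens = [t for t in re.split(r"[.:/?=\-&]+", url) if t]
    return any(similarity(token, name) >= threshold
               for token in tokens
               for name in names)


# Hypothetical sample data, not taken from the question.
names = ["Pepsi", "PepsiCo"]
urls = [
    "https://www.pepsico.com/brands",
    "https://en.wikipedia.org/wiki/Cola",
]
matching = [u for u in urls if url_matches(u, names)]
print(matching)
```

With a real data frame, the same check could be applied to each string in the URL column; lowering the threshold tolerates more spelling differences at the cost of more false matches.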