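I'm stuck. I'm using Selenium with Python to extract Google search results.
I have a set of keywords that I feed into Google search and extract data for (that is what the code does).
I also have a second, negative list that contains certain keywords. I want to check whether any of those keywords appear in the extracted data and, if they do, not append that data to the new list. How can I do that?
Here is my code: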
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
import csv
import time
from itertools import groupby,chain
from operator import itemgetter
import sqlite3
final_data = []
def getresults():
    global final_data
    conn = sqlite3.connect("Jobs_data.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS naukri(id INTEGER PRIMARY KEY, KEYWORD text, LINK text,
                    CONSTRAINT number_unique UNIQUE (KEYWORD,LINK))
                    """)
    cur = conn.cursor()
    #chrome_options = Options()
    #chrome_options.add_argument("--headless")
    #chrome_options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
    driver = webdriver.Chrome("./chromedriver")
    with open("./"+"terms12.csv", "r") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)
        for row in reader:
            keywords = row[0]
            url = "https://www.google.co.in/search?num=10&q=" + keywords
            driver.get(url)
            time.sleep(5)
            count = 0
            links = driver.find_elements_by_class_name("g")[:3]
            for i in links:
                data = i.find_elements_by_class_name("iUh30")
                dm = negativelist("junk.csv")
                print(dm)
                for news in data:
                    sublist = []
                    data = news.text
                    if dm in data:
                        continue
                        print("I am in exception")
                    sublist.append(keywords)
                    sublist.append(data)
                    print(sublist)
                    final_data.append(sublist)
                    cur.execute("INSERT OR IGNORE INTO naukri VALUES (NULL,?,?)",(keywords,data))
                    conn.commit()
    return final_data
def negativelist(file):
    sublist = []
    with open("./"+file,"r") as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            _data = row[0]
            sublist.append(_data)
    return sublist
def readfile(alldata, filename):
    with open ("./"+ filename, "w",encoding="utf-8") as csvfile:
        csvfile = csv.writer(csvfile, delimiter=",")
        csvfile.writerow("")
        for i in range(0, len(alldata)):
            csvfile.writerow(alldata[i])
def main():
    getresults()
    readfile([[k, *chain.from_iterable(r for _, *r in g)] for k, g in groupby(final_data, key=itemgetter(0))], "Naukri.csv")
main()
I get this error:
Traceback (most recent call last):
  File "C:\Users\prince.bhatia\Desktop\projects\google_Rank_Chcker1\Naukri-links.py", line 72, in <module>
    main()
  File "C:\Users\prince.bhatia\Desktop\projects\google_Rank_Chcker1\Naukri-links.py", line 70, in main
    getresults()
  File "C:\Users\prince.bhatia\Desktop\projects\google_Rank_Chcker1\Naukri-links.py", line 42, in getresults
    if dm in data:
TypeError: 'in <string>' requires string as left operand, not list
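Answer 0 (score: 1)
First, checking whether data exists in the negative keywords is quite different from checking whether any negative keyword exists in data. Your "if dm in data:" raises the TypeError because dm is a list, and the left operand of "in" on a string must itself be a string. Swapping the operands runs, but it only skips data that is exactly equal to one of the negative keywords:
if data in dm:
    continue
What you probably want is: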
# Create a function to check if the data contains any of the negative keywords
def dataContainsNegativeKeyword(data, dm):
    for word in dm:
        if word in data:
            return True
    return False
# In the loop, call that function with your data and negative keywords
if dataContainsNegativeKeyword(data, dm):
    continue
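The same check can also be written with the built-in any(). The following is only a minimal sketch: the name contains_negative_keyword and the sample links are hypothetical, and the case-insensitive comparison is an assumption about the behavior you want, not something the question specifies.

```python
def contains_negative_keyword(text, negative_keywords):
    """Return True if any negative keyword occurs as a substring of text."""
    lowered = text.lower()
    # Skip empty keywords: "" is a substring of every string
    return any(word.lower() in lowered for word in negative_keywords if word)


# Hypothetical sample data standing in for the scraped link text
negative = ["facebook", "linkedin"]
scraped = [
    "https://www.naukri.com/python-jobs",
    "https://www.LinkedIn.com/jobs/python",
    "https://www.example.com/careers",
]

# Keep only the links that contain none of the negative keywords
kept = [link for link in scraped if not contains_negative_keyword(link, negative)]
print(kept)
```

Filtering with a list comprehension like this avoids the continue statement entirely, which can be easier to follow when the loop body is short.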
Then, oddly, you append both the keywords and the data to the sublist:
sublist.append(keywords)
sublist.append(data)
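Perhaps what you want here is to define sublist as a dictionary and then add keywords as the key and data as the value. (The name keywords is probably a misnomer; keyword would be better, since as far as I can tell it is just one of the dictionary's keys.)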
sublist = {}
# Rest of the code here
sublist[keywords] = data
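One thing to keep in mind with a plain dictionary is that assigning to the same key twice overwrites the earlier value, so a keyword with several links would keep only the last one. A minimal sketch of one way around that, assuming you want every link per keyword (the pairs below are hypothetical stand-ins for the scraped results):

```python
# Hypothetical (keyword, link) pairs standing in for the scraped results
pairs = [
    ("python jobs", "https://www.naukri.com/python-jobs"),
    ("python jobs", "https://www.example.com/python"),
    ("java jobs", "https://www.example.com/java"),
]

results = {}
for keyword, link in pairs:
    # setdefault creates the empty list the first time a keyword is seen
    results.setdefault(keyword, []).append(link)

print(results)
```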
Another thing you can improve in the code is that the negative keywords are loaded on every iteration:
dm = negativelist("junk.csv")
You don't actually need to do that on every iteration; just load it once at the beginning :)
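A minimal sketch of loading the list once, before any loop, could look like this. It uses an in-memory io.StringIO as a stand-in for junk.csv so that it runs on its own; the file contents, the sample strings, and the helper name load_negative_keywords are assumptions made for illustration.

```python
import csv
import io


def load_negative_keywords(csvfile):
    """Read the first column of each non-empty row into a list."""
    return [row[0] for row in csv.reader(csvfile) if row]


# In-memory stand-in for junk.csv (hypothetical contents)
junk_csv = io.StringIO("facebook\nlinkedin\n")

# Load once, before looping over any search results
negative = load_negative_keywords(junk_csv)

kept = []
for text in ["www.facebook.com/jobs", "www.naukri.com/jobs"]:
    if any(word in text for word in negative):
        continue  # skip results that contain a negative keyword
    kept.append(text)

print(kept)
```

In the original script the equivalent change would be to call negativelist("junk.csv") once near the top of getresults() rather than inside the loop over links.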