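I have also read several other posts and Q&As about threading, tkinter stop buttons, and so on. Most of the ideas are about how to stop small loops. My problem is that I want to wrap an existing function, which I wrote a month ago, in a Tkinter GUI. The existing function is very large, so I don't know how to stop its loop. Here is the code. The Reader object reads an input text file produced by web scraping and writes a CSV file. It extracts all reviews from the shopping-mall URLs in the input. You can ignore all the Korean; it is not important. There is also an object called SmartStoreReviewScraper, but it is very simple: it basically sends a GET request for a JSON file and returns it to Reader so that it can be converted to a CSV file.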
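This is the code that goes into my tkinter root button. I have to stop the Reader object, but I don't know how. I'm new to threading, so I don't understand the whole mechanism behind it. I keep going back to the documentation, but could someone suggest a quick solution? I'm running out of time.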
import time

import pandas as pd
from tqdm import trange

# SmartStoreReviewScraper is defined elsewhere (not shown here).


class Reader:
    def __init__(self, filename, limit=None, delay_time=0):
        self.filename = filename
        self.limit = limit
        self.delay_time = delay_time
        self.target_variable = ['평점', '아이디', '시간', '구매옵션', '리뷰내용']
        self.read_input_file()
        self.extract_file()

    def read_input_file(self):
        request_df = pd.read_csv(self.filename, names=['names', 'link'], sep='*')
        request_df = request_df.set_index('names')
        # Make duplicate store names unique by appending a running counter.
        request_df.index = request_df.index + request_df.groupby(level=0).cumcount().astype(str).replace('0', '')
        request_df.to_csv('output/wd.csv', encoding='utf-8', header=False)

    def extract_file(self):
        df = pd.read_csv('output/wd.csv', encoding='utf-8', names=['names', 'link'])
        for i in range(len(df.index)):
            file_name = list(df['names'])[i]
            store_link = list(df['link'])[i]
            print(f"###################{file_name} 수집 시작###################")
            app = SmartStoreReviewScraper()
            REVIEWS = app.scraped_reviews
            store_data = app.get_store_data(store_link)  # store info
            json_review = app.get_review_json(store_data['merchant_no'], store_data['product_no'], 1)  # request review info
            review_data = app.get_review_data(json_review)  # review metadata for this item (total items + total pages)
            total_element = review_data['totalElements']  # total number of items
            total_pages = review_data['totalPages']  # total number of pages
            print(f'총 아이템 수: {total_element}\n총 페이지 수: {total_pages}')
            review_content = app.get_review_content(json_review)  # target data
            app.scrape_review_contents(REVIEWS, review_content)  # scrape the first page
            # "not self.limit" covers both None (the default) and 0: no limit.
            if not self.limit or self.limit >= total_element:
                self.start_scraper(app, REVIEWS, total_element, total_pages, store_data, file_name)
            else:
                self.start_scraper(app, REVIEWS, self.limit, total_pages, store_data, file_name)

    def start_scraper(self, app, REVIEWS, LIMIT, PAGES, store_data, file_name):
        print('목표 데이터 양:' + str(LIMIT))
        while len(REVIEWS) < LIMIT:
            for page in trange(2, PAGES + 1, desc="크롤링 진행도"):
                # The first page has already been scraped, so start from the second page.
                page_json = app.get_review_json(store_data['merchant_no'], store_data['product_no'], page)
                content = app.get_review_content(page_json)
                app.scrape_review_contents(REVIEWS, content)
                time.sleep(self.delay_time)
                if len(REVIEWS) >= LIMIT:
                    break
        # DataFrame.append was removed in pandas 2.0, so build the frame in one step.
        DF = pd.DataFrame(REVIEWS, columns=self.target_variable)
        DF.insert(0, column='번호', value=DF.index + 1)
        print("<데이터 프레임 샘플>")
        print(DF.head())
        print('데이터 수집 완료! 크롤링된 아이템 수:' + str(len(DF)) + '\n')
        DF.to_csv(f'output/data/{file_name}.csv', encoding='utf-8-sig', index=False)
I know I have to insert some kind of stop function into Reader, but I don't know how. Any help would be greatly appreciated, thank you.
Answer 0 (score: 0)

Try passing the event to the Reader object.

def __init__(self, filename, limit=None, delay_time=0, *, stop_thread):
    self.stop_thread = stop_thread  # store event reference
    ........

def extract_file(self):
    df = pd.read_csv('output/wd.csv', encoding='utf-8', names=['names', 'link'])
    for i in range(len(df.index)):
        if self.stop_thread.is_set():  # check event
            return
        ..................

# pass event to reader (by keyword, because it follows other keyword arguments)
Reader(FILENAME[-1], limit=limit.get(), delay_time=delay_time.get(), stop_thread=self.stop_thread)
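The check in extract_file only runs once per store, so a store with many pages would keep going until it finishes. Below is a minimal sketch of the same idea applied to an inner page loop. It is not the asker's code: scrape_pages and the placeholder strings are hypothetical stand-ins, and it assumes the delay can be replaced by Event.wait(), which sleeps like time.sleep() but returns early once the event is set.

```python
import threading


def scrape_pages(pages, stop_event, delay_time=0.0):
    """Collect pages one by one until done or until stop_event is set.

    Hypothetical stand-in for the page loop in Reader.start_scraper.
    """
    collected = []
    for page in pages:
        # Cooperative cancellation: check the flag before each unit of work.
        if stop_event.is_set():
            break
        collected.append(f"reviews from page {page}")  # placeholder for the real request
        # Event.wait() blocks for up to delay_time seconds but returns True
        # as soon as the event is set, so a long delay does not postpone the stop.
        if stop_event.wait(delay_time):
            break
    return collected


stop_event = threading.Event()
results = []
worker = threading.Thread(
    target=lambda: results.extend(
        scrape_pages(range(1, 1001), stop_event, delay_time=0.05)
    )
)
worker.start()
stop_event.set()        # this is what a Stop button's callback would do
worker.join(timeout=5)  # the worker leaves its loop at the next check
print(f"stopped after {len(results)} of 1000 pages")
```

A thread cannot be killed from outside in Python, which is why the loop itself has to look at the event. The more often it checks, the faster the Stop button appears to respond.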
Answer 1 (score: 0)

def scraping(self):
    file_reader = Reader(FILENAME[-1], limit=limit.get(), delay_time=delay_time.get())
    file_reader.extract_file(self.stop_thread)
    messagebox.showinfo('info', 'finished crawling')

Thanks to @Mike67, I have edited my code with these changes. I think there should be a conditional statement wrapping file_reader.extract_file(self.stop_thread) that checks stop_thread for each input item. Any other suggestions would be really helpful. Thank you!
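One way to look at the wrapping question: a condition placed around the call to extract_file is evaluated only once, before the work starts, so the per-item check has to sit inside the loop that does the work. The sketch below shows that arrangement together with start and stop callbacks. ScrapeController and its attributes are hypothetical names, and the time.sleep call stands in for scraping one store.

```python
import threading
import time


class ScrapeController:
    """Hypothetical helper; start() and stop() are meant to be Button commands."""

    def __init__(self, items):
        self.items = items
        self.done = []
        self.stop_thread = threading.Event()
        self.thread = None

    def start(self):
        if self.thread is not None and self.thread.is_alive():
            return  # ignore a second click while a run is in progress
        self.stop_thread.clear()  # reset the flag left over from an earlier stop
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def stop(self):
        # Returns immediately; the worker exits at its next check.
        self.stop_thread.set()

    def _run(self):
        for item in self.items:
            if self.stop_thread.is_set():  # the check lives inside the loop
                return
            time.sleep(0.01)  # stand-in for scraping one store
            self.done.append(item)


controller = ScrapeController(list(range(500)))
controller.start()
time.sleep(0.05)
controller.stop()
controller.thread.join(timeout=5)
print(f"processed {len(controller.done)} of {len(controller.items)} items")
```

In a Tkinter program the two methods could be attached with tk.Button(root, text="Stop", command=controller.stop) and likewise for start. Because the work runs in a background thread, the GUI stays responsive while scraping. Tkinter widgets are generally meant to be used from the main thread, so it may be safer to show the "finished" message box from a callback scheduled with root.after() rather than from the worker itself.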