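I started learning Python just this week and am working on a personal project. The goal of my script is to scrape users' IDs and comments from a given news article URL and put them together.
So far it looks like this: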
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
articleurl = "http://news.nate.com/view/20170401n02609?mid=n1008"
articleid = articleurl[26:40]
print(articleid)
commentslink = "http://comm.news.nate.com/Comment/ArticleComment/list?artc_sq=" + articleid + "&prebest=0&order=O&mid=n1008&domain=&argList=0"
commentslink2 = "http://comm.news.nate.com/Comment/ArticleComment/list?artc_sq=" + articleid + "&order=O&cmtr_fl=0&prebest=0&clean_idx=&user_nm=&fold=&mid=n1008&domain=&argList=0&return_sq=&twitterAuth=N&connectAuth=N&page=2#comment"
print(commentslink)
print(commentslink2)
chrome_path = r"F:\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get(commentslink)
userlist = []
commentlist = []
usernames = (driver.find_elements_by_class_name("""nameui"""))
for userid in usernames:
    userlist += userid.text
    userlist.append(userid.text)
    print(userid.text)
comments = (driver.find_elements_by_class_name("""usertxt"""))
for comment in comments:
    commentlist += comment.text
    commentlist.append(comment.text)
    print(comment.text)
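However, the console shows the complete list of user IDs, followed by the complete list of comments. I want each user's comment to appear right after that user's ID. Right now, the console gives me this: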
(1) bong****
(2) yang****
(3) hide****
... (continues through rest of user ID list)
(1) 어려보이고 싶어하는 순간 나이 먹은거래...
(2) 예능안보내는 이유가 있었네
(3) 드럽게 재미없네
... (continues through rest of comment list)
I've been trying to make it look like this (the numbers are just for clarity):
(1) bong****: (1) 어려보이고 싶어하는 순간 나이 먹은거래...
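I've been trying to solve this, but nothing I've tried has worked. I think the problem is in how the variables or the for loops are written. Any ideas on how to fix this?
Any help would be greatly appreciated, thanks!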
Answer 0 (score: 0)
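zip() pairs elements up by position: the i-th tuple it produces contains the i-th element from each of the argument sequences or iterables. In Python 3, zip() returns an iterator rather than a list, so wrap it in list() if you want to print or reuse the pairs. This assumes userlist and commentlist hold one whole string per entry; the += lines in the question add individual characters, so drop them and keep only the append() calls.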
finalResult = list(zip(userlist, commentlist))
print(finalResult)
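A minimal sketch of the same idea with placeholder data (the short strings stand in for the .text values Selenium returns). It checks that both lists have the same length first, because plain zip() silently stops at the shorter list, which would leave some users or comments unpaired without any error.

```python
# Placeholder data standing in for the scraped user IDs and comments.
userlist = ["bong****", "yang****", "hide****"]
commentlist = ["comment 1", "comment 2", "comment 3"]

# zip() stops at the shorter input without complaint, so check lengths first.
if len(userlist) != len(commentlist):
    raise ValueError(
        f"{len(userlist)} user IDs but {len(commentlist)} comments"
    )

# In Python 3, zip() returns an iterator; list() turns it into a list of tuples.
pairs = list(zip(userlist, commentlist))
for user, comment in pairs:
    print(f"{user}: {comment}")
```

On Python 3.10 and later, zip(userlist, commentlist, strict=True) performs the same length check and raises ValueError on a mismatch.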
Answer 1 (score: 0)
Use append() and zip():
userlist = []
commentlist = []
userlist.append("user1")
userlist.append("user2")
commentlist.append("Hello")
commentlist.append("Goodbye")
results = list(zip(userlist, commentlist))
print(results)
for user, comment in results:
    print("{}: {}".format(user, comment))
--output:--
[('user1', 'Hello'), ('user2', 'Goodbye')]
user1: Hello
user2: Goodbye
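To get numbered lines closer to the format asked for in the question, enumerate() can supply a counter alongside each pair. This sketch uses a single 1-based counter per line; the sample data is the same placeholder data as above.

```python
userlist = ["user1", "user2"]
commentlist = ["Hello", "Goodbye"]

# enumerate(..., start=1) yields (1, pair), (2, pair), ... for the zipped pairs.
lines = []
for number, (user, comment) in enumerate(zip(userlist, commentlist), start=1):
    lines.append(f"({number}) {user}: {comment}")

print("\n".join(lines))
```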
Here is the difference between append() and +=:
data1 = []
data2 = []
data1 += "hello"
data2.append("hello")
print(data1)
print(data2)
--output:--
['h', 'e', 'l', 'l', 'o']
['hello']
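You can think of a string as a list of characters. += tells Python to take each element of the sequence on the right and add it to the list on the left. append(), on the other hand, adds exactly one thing to the list: its argument.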
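On a list, += behaves like list.extend(): it accepts any iterable and adds its items one at a time. That is why the question's loops, which use both += and append(), would fill userlist with every character of each ID followed by the whole ID. The sketch below also shows that wrapping the string in a one-element list makes += add it whole, like append().

```python
data1 = []
data1 += "hello"        # iterates over the string: one element per character

data2 = []
data2 += ["hello"]      # the right side is a one-element list, so one item is added

data3 = []
data3.extend("hello")   # += on a list does the same thing as extend()

data4 = []
data4.append("hello")   # append() always adds its argument as a single item

print(data1, data2, data3, data4)
```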
If you end up with something like this:
userlist = [
    "user1",
    "user2",
    "user1",
    "user3",
    "user2",
    "user1"
]
commentlist = [
    "Hello",
    "Goodbye",
    "Yellow",
    "Blue",
    "Red",
    "Clouds"
]
then, if you like, you can do this:
import itertools  # keep the module name; "as iter" would shadow the built-in iter()
userlist = [
    "user1",
    "user2",
    "user1",
    "user3",
    "user2",
    "user1"
]
commentlist = [
    "Hello",
    "Goodbye",
    "Yellow",
    "Blue",
    "Red",
    "Clouds"
]
results = list(zip(userlist, commentlist))
sorted_results = sorted(results)
print(sorted_results)
# groupby() only merges adjacent items with the same key, hence the sort above
for user, group in itertools.groupby(sorted_results, lambda t: t[0]):
    print("{}:".format(user))
    for tuple_ in group:
        print("\t{}".format(tuple_[1]))
--output:--
[('user1', 'Clouds'), ('user1', 'Hello'), ('user1', 'Yellow'), ('user2', 'Goodbye'), ('user2', 'Red'), ('user3', 'Blue')]
user1:
	Clouds
	Hello
	Yellow
user2:
	Goodbye
	Red
user3:
	Blue
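If you don't need each user's comments sorted alphabetically, an alternative to sort plus groupby() is collections.defaultdict(list). It groups in a single pass with no sorting and keeps each user's comments in the order they were scraped (dicts preserve insertion order in Python 3.7+). This is a swapped-in technique, not the one used above.

```python
from collections import defaultdict

userlist = ["user1", "user2", "user1", "user3", "user2", "user1"]
commentlist = ["Hello", "Goodbye", "Yellow", "Blue", "Red", "Clouds"]

# Map each user to the list of their comments, in scraped order.
comments_by_user = defaultdict(list)
for user, comment in zip(userlist, commentlist):
    comments_by_user[user].append(comment)

for user, comments in comments_by_user.items():
    print(f"{user}:")
    for comment in comments:
        print(f"\t{comment}")
```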
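By the way, """nameui""" is equivalent to "nameui" and 'nameui', so don't bother using triple double quotes. Triple-quoted strings are meant for strings that span multiple lines.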
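A quick check of that claim: all three literal styles produce equal strings, and triple quotes only matter when the text contains line breaks.

```python
a = """nameui"""
b = "nameui"
c = 'nameui'
print(a == b == c)  # True: the quote style does not change the value

# Triple quotes let a literal span several lines.
multi = """line one
line two"""
print(multi.splitlines())  # ['line one', 'line two']
```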
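Also, unless you need to do something more complicated, such as clicking links, you can skip Selenium: fetch the page's HTML (for example with the requests library) and scrape it with BeautifulSoup or lxml.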
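A rough sketch of parsing without a browser. BeautifulSoup and lxml are third-party packages, so to stay dependency-free this uses the standard library's html.parser instead (a different, more verbose tool). Only the class names nameui and usertxt come from the question; the sample HTML and the CommentParser class are hypothetical, and the real page's markup may differ. If the comments are inserted by JavaScript, a plain HTTP fetch will not contain them and Selenium may still be needed.

```python
from html.parser import HTMLParser

# Hypothetical markup: only the class names come from the question.
SAMPLE_HTML = """
<div><span class="nameui">bong****</span><p class="usertxt">comment 1</p></div>
<div><span class="nameui">yang****</span><p class="usertxt">comment 2</p></div>
"""


class CommentParser(HTMLParser):
    """Hypothetical helper: collects text of elements with class nameui or usertxt.

    Assumes those elements contain plain text only; any end tag stops collection.
    """

    def __init__(self):
        super().__init__()
        self.users = []
        self.comments = []
        self._target = None  # list that the current text should go into

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "nameui" in classes:
            self._target = self.users
        elif "usertxt" in classes:
            self._target = self.comments

    def handle_endtag(self, tag):
        self._target = None

    def handle_data(self, data):
        if self._target is not None and data.strip():
            self._target.append(data.strip())


parser = CommentParser()
parser.feed(SAMPLE_HTML)
parser.close()
for user, comment in zip(parser.users, parser.comments):
    print(f"{user}: {comment}")
```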