How do I print a single string from two sets of data?

Time: 2017-04-02 09:51:09

Tags: python selenium

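I started learning Python this week and am working on a personal project. The goal of the script is to scrape the user IDs and comments from a given news article URL and put each ID together with its comment.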

So far it looks like this:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

articleurl = "http://news.nate.com/view/20170401n02609?mid=n1008"

articleid = articleurl[26:40]

print(articleid)

commentslink = "http://comm.news.nate.com/Comment/ArticleComment/list?artc_sq=" + articleid + "&prebest=0&order=O&mid=n1008&domain=&argList=0"
commentslink2 = "http://comm.news.nate.com/Comment/ArticleComment/list?artc_sq=" + articleid + "&order=O&cmtr_fl=0&prebest=0&clean_idx=&user_nm=&fold=&mid=n1008&domain=&argList=0&return_sq=&twitterAuth=N&connectAuth=N&page=2#comment"

print(commentslink)
print(commentslink2)

chrome_path = r"F:\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get(commentslink)

userlist = []
commentlist = []

usernames = (driver.find_elements_by_class_name("""nameui"""))
for userid in usernames:
    userlist += userid.text
    userlist.append(userid.text)
    print(userid.text)

comments = (driver.find_elements_by_class_name("""usertxt"""))
for comment in comments:
    commentlist += comment.text
    commentlist.append(comment.text)
    print(comment.text)

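However, the console shows the full list of user IDs followed by the full list of comments. I want each user's comment to come right after that user's ID. At the moment the console gives me this: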

(1) bong****
(2) yang****
(3) hide****
... (continues through rest of user ID list)
(1) 어려보이고 싶어하는 순간 나이 먹은거래...
(2) 예능안보내는 이유가 있었네
(3) 드럽게 재미없네
... (continues through rest of comment list)

I have been trying to make it look like this (the numbers are only there for clarity):

(1) bong****: (1) 어려보이고 싶어하는 순간 나이 먹은거래...

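I have been trying to solve this, but nothing I have tried has worked. I think the problem lies in the variables or in the way the for loops are written. Any ideas on how to fix it?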

Any help would be greatly appreciated, thanks!

2 answers:

Answer 0: (score: 0)

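The zip() function pairs up its arguments: the i-th tuple contains the i-th element from each of the argument sequences or iterables. In Python 3 it returns an iterator rather than a list, so wrap it in list() to print the pairs: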

finalResult = list(zip(userlist, commentlist))
print(finalResult)
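
To get the exact "user: comment" layout the question asks for, the pairs can be unpacked in a loop instead of printing the whole list. This is a minimal sketch, assuming both lists were filled with append() only; the sample values are copied from the output in the question, and `pair_lines` is a hypothetical helper name.

```python
# Sample data standing in for the scraped lists (values from the question's output).
userlist = ["bong****", "yang****", "hide****"]
commentlist = [
    "어려보이고 싶어하는 순간 나이 먹은거래...",
    "예능안보내는 이유가 있었네",
    "드럽게 재미없네",
]


def pair_lines(users, comments):
    """Return one 'user: comment' string per (user, comment) pair."""
    return ["{}: {}".format(user, comment) for user, comment in zip(users, comments)]


for line in pair_lines(userlist, commentlist):
    print(line)
```

Note that zip() stops at the end of the shorter list, so if the page yields more IDs than comments (or the reverse) the extra items are silently dropped.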

Answer 1: (score: 0)

Use append() and zip():

userlist = []
commentlist = []

userlist.append("user1")
userlist.append("user2")
commentlist.append("Hello")
commentlist.append("Goodbye")

results = list(zip(userlist, commentlist))
print(results)

for user,comment in results:
    print("{}: {}".format(user,comment) )

--output:--
[('user1', 'Hello'), ('user2', 'Goodbye')]
user1: Hello
user2: Goodbye

Here is the difference between append() and +=:

data1 = []
data2 = []

data1 += "hello"
data2.append("hello")

print(data1)
print(data2)

--output:--
['h', 'e', 'l', 'l', 'o']
['hello']

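You can think of a string as a list of characters. += tells Python to take each element of the iterable on the right-hand side and add it to the list on the left. append(), on the other hand, adds exactly one thing to the list: its argument.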
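
Applied to the loops in the question, this means each pass first adds the individual characters of the text and then the text itself, so the += line can simply be dropped. A small sketch of the variants side by side (the values are made up for illustration):

```python
names = []

names += "bong"         # iterates over the string: adds 'b', 'o', 'n', 'g'
names.append("bong")    # adds the whole string as a single element
names += ["yang"]       # right-hand side is a one-element list, so one element is added
names.extend(["hide"])  # extend() adds each element of its argument, like += does

print(names)
```

Only the append() call (or += with a list on the right) keeps each user ID as one element.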

If you end up with something like this, where the same user appears more than once:

userlist = [
    "user1",
    "user2",
    "user1",
    "user3",
    "user2",
    "user1"
]

commentlist = [
    "Hello",
    "Goodbye",
    "Yellow",
    "Blue",
    "Red",
    "Clouds"
]

then, if you like, you can group the comments by user:

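import itertools  # imported under its own name so the built-in iter() is not shadowed

userlist = [
    "user1",
    "user2",
    "user1",
    "user3",
    "user2",
    "user1"
]

commentlist = [
    "Hello",
    "Goodbye",
    "Yellow",
    "Blue",
    "Red",
    "Clouds"
]

# Pair each user with the comment at the same position.
results = list(zip(userlist, commentlist))
# groupby() only groups adjacent items, so sort the pairs by user first.
sorted_results = sorted(results)
print(sorted_results)

for user, group in itertools.groupby(sorted_results, key=lambda pair: pair[0]):
    print("{}:".format(user))
    for _, comment in group:
        print("\t{}".format(comment))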


--output:--
[('user1', 'Clouds'), ('user1', 'Hello'), ('user1', 'Yellow'), ('user2', 'Goodbye'), ('user2', 'Red'), ('user3', 'Blue')]

user1:
    Clouds
    Hello
    Yellow
user2:
    Goodbye
    Red
user3:
    Blue
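
Sorting the pairs also reorders each user's comments alphabetically, as the output above shows. If the original comment order matters, one alternative (not part of the approach above) is to collect the comments in a dictionary keyed by user. A minimal sketch using collections.defaultdict with the same sample lists; `group_by_user` is a hypothetical helper name:

```python
from collections import defaultdict

userlist = ["user1", "user2", "user1", "user3", "user2", "user1"]
commentlist = ["Hello", "Goodbye", "Yellow", "Blue", "Red", "Clouds"]


def group_by_user(users, comments):
    """Map each user to the list of their comments, in the order they appeared."""
    grouped = defaultdict(list)
    for user, comment in zip(users, comments):
        grouped[user].append(comment)
    return dict(grouped)


for user, user_comments in group_by_user(userlist, commentlist).items():
    print("{}:".format(user))
    for comment in user_comments:
        print("\t{}".format(comment))
```

No sorting is needed here, and on Python 3.7+ the users come out in the order they were first seen.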

By the way, """nameui""" is equivalent to "nameui" or 'nameui', so don't bother with triple double quotes. Triple-quoted strings are meant for multi-line strings.
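
A quick check of that equivalence, plus the case where triple quotes are actually useful (the multi-line text is made up for illustration):

```python
single = 'nameui'
double = "nameui"
triple = """nameui"""

# All three spellings produce the same string object value.
print(single == double == triple)

# Triple quotes let a string literal span several lines.
multi = """first line
second line"""
print(multi.splitlines())
```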

Also, unless you are doing something more complex, such as clicking links, you can use BeautifulSoup or lxml to scrape the page instead of Selenium.