I generated a list of strings like this:
In:
for x in links:
    full_content = driver.find_elements_by_xpath('apath')
    full_content = [x.text for x in full_content]
    print(full_content)
Out: (a very long sequence of lists)
['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.']
['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip']
...
['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.']
I tried to append them to a DataFrame with:
full_content = pd.DataFrame([x.text for x in full_content])
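However, instead of producing a single DataFrame that holds all the lists, this produces a separate DataFrame for each list. How can I append the sequence of lists above into a single pandas DataFrame, without the quotes (' '), like this?: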
col1
0 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
1 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
...
3 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
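A minimal sketch of one way to get that single-column frame, assuming the per-iteration lists can simply be collected: extend one flat list inside the loop and build the DataFrame once, after the loop. The `batches` data below is hypothetical placeholder text standing in for the scraped lists. The quotes in the Out: listing only come from how Python prints a list (each element's repr); the DataFrame stores the plain strings, so there is nothing to strip.

```python
import pandas as pd

# Hypothetical stand-ins for the lists printed on each loop iteration
batches = [
    ["Lorem ipsum dolor sit amet."],
    ["Ut enim ad minim veniam."],
    ["Duis aute irure dolor."],
]

all_text = []
for batch in batches:
    # Accumulate the strings; do not rebuild the DataFrame inside the loop
    all_text.extend(batch)

# Build the DataFrame once, with a single column named col1
df = pd.DataFrame({"col1": all_text})
print(df)
```

Because the frame is created only once, the index runs 0, 1, 2, ... across all batches without any extra bookkeeping.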
Answer 0 (score: 1)
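So I think I understand what you're trying to do. You want to create a pandas DataFrame for each full_content, then append it to a frames list. At the end, you can combine all the DataFrames with pd.concat.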
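import pandas as pd
from selenium.webdriver.common.by import By

# `driver` (a Selenium WebDriver) and `links` (a list of URLs) come from the question's setup
frames = []
counter_from = 0
for link in links:
    driver.get(link)
    # Selenium 4 form of driver.find_elements_by_xpath('.//*[@id="segment"]')
    elements = driver.find_elements(By.XPATH, './/*[@id="segment"]')
    full_content = [el.text for el in elements]
    len_items = len(full_content)
    counter_to = counter_from + len_items
    # Give each batch a continuous index range so the combined frame has unique labels
    data = {'text': pd.Series(full_content, index=range(counter_from, counter_to))}
    df = pd.DataFrame(data)
    frames.append(df)
    counter_from += len_items
result = pd.concat(frames)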
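The manual counter_from/counter_to bookkeeping above keeps the index continuous across pages. A possibly simpler variant, sketched here under the assumption that the original per-page index is not needed, lets each frame start at 0 and passes `ignore_index=True` to `pd.concat`, which renumbers the rows 0..n-1. The `pages` data is hypothetical placeholder text standing in for the scraped results.

```python
import pandas as pd

# Hypothetical per-page results standing in for the scraped text
pages = [
    ["first paragraph", "second paragraph"],
    ["third paragraph"],
]

# One DataFrame per page; each starts its own index at 0
frames = [pd.DataFrame({"text": texts}) for texts in pages]

# ignore_index=True discards the per-frame indexes and renumbers 0..n-1
result = pd.concat(frames, ignore_index=True)
print(result)
```

Note that `pd.concat` raises a `ValueError` if `frames` is empty, so a guard may be worth adding when `links` can be empty.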