How do I append a series of lists to a pandas DataFrame?

Date: 2016-11-03 19:09:54

Tags: python python-3.x pandas

I generate lists of strings like this:

In:

for x in links:    
    full_content = driver.find_elements_by_xpath('apath')    
    full_content = [x.text for x in full_content]
    print(full_content)

Out: (a very large sequence of lists)

['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.']
['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip']
...
['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.']

I tried to append them like this:

full_content = pd.DataFrame([x.text for x in full_content])

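However, instead of producing a single DataFrame, this produces a separate DataFrame on each iteration. How can I append the sequence of lists above to a single pandas DataFrame, without the quotes (' ')? The desired output is: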

     col1
0    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
1    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
...
3   Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

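1 answer:

Answer 0: (score: 1)

I think I understand what you are trying to do. You want to create a pandas DataFrame for each full_content and append it to a list, frames. At the end, you can combine all the DataFrames with pd.concat.

import pandas as pd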

frames = []
counter_from = 0
for x in links:    
    driver.get(x)
    full_content = driver.find_elements_by_xpath('.//*[@id="segment"]')    
    full_content = [x.text for x in full_content]
    len_items = len(full_content)
    counter_to = counter_from + len_items

    # Give each row an index label that is unique across all pages
    data = {'text': pd.Series(full_content,
                              index=range(counter_from, counter_to))}
    df = pd.DataFrame(data)
    frames.append(df)
    counter_from += len_items

result = pd.concat(frames)
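
If the counter_from / counter_to bookkeeping is only there to keep the row labels unique, a shorter variant may be enough. The following is a minimal sketch, not a drop-in replacement: it assumes the text has already been scraped into plain Python lists, and `pages` is a hypothetical stand-in for the per-link full_content lists. Passing `ignore_index=True` asks pd.concat to renumber the rows itself.

```python
import pandas as pd

# Hypothetical stand-in for the scraped text: one list of strings per link.
pages = [
    ["first paragraph of page 1", "second paragraph of page 1"],
    ["only paragraph of page 2"],
]

# Build one small DataFrame per page, as in the loop above.
frames = [pd.DataFrame({"text": texts}) for texts in pages]

# ignore_index=True discards the per-frame indexes and renumbers rows from 0,
# so no manual counters are needed.
result = pd.concat(frames, ignore_index=True)
print(result)
```

The quotes in the printed lists are only how Python displays strings inside a list. They are not part of the data, so they do not appear in the DataFrame cells.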