Finding sentences and paragraphs with Python

Time: 2017-08-18 14:41:29

Tags: python dataframe sentence

I have data in the following format:

[1956, Jon's story, He sold his soul in 1987, 200]  
[1960, Mary's story, "She liked her soul, but decided to sold it anyway.", 250]  
[1963, "Alice and Peter story, with a twist", "Peter said "Your soul is mine!" and tried to sold it, but Alice had no soul and killed him.", 500]

I want to split it into:

[1956, 1960, 1963]  
['Jon's story', 'Mary's story','Alice and Peter story, with a twist']  
['He sold his soul in 1987','She liked her soul, but decided to sold it anyway.','Peter said "Your soul is mine!" and tried to sold it, but Alice had no soul and killed him.']  
[200,250,500]

So far I have this:

import re
data = [[1956, "Jon's story", "He sold his soul in 1987", 200],
        [1960, "Mary's story", "She liked her soul, but decided to sold it anyway.", 250],
        [1963, "Alice and Peter story, with a twist", "Peter said 'Your soul is mine!' and tried to sold it, but Alice had no soul and killed him.", 500]]
for row in data:
    line = str(row)
    sentence = re.split(r',', line)

But this also splits on the commas inside the quoted strings. How can I avoid that?

1 Answer:

Answer 0: (score: 0)

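This can be solved with zip instead of re. Since data is already a list of rows, there is no need to convert each row to a string and split it; zip(*data) regroups the rows by position. In the code below, m4, m3, m2 and m1 are lists containing the values you need: m4 holds the years, m3 the titles, m2 the sentences and m1 the numbers from the last column.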

    data = [[1956, "Jon's story", "He sold his soul in 1987", 200],
    [1960, "Mary's story", "She liked her soul, but decided to sold it anyway.", 250],
    [1963, "Alice and Peter story, with a twist", "Peter said 'Your soul is mine!' and tried to sold it, but Alice had no soul and killed him.", 500]]
    m4, m3, m2, m1 = map(list, zip(*data))
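
To make the result easier to check, here is a minimal sketch of the same zip approach with descriptive variable names. The names years, titles, texts and amounts are only illustrative and do not come from the question; the sketch also assumes every row has exactly four elements, otherwise the unpacking fails.

```python
data = [
    [1956, "Jon's story", "He sold his soul in 1987", 200],
    [1960, "Mary's story", "She liked her soul, but decided to sold it anyway.", 250],
    [1963, "Alice and Peter story, with a twist",
     "Peter said 'Your soul is mine!' and tried to sold it, "
     "but Alice had no soul and killed him.", 500],
]

# zip(*data) groups the i-th element of every row together,
# which turns the list of rows into one tuple per column.
# map(list, ...) converts each of those tuples into a list.
# The four names below are hypothetical, chosen for readability.
years, titles, texts, amounts = map(list, zip(*data))

print(years)      # [1956, 1960, 1963]
print(titles[2])  # Alice and Peter story, with a twist
print(amounts)    # [200, 250, 500]
```

Because the strings are never split, commas inside a title or sentence stay where they are. If the data arrives as raw text lines rather than Python lists, the standard csv module can parse quoted fields that contain commas, provided the quoting in the text is well formed.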