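Converting a list of strings into a list of lists

Time: 2017-04-10 18:57:29

Tags: python

I have a list of strings that I am trying to convert into a list of lists. My list of strings looks like this: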

['[[try', 'not', 'become', 'man', 'success', 'but', 'rather', 'try', 
'become', 'man', 'value]', '[look', 'deep', 'into', 'nature', 'and', 'then', 
'you', 'will', 'understand', 'everything', 'better]', '[the', 'true', 
'sign', 'intelligence', 'not', 'knowledge', 'but', 'imagination]', '[we', 
'cannot', 'solve', 'our', 'problems', 'with', 'the', 'same', 'thinking', 
'used', 'when', 'created', 'them]', '[weakness', 'attitude', 'becomes', 
'weakness', 'character]', '["you', 'cant', 'blame', 'gravity', 'for', 
'falling', 'love"]', '[the', 'difference', 'between', 'stupidity', 'and',
'genius', 'that', 'genius', 'has', 'its', 'limits]]']

My desired output would look like this:

 [[['try', 'not', 'become', 'man', 'success', 'but', 'rather', 'try',
 'become', 'man', 'value], [look', 'deep', 'into', 'nature', 'and', 'then',
 'you', 'will', 'understand', 'everything', 'better], [the', 'true', 'sign', 
 'intelligence', 'not', 'knowledge', 'but', 'imagination], [we', 'cannot', 
 'solve', 'our', 'problems', 'with', 'the', 'same', 'thinking', 'used', 
 'when', 'created', 'them], [weakness', 'attitude', 'becomes', 'weakness', 
 'character], ["you', 'cant', 'blame', 'gravity', 'for', 'falling', 'love"],
 [the', 'difference', 'between', 'stupidity', 'and', 'genius', 'that', 
 'genius', 'has', 'its', 'limits']]]

My output currently looks like this:

 [['[', '[', 't', 'r', 'y'], ['n', 'o', 't'], ['b', 'e', 'c', 'o', 'm', 
 'e'], ['m', 'a', 'n'], ['s', 'u', 'c', 'c', 'e', 's', 's'], ['b', 'u', 
 't'], ['r', 'a', 't', 'h', 'e', 'r'], ['t', 'r', 'y'], ['b', 'e', 'c', 'o', 
 'm', 'e'], ['m', 'a', 'n'], ['v', 'a', 'l', 'u', 'e', ']'], ['[', 'l', 'o', 
 'o', 'k'], ['d', 'e', 'e', 'p'], ['i', 'n', 't', 'o'], ['n', 'a', 't', 'u',
 'r', 'e'], ['a', 'n', 'd'], ['t', 'h', 'e', 'n'], ['y', 'o', 'u'], ['w', 
 'i', 'l', 'l'], ['u', 'n', 'd', 'e', 'r', 's', 't', 'a', 'n', 'd'], ['e', 
 'v', 'e', 'r', 'y', 't', 'h', 'i', 'n', 'g'], ['b', 'e', 't', 't', 'e', 
 'r', ']'], ['[', 't', 'h', 'e'], ['t', 'r', 'u', 'e'], ['s', 'i', 'g', 
 'n'], ['i', 'n', 't', 'e', 'l', 'l', 'i', 'g', 'e', 'n', 'c', 'e'], ['n', 
 'o', 't'], ['k', 'n', 'o', 'w', 'l', 'e', 'd', 'g', 'e'], ['b', 'u', 't'], 
 ['i', 'm', 'a', 'g', 'i', 'n', 'a', 't', 'i', 'o', 'n', ']'], ['[', 'w', 
 'e'], ['c', 'a', 'n', 'n', 'o', 't'], ['s', 'o', 'l', 'v', 'e'], ['o', 'u',
 'r'], ['p', 'r', 'o', 'b', 'l', 'e', 'm', 's'], ['w', 'i', 't', 'h'], ['t', 
 'h', 'e'], ['s', 'a', 'm', 'e'], ['t', 'h', 'i', 'n', 'k', 'i', 'n', 'g'], 
 ['u', 's', 'e', 'd'], ['w', 'h', 'e', 'n'], ['c', 'r', 'e', 'a', 't', 'e', 
 'd'], ['t', 'h', 'e', 'm', ']'], ['[', 'w', 'e', 'a', 'k', 'n', 'e', 's', 
 's'], ['a', 't', 't', 'i', 't', 'u', 'd', 'e'], ['b', 'e', 'c', 'o', 'm', 
 'e', 's'], ['w', 'e', 'a', 'k', 'n', 'e', 's', 's'], ['c', 'h', 'a', 'r', 
 'a', 'c', 't', 'e', 'r', ']'], ['[', '"', 'y', 'o', 'u'], ['c', 'a', 'n', 
 't'], ['b', 'l', 'a', 'm', 'e'], ['g', 'r', 'a', 'v', 'i', 't', 'y'], ['f', 
 'o', 'r'], ['f', 'a', 'l', 'l', 'i', 'n', 'g'], ['l', 'o', 'v', 'e', '"', 
 ']'], ['[', 't', 'h', 'e'], ['d', 'i', 'f', 'f', 'e', 'r', 'e', 'n', 'c', 
 'e'], ['b', 'e', 't', 'w', 'e', 'e', 'n'], ['s', 't', 'u', 'p', 'i', 'd', 
 'i', 't', 'y'], ['a', 'n', 'd'], ['g', 'e', 'n', 'i', 'u', 's'], ['t', 'h',
  'a', 't'], ['g', 'e', 'n', 'i', 'u', 's'], ['h', 'a', 's'], ['i', 't', 
  's'], ['l', 'i', 'm', 'i', 't', 's', ']', ']']]

Here is the content of the text file:

Try not to become a man of success, but rather try to become a man of value. 
Look deep into nature, and then you will understand everything better.
The true sign of intelligence is not knowledge but imagination. 
We cannot solve our problems with the same thinking we used when we created them. 
Weakness of attitude becomes weakness of character.
You can't blame gravity for falling in love. 
The difference between stupidity and genius is that genius has its limits.

Here is the code I have written so far:

Info = [[line.strip()] for line in Info] 
#Turns original list into lists of lists breaking at each new line
Info_Str = str(Info) #Converts list into string to manipulate easier
Info_Str = Info_Str.lower() #Converts all characters to lowercase
Info_Str = Info_Str.replace(".", "")
Info_Str = Info_Str.replace("!", "")
Info_Str = Info_Str.replace("?", "")
Info_Str = Info_Str.replace(":", "")
Info_Str = Info_Str.replace(",", "")
Info_Str = Info_Str.replace(";", "")
Info_Str = Info_Str.replace("'", "")
Info_Str = Info_Str.replace("-", "")
#The above calls remove all punctuation while leaving the '[]' for the lists
Info_Str = Info_Str.split()
Info_List = Info_Str
New_List = [item for item in Info_List if not item.isdigit()] #Removes all numbers
for word in New_List[:]: #Removes words if their length is less than 3 characters 
    if len(word) < 3:
        New_List.remove(word)
print(New_List) #List of Strings
List_Lists = [list(line) for line in New_List]
print(List_Lists)

I know it isn't very elegant; I haven't been programming for very long.

4 answers:

Answer 0 (score: 2)

I think this is what you are trying to do:

all_lines = []
keep=set('qazwsxedcrfvtgbyhnujmikolp QAZWSXEDCRFVTGBYHNUJMIKOLP')
for line in Info:
    line = str(line)
    line = ''.join(filter(keep.__contains__, line))
    # Build a new list; calling remove() while iterating can skip adjacent short words
    line = [word for word in line.split() if len(word) >= 3]
    all_lines.append(line)
print(all_lines)

Result:

[['Try', 'not', 'become', 'man', 'success', 'but', 'rather', 'try', 'become', 'man', 'value'],
 ['Look', 'deep', 'into', 'nature', 'and', 'then', 'you', 'will', 'understand', 'everything', 'better'],
 ['The', 'true', 'sign', 'intelligence', 'not', 'knowledge', 'but', 'imagination'],
 ['cannot', 'solve', 'our', 'problems', 'with', 'the', 'same', 'thinking', 'used', 'when', 'created', 'them'],
 ['Weakness', 'attitude', 'becomes', 'weakness', 'character'],
 ['You', 'cant', 'blame', 'gravity', 'for', 'falling', 'love'],
 ['The', 'difference', 'between', 'stupidity', 'and', 'genius', 'that', 'genius', 'has', 'its', 'limits']]
Thanks to @AdamSmith for pointing out the following change, which makes this simpler and more readable:

import string
keep=set(string.ascii_lowercase + string.ascii_uppercase + " ")
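
Putting the two pieces together, a self-contained sketch might look like the following. It assumes the input is a plain list of line strings; the names clean_lines and min_len are made up for this example and are not part of the original answer. Each word list is built with a comprehension, so no items are removed from a list while it is being iterated.

```python
import string

# Letters and the space character are the only characters kept
KEEP = set(string.ascii_letters + " ")


def clean_lines(lines, min_len=3):
    """Return one list of words per line, dropping non-letters and short words."""
    result = []
    for line in lines:
        # Drop punctuation, digits and newlines in one pass
        letters_only = "".join(ch for ch in line if ch in KEEP)
        # Keep only words with at least min_len characters
        result.append([w for w in letters_only.split() if len(w) >= min_len])
    return result


sample = ["You can't blame gravity for falling in love.", "It is so."]
print(clean_lines(sample))
```

A line made up entirely of short words, such as the second sample line, comes back as an empty list rather than being dropped, which keeps one output list per input line.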

Answer 1 (score: 1)

Info_Str = str(Info) #Converts list into string to manipulate easier

I think converting your list to a string makes things harder, not easier.
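
To illustrate the problem, here is a minimal sketch of what the string conversion and the final list(...) call do. The two-line info sample is made up for the example and stands in for the real file contents.

```python
# Hypothetical stand-in for the question's Info after the first list comprehension
info = [["Try not to become"], ["Look deep into nature"]]

# str() on a nested list embeds the brackets, quotes and commas in the text itself
as_text = str(info)
print(as_text)

# After stripping quotes and commas, the brackets stay glued to the first and last words
tokens = as_text.replace("'", "").replace(",", "").split()
print(tokens)

# list() applied to a string splits it into characters, not words
print(list(tokens[0]))
```

The last call is what turns every word into a list of single characters in the question's current output.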

I would probably do something like this:

def remove_special_characters(s):
    for c in ".!?:,;'-0123456789":
        s = s.replace(c, "")
    return s

lines = []
with open("data.txt") as file:
    for line in file:
        words = []
        for word in line.split():
            word = word.lower()
            word = remove_special_characters(word)
            if len(word) >= 3:
                words.append(word)
        lines.append(words)
print(lines)

Result (I added line breaks for readability):

[['try', 'not', 'become', 'man', 'success', 'but', 'rather', 'try', 'become', 'man', 'value'], 
['look', 'deep', 'into', 'nature', 'and', 'then', 'you', 'will', 'understand', 'everything', 'better'], 
['the', 'true', 'sign', 'intelligence', 'not', 'knowledge', 'but', 'imagination'], 
['cannot', 'solve', 'our', 'problems', 'with', 'the', 'same', 'thinking', 'used', 'when', 'created', 'them'], 
['weakness', 'attitude', 'becomes', 'weakness', 'character'], 
['you', 'cant', 'blame', 'gravity', 'for', 'falling', 'love'], 
['the', 'difference', 'between', 'stupidity', 'and', 'genius', 'that', 'genius', 'has', 'its', 'limits']]
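
As a possible variation, the character-by-character replace loop could be swapped for str.translate, which deletes all the unwanted characters in one call. This is only a sketch; the names DELETE_TABLE, tokenize and min_len are made up for the example.

```python
# Translation table that deletes the same punctuation and digits as above
DELETE_TABLE = str.maketrans("", "", ".!?:,;'-0123456789")


def tokenize(line, min_len=3):
    """Lowercase a line, strip unwanted characters, and keep words of min_len or more."""
    cleaned = line.lower().translate(DELETE_TABLE)
    return [word for word in cleaned.split() if len(word) >= min_len]


print(tokenize("We cannot solve our problems with the same thinking we used when we created them."))
```

Cleaning the whole line before splitting gives the same words as cleaning each word afterwards, because none of the deleted characters is whitespace.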

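Answer 2 (score: 0)

If you want a list of all the words, without any whitespace or special characters, you can use the regular expression \w+ (one or more word characters) together with findall():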

import re

text = '''Try not to become a man of success, but rather try to become a man of value. 
Look deep into nature, and then you will understand everything better.
The true sign of intelligence is not knowledge but imagination. 
We cannot solve our problems with the same thinking we used when we created them. 
Weakness of attitude becomes weakness of character.
You can't blame gravity for falling in love. 
The difference between stupidity and genius is that genius has its limits.'''


re.findall(r'\w+', text)
→ ['Try', 'not', 'to', 'become', 'a', 'man', 'of', 'success', 'but', 'rather', 'try', 'to', 'become', 'a', 'man', 'of', 'value', 'Look', 'deep', 'into', 'nature', 'and', 'then', 'you', 'will', 'understand', 'everything', 'better', 'The', 'true', 'sign', 'of', 'intelligence', 'is', 'not', 'knowledge', 'but', 'imagination', 'We', 'cannot', 'solve', 'our', 'problems', 'with', 'the', 'same', 'thinking', 'we', 'used', 'when', 'we', 'created', 'them', 'Weakness', 'of', 'attitude', 'becomes', 'weakness', 'of', 'character', 'You', 'can', 't', 'blame', 'gravity', 'for', 'falling', 'in', 'love', 'The', 'difference', 'between', 'stupidity', 'and', 'genius', 'is', 'that', 'genius', 'has', 'its', 'limits']
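
Note that this gives one flat list, keeps short words, and splits "can't" into 'can' and 't'. To get closer to the output asked for in the question, the same idea can be applied line by line. The following is a rough sketch under a few assumptions: apostrophes are removed before matching, only the letters a-z count as word characters, and the names words_per_line and min_len are made up for the example.

```python
import re

text = """You can't blame gravity for falling in love.
The true sign of intelligence is not knowledge but imagination."""


def words_per_line(text, min_len=3):
    """Return one list of lowercase words per line, skipping words shorter than min_len."""
    rows = []
    for line in text.splitlines():
        # Remove apostrophes first so "can't" becomes "cant" instead of "can" + "t"
        line = line.lower().replace("'", "")
        rows.append([w for w in re.findall(r"[a-z]+", line) if len(w) >= min_len])
    return rows


print(words_per_line(text))
```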

Answer 3 (score: 0)

A quick answer using regular expressions:

import re
messy_list = ['[[try', 'not', 'become', 'man', 'success', 'but', 
    'rather', 'try', 
    'become', 'man', 'value]', '[look', 'deep', 'into', 'nature', 
    'and', 'then', 
    'you', 'will', 'understand', 'everything', 'better]', '[the', 
    'true', 
    'sign', 'intelligence', 'not', 'knowledge', 'but', 'imagination]', '[we', 
    'cannot', 'solve', 'our', 'problems', 'with', 'the', 'same', 'thinking', 
    'used', 'when', 'created', 'them]', '[weakness', 'attitude', 'becomes', 
    'weakness', 'character]', '["you', 'cant', 'blame', 'gravity', 'for', 
    'falling', 'love"]', '[the', 'difference', 'between', 'stupidity', 'and',
    'genius', 'that', 'genius', 'has', 'its', 'limits]]'
]
# clean up double quotes in items of list
messy_list = [item.replace("\"", "") for item in messy_list]
# find word pattern in a string
pattern = re.compile(r"(\w+)")
# replace word pattern by adding single quotes before and after each word
clean_string = pattern.sub(r"'\g<1>'", ",".join(messy_list))
# evaluate the string as Python code (only do this with trusted input)
print(eval(clean_string))

The result (shown as the cleaned string that is passed to eval) is:

"[['try','not','become','man','success','but','rather','try','become','man','value'],['look','deep','into','nature','and','then','you','will','understand','everything','better'],['the','true','sign','intelligence','not','knowledge','but','imagination'],['we','cannot','solve','our','problems','with','the','same','thinking','used','when','created','them'],['weakness','attitude','becomes','weakness','character'],['you','cant','blame','gravity','for','falling','love'],['the','difference','between','stupidity','and','genius','that','genius','has','its','limits']]"
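
If you would rather not call eval at all, the brackets can also be used directly as group markers. This is a rough sketch, assuming every group ends with a token whose last character is ']'; the function name regroup is made up for the example.

```python
def regroup(tokens):
    """Group words into sublists, closing a group at each token that ends with ']'."""
    groups, current = [], []
    for token in tokens:
        # Strip the bracket and double-quote characters stuck to the word
        word = token.strip('[]"')
        if word:
            current.append(word)
        if token.endswith("]"):
            groups.append(current)
            current = []
    # Keep any trailing words that were never closed by a bracket
    if current:
        groups.append(current)
    return groups


sample = ['[[try', 'not', 'value]', '["you', 'cant', 'love"]', '[the', 'limits]]']
print(regroup(sample))
```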