How to split a list of strings into sublists at a given character in Python

Time: 2018-02-02 16:53:00

Tags: python list split sublist

I have a list of strings like this:

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc', 
        '<dialog xyz', 'string', 'more string', 'even more string etc']

I need to divide the list into sublists of strings, splitting exactly at the '<' character, so that each sublist starts with '<dialog xyz'. Sample output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'],
 ['<dialog xyz', 'string', 'more string', 'even more string etc']]

I tried a list comprehension, but it does not work (it returns the same org_list):

divided_list = [s.split(',') for s in ','.join(org_list).split('<')]

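I know this is possible with itertools (I have seen it in some answers), but I am still a beginner and do not understand it well, so I would like to solve this with things I understand, if possible.

8 answers:

Answer 0 (score: 1)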

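First, we can build a list of indexes, recording each position in org_list where the string starts with '<'.

Then we can use a list comprehension to iterate over each pair of consecutive indexes and take the slice between them.

Finally, note that the last slice must run to the end of org_list, so we concatenate a one-element list containing len(org_list) onto the indexes to catch it.

Hopefully you can see how that description translates into the following code.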

inds = [i for i, s in enumerate(org_list) if s.startswith('<')] + [len(org_list)]
div_l = [org_list[inds[i]:inds[i+1]] for i in range(len(inds)-1)]

which gives the desired output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'],
 ['<dialog xyz', 'string', 'more string', 'even more string etc']]
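
As a hedged sketch of the same index-and-slice idea, the two steps can be wrapped in a small helper. The name split_at_marker and its marker parameter are illustrative assumptions, not part of the original answer; zip pairs each start index with the next one, so no explicit range over the indexes is needed.

```python
def split_at_marker(items, marker="<"):
    """Slice items into sublists, each starting at a string that begins with marker.

    Hypothetical helper for illustration; anything before the first marker is dropped.
    """
    starts = [i for i, s in enumerate(items) if s.startswith(marker)]
    # Each slice ends where the next one starts; the last one ends at len(items).
    ends = starts[1:] + [len(items)]
    return [items[a:b] for a, b in zip(starts, ends)]


org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
            '<dialog xyz', 'string', 'more string', 'even more string etc']
div_l = split_at_marker(org_list)
print(div_l)
```

If no string starts with the marker, the helper returns an empty list rather than raising.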

Answer 1 (score: 0)

This should work:

split_lists = []
for s in org_list:
    if s.startswith('<') or len(split_lists) == 0:
        split_lists.append([])
    split_lists[-1].append(s)

Here is the result for your input:

>>> split_lists
[[''], ['<dialog xyz', 'string', 'more string', 'even more string etc'], ['<dialog xyz', 'string', 'more string', 'even more string etc']]

If you want to ignore every string that comes before the first string starting with '<', such as the empty string at the start of org_list, use this instead:

split_lists = []
for s in org_list:
    if s.startswith('<'):
        split_lists.append([])
    if len(split_lists) == 0:
        continue
    split_lists[-1].append(s)
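
The two loops above differ only in how they treat strings that appear before the first '<'. A minimal sketch that folds both behaviours into one function might look like the following; the function name split_on_prefix and the keep_leading flag are assumptions made for illustration and do not appear in the original answer.

```python
def split_on_prefix(items, prefix="<", keep_leading=False):
    """Group items into sublists that each start at a string beginning with prefix.

    keep_leading (hypothetical flag): when True, strings before the first
    prefixed string are collected into their own leading sublist; when False
    they are dropped.
    """
    groups = []
    for s in items:
        if s.startswith(prefix):
            groups.append([s])        # a prefixed string always opens a new group
        elif groups:
            groups[-1].append(s)      # otherwise extend the group in progress
        elif keep_leading:
            groups.append([s])        # first leading string opens a leading group
    return groups


org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
            '<dialog xyz', 'string', 'more string', 'even more string etc']
print(split_on_prefix(org_list))
print(split_on_prefix(org_list, keep_leading=True))
```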

Answer 2 (score: 0)

org_list = ['', '<dialog xyz', 'ztring', 'more ztring', 'even more string etc', '<dialog xyz', 'string', 'more string', 'even more string etc']

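orig = []      # finished sublists
start = False  # becomes True once the first '<dialog xyz' is seen

new = []       # sublist currently being built

for item in org_list:
    if item == '<dialog xyz':
        # A new marker closes the sublist built so far.
        if new:
            orig.append(new)
        new = []
        start = True
    if start:
        new.append(item)

# Flush the last sublist, which no later marker closes.
if new:
    orig.append(new)

print(orig)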

This gives me the output you want.

Answer 3 (score: 0)

Something as simple as this:

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc', '<dialog xyz', 'string', 'more string', 'even more string etc']
split_lists = [] 
for s in org_list:
  if s == '':
    continue
  if s.startswith('<') or len(split_lists) == 0: 
    split_lists.append([s])
    continue
  split_lists[-1].append(s)

print(split_lists)

Output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'], ['<dialog xyz', 'string', 'more string', 'even more string etc']]

Answer 4 (score: 0)

This may help:

 [System.Web.Services.WebMethod]
    [ScriptMethod(ResponseFormat = ResponseFormat.Json)]
    public static void fetchDetails(string JobID)
    {
        var conn = System.Configuration.ConfigurationManager.ConnectionStrings["Connection"];
        SqlConnection con = new SqlConnection(conn.ToString());

        String query = "Select TOP 1 * FROM TAble where Jobid =@JobID";
        DataTable dtBasicInfo = new DataTable();
        SqlCommand a = new SqlCommand(query, con);
        a.Parameters.AddWithValue("@JobID", Int32.Parse(JobID));
        con.Open();
        SqlDataAdapter da = new SqlDataAdapter(a);
        da.Fill(dtBasicInfo);
        SqlDataReader value = a.ExecuteReader();
        con.Close();
        JavaScriptSerializer js = new JavaScriptSerializer();

        JavaScriptSerializer jsSerializer = new JavaScriptSerializer();
        List<Dictionary<string, object>> parentRow = new List<Dictionary<string, object>>();
        Dictionary<string, object> childRow;
        foreach (DataRow row in dtBasicInfo.Rows)
        {
            childRow = new Dictionary<string, object>();
            foreach (DataColumn col in dtBasicInfo.Columns)
            {
                childRow.Add(col.ColumnName, row[col]);
            }
            parentRow.Add(childRow);
        }
        var jsk = jsSerializer.Serialize(parentRow);
    }

Output:

[{"JobId":123456789,"UserId":"asdf3a     ","UserName":"Pekki, Barb                      ","Cas":263,"Question":"Q12345","Language":"ENG","Appl":300}]

Answer 5 (score: 0)

You can use itertools.groupby:

import itertools
import re
org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc', 
    '<dialog xyz', 'string', 'more string', 'even more string etc']
new_list = [list(b) for a, b in itertools.groupby(filter(None, org_list), key=lambda x: bool(re.match(r'<dialog', x)))]
final_list = [new_list[i]+new_list[i+1] for i in range(0, len(new_list), 2)]

Output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'], ['<dialog xyz', 'string', 'more string', 'even more string etc']]
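
The pairing step above relies on groupby producing alternating marker and non-marker runs. A variation, offered only as a sketch, gives every string a running group number with itertools.accumulate and groups on that number, so no pairing is needed. The function name split_by_marker_count is hypothetical and not from the original answer.

```python
from itertools import accumulate, groupby


def split_by_marker_count(items, prefix="<"):
    """Group items by how many prefixed strings have been seen so far.

    Hypothetical helper for illustration. Items before the first prefixed
    string get group number 0 and are dropped.
    """
    # Running count of marker strings: 0, 1, 1, 1, 1, 2, 2, ...
    group_ids = accumulate(int(s.startswith(prefix)) for s in items)
    pairs = zip(group_ids, items)
    return [[s for _, s in grp]
            for gid, grp in groupby(pairs, key=lambda p: p[0])
            if gid > 0]


org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
            '<dialog xyz', 'string', 'more string', 'even more string etc']
final_list = split_by_marker_count(org_list)
print(final_list)
```

Unlike the filter(None, ...) version, this sketch keeps empty strings that occur inside a group.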

Answer 6 (score: 0)

Is this a competition to see who can make this function more complicated and slower? Keep it simple, it is Python.

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
            '<dialog xyz', 'string', '', 'even more string etc',
            '<dialog xyz', 'string', 'more string']

def slicelist(pred, iterable):
    element = []
    alw = False  # becomes True once the first matching string is seen
    for s in iterable:
        sw = s.startswith
        if sw(pred):
            element.append([])
            alw = True
        if alw:
            element[-1].append(s)
    return element

print(slicelist('<', org_list))

If you want to create a generator (iterator) instead, you need to change two things in the example above: return becomes yield, and the print(slicelist('<', org_list)) call has to be adapted to consume it.
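
One possible reading of the generator variant mentioned above, written as a sketch rather than as the answer's own code: each finished sublist is yielded as soon as the next '<' string shows up, so the caller can consume groups lazily. The name iter_slices is an assumption introduced here.

```python
def iter_slices(pred, iterable):
    """Yield sublists that each start at a string beginning with pred.

    Hypothetical generator for illustration; strings before the first
    match are skipped.
    """
    current = None
    for s in iterable:
        if s.startswith(pred):
            if current is not None:
                yield current         # the previous group is complete
            current = [s]
        elif current is not None:
            current.append(s)
    if current is not None:
        yield current                 # emit the final group


org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
            '<dialog xyz', 'string', '', 'even more string etc',
            '<dialog xyz', 'string', 'more string']
for group in iter_slices('<', org_list):
    print(group)
```

Wrapping the call in list(...) gives the same nested list as the non-generator version.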

Answer 7 (score: 0)

You can do something like this:

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
            '<dialog xyz', 'string', 'more string', 'even more string etc']

flag = True      # stays True until the first marker is seen
sub_list = []
final_list = []
text = '<dialog xyz'
for i in org_list:
    if i.startswith(text):
        flag = False
        if sub_list:
            # Close the previous group, restoring its marker at the front.
            sub_list.insert(0, text)
            final_list.append(sub_list)
            sub_list = []
    elif not flag:
        sub_list.append(i)
# Close the final group.
sub_list.insert(0, text)
final_list.append(sub_list)
print(final_list)

Output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'], ['<dialog xyz', 'string', 'more string', 'even more string etc']]