Python: Slicing elements of a list of lists of lists differently depending on the sublist

Date: 2012-07-24 10:40:33

Tags: python diff python-2.7

I've reached the limit of my knowledge on this one. I'm currently parsing diff results. Here's an example of the results I'm trying to manipulate:

[
[[0, 0, '\xe2\x80\x9cWe are returning again statement. He depicted the attacks as part of a battle launched by Sunnis against the country\xe2\x80\x99s Shia leaders.\r\n\r\nThe first attack came about 5 a.m. on Monday when gunmen stormed onto an Iraqi '], 
[-1, 1, 'military base near the town of Duluiyah in S'], 
[0, 2, 'alahuddin Province and killed 15 Iraqi soldiers, according to security officials. Four soldiers, including a high-ranking was taken prisoner by the insurgents, who escaped with him.\r\n\r\nThe insurgents also attacked the home of a police official in Balad, seriously wounding ']], 

[[0, 4, 'eckpoint near Baquba, killing one policeman. In all, attacks were reported in at least five provinces.\r\n\r\nEight attacks were launched in Kirkuk Province, mostly targeting police patrols, with five people killed and 42 wounded.\r\n\r\nThe offensive started on the third day of the Islamic holy month of Ramadan, and '],
[-1, 5, 'apparently took advantage of the wi'], 
[1, 6, 'll and the other.']]
]

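I'm building a diff summary. Here's how the structure breaks down:

The outer list is a list of diff results (there are two in the example above).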

Each sublist contains three elements:

  • the text before the change;
  • the text that makes up the change; and
  • the text after the change.

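Each sub-sublist also has three elements:

  • a number indicating whether that part was deleted, left unchanged, or added (-1, 0, 1 respectively);
  • a position number (its order); and
  • the string itself.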
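To make the nesting concrete, here is a minimal sketch that walks a results list and labels every part. It assumes the -1/0/1 codes mean deleted/unchanged/added, which is what the sample data suggests; the names `ROLES`, `OP_NAMES`, `describe` and `sample` are hypothetical and only for illustration.

```python
# Hypothetical labels for the three positions inside each diff result.
ROLES = ("before", "change", "after")

# Assumed meaning of the operation codes: -1 deleted, 0 unchanged, 1 added.
OP_NAMES = {-1: "deleted", 0: "unchanged", 1: "added"}


def describe(results):
    """Yield (result_index, role, op_name, position, text) for every part."""
    for i, result in enumerate(results):
        # Each part is a [number, position, string] list; unpack it directly.
        for role, (op, position, text) in zip(ROLES, result):
            yield i, role, OP_NAMES[op], position, text


# Made-up sample with the same shape as the question's data.
sample = [
    [[0, 0, "this is a "], [-1, 1, "sentence"], [0, 2, " to demonstrate."]],
]

for row in describe(sample):
    print(row)
```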

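What I need to do is slice the strings in the sub-sublists, but how I slice them depends on which element of the sublist they belong to:

  • for element 1 of the sublist, I need to slice away everything except the last 4 characters;
  • for element 2 of the sublist, I need no slicing;
  • for element 3 of the sublist, I need to slice away everything except the first 4 characters.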
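The three rules map directly onto Python slice notation: text[-4:] keeps the last four characters, the unsliced string keeps everything, and text[:4] keeps the first four. A small sketch of that mapping; `trim` is a hypothetical helper name, and it uses 0-based positions where the question counts from 1. Slicing never raises on short strings, so a part shorter than four characters simply comes back whole.

```python
def trim(text, role_index):
    """Trim one string by its 0-based position in the sublist (0, 1 or 2)."""
    if role_index == 0:      # element 1, before the change: keep the last 4 chars
        return text[-4:]
    if role_index == 2:      # element 3, after the change: keep the first 4 chars
        return text[:4]
    return text              # element 2, the change itself: no slicing


print(trim("this is a", 0))        # is a
print(trim("sentence", 1))         # sentence
print(trim("to demonstrate.", 2))  # to d
print(trim("ab", 0))               # ab (shorter than 4, returned whole)
```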

Here's an example of why I need the slicing. Simplified text before the solution:

[[[...]], [[this is a],[sentence],[to demonstrate.]], [[...]]]

Text after the solution:

[[[...]], [[is a],[sentence],[to d]], [[...]]]
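The simplified before/after example can be reproduced in a few lines. This sketch models only the middle sublist, as a list of one-string lists (leaving out the number and position fields, as the simplified text does); the name `simplified` is illustrative.

```python
# Middle sublist from the simplified example: before, change, after.
simplified = [["this is a"], ["sentence"], ["to demonstrate."]]

# Unpack the three one-element lists, then apply the per-position rule:
# last 4 chars of "before", all of "change", first 4 chars of "after".
(before,), (change,), (after,) = simplified
trimmed = [[before[-4:]], [change], [after[:4]]]

print(trimmed)  # [['is a'], ['sentence'], ['to d']]
```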

Another difficulty is that I want to preserve the structure of the list.
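If preserving the structure should also mean keeping the very same list objects (for example because other code holds references to them), the strings can be replaced in place instead of building a new list. A minimal sketch, assuming every sub-sublist is a mutable [number, position, text] list; `trim_in_place` is a hypothetical name.

```python
def trim_in_place(results):
    """Slice the text of each part in place; the nesting is left untouched."""
    for before, _, after in results:
        before[2] = before[2][-4:]  # keep the last 4 characters
        after[2] = after[2][:4]     # keep the first 4 characters
        # the middle (change) part is left as-is


results = [
    [[0, 0, "this is a"], [-1, 1, "sentence"], [0, 2, "to demonstrate."]],
]
first_part = results[0][0]
trim_in_place(results)

print(results)                      # [[[0, 0, 'is a'], [-1, 1, 'sentence'], [0, 2, 'to d']]]
print(first_part is results[0][0])  # True: same list object, new string inside
```

This mutates the input, so it only fits if the original, untrimmed text is no longer needed.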

It's been a long day. Apologies for the mind-bending nature of this question, but that's what Stack Overflow is for...

Thoughts?

1 answer:

Answer 0 (score: 2)

You can do this with a single list comprehension that unpacks each result in its for clause:

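# Keep the last 4 characters of the "before" text, the whole change,
# and the first 4 characters of the "after" text
[[[b_n, b_p, b_s[-4:]], change, [a_n, a_p, a_s[:4]]]
 for (b_n, b_p, b_s), change, (a_n, a_p, a_s) in results]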
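A usage sketch for the unpacking comprehension, run on a shortened two-result input shaped like the question's data (the strings are made up for illustration). Nested tuple targets in a for clause are accepted by both Python 2.7 and Python 3.

```python
# Hypothetical input: two diff results, each [before, change, after].
results = [
    [[0, 0, "text before the first change "], [-1, 1, "deleted"], [0, 2, " text after it"]],
    [[0, 4, "more context "], [-1, 5, "removed"], [1, 6, " and added"]],
]

summary = [[[b_n, b_p, b_s[-4:]], change, [a_n, a_p, a_s[:4]]]
           for (b_n, b_p, b_s), change, (a_n, a_p, a_s) in results]

print(summary[0])  # [[0, 0, 'nge '], [-1, 1, 'deleted'], [0, 2, ' tex']]
```

Note that the middle change list is placed in the output as-is, so it is the same object as in results; use list(change) instead if the summary must be independent of the input.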

An alternative is to zip each result with a list of slice objects and apply them:

# slice(-4, None) is [-4:], slice(None) is [:], slice(4) is [:4]
[[[num, position, text[op]]
  for (num, position, text), op in zip(chunk, [slice(-4, None), slice(None), slice(4)])]
 for chunk in results]
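A quick check, on hypothetical sample data, that the zip/slice version and the unpacking version produce equal output. One behavioural difference: the unpacking version raises ValueError if a result does not have exactly three parts, whereas zip silently stops at the shorter input, so a malformed result would be truncated rather than reported.

```python
results = [
    [[0, 0, "this is a"], [-1, 1, "sentence"], [0, 2, "to demonstrate."]],
    [[0, 4, "more context "], [-1, 5, "removed"], [1, 6, " and added"]],
]

# One slice object per position: last 4 chars, everything, first 4 chars.
SLICES = [slice(-4, None), slice(None), slice(4)]

with_zip = [[[num, position, text[op]]
             for (num, position, text), op in zip(chunk, SLICES)]
            for chunk in results]

with_unpacking = [[[b_n, b_p, b_s[-4:]], change, [a_n, a_p, a_s[:4]]]
                  for (b_n, b_p, b_s), change, (a_n, a_p, a_s) in results]

print(with_zip == with_unpacking)  # True
print(with_zip[0])                 # [[0, 0, 'is a'], [-1, 1, 'sentence'], [0, 2, 'to d']]
```

The zip form generalizes more easily: changing the rules only means editing SLICES, not the comprehension.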