Python script for value extraction, splitting data and reformatting

Time: 2016-01-23 16:16:48

Tags: python

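This question is mostly about logic and, to some extent, about syntax.

I'm writing a short Python script to extract just a few "tidbits" of information from several hundred records. I'm very close, but the code needs a modification that I can't seem to work out.

I have data in the following form: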

368 1   "Overall evaluation: 1
Invite to interview: 1
Strength or novelty of the idea (1): 2
Strength or novelty of the idea (2): 3
Strength or novelty of the idea (3): 2
Use or provision of open data (1): 2
Use or provision of open data (2): 2
""Open by default"" (1): 3
""Open by default"" (2): 2
Value proposition and potential scale (1): 2
Value proposition and potential scale (2): 2
Market opportunity and timing (1): 2
Market opportunity and timing (2): 1
Triple bottom line impact (1): 2
Triple bottom line impact (2): 2
Triple bottom line impact (3): 2
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 3
Capacity to realise the idea (1): 2
Capacity to realise the idea (2): 1
Capacity to realise the idea (3): 1
Appropriateness of the budget to realise the idea: 1"
368 2   "Overall evaluation: 2
Invite to interview: 3
Strength or novelty of the idea (1): 3
Strength or novelty of the idea (2): 4
Strength or novelty of the idea (3): 4
Use or provision of open data (1): 4
Use or provision of open data (2): 2
""Open by default"" (1): 3
""Open by default"" (2): 3
Value proposition and potential scale (1): 2
Value proposition and potential scale (2): 3
Market opportunity and timing (1): 3
Market opportunity and timing (2): 3
Triple bottom line impact (1): 3
Triple bottom line impact (2): 2
Triple bottom line impact (3): 1
Knowledge and skills of the team (1): 2
Knowledge and skills of the team (2): 2
Capacity to realise the idea (1): 3
Capacity to realise the idea (2): 2
Capacity to realise the idea (3): 2
Appropriateness of the budget to realise the idea: 3"

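I need to grab these values, but also tie them to the number at the start of the record. For example, for the first record I need the result to look like this: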

368

=2+3+3+3+4+3+2+3+2+3+2+3+2+3+2+3+2+4+3+2+3+2

=2+3+3+3+4+3+2+3+2+3+2+3+2+3+2+3+2+4+3+2+3+2

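And so on, for the remaining instances.

So I need to pull out the instance identifier (368 in this case) together with the values associated with that record for each of the two reviews.

I know how to extract the values of a review, for example: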

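array = []
with open('data.txt', 'r') as f:
    for line in f:
        # take the number after the colon, dropping any trailing quote
        number = int(line.split(':')[1].strip().strip('"'))
        array.append(number)
# join() needs strings, so convert the integers back first
print('+'.join(str(n) for n in array))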

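But I can't figure out how to output this together with the record identifier, as I tried to show with the example above.

Edit

The data looks like this: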

299 1   "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 4
Strength or novelty of the idea (2): 3
Strength or novelty of the idea (3): 3
Use or provision of open data (1): 4
Use or provision of open data (2): 3
""Open by default"" (1): 2
""Open by default"" (2): 3
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 2
Market opportunity and timing (1): 4
Market opportunity and timing (2): 4
Triple bottom line impact (1): 4
Triple bottom line impact (2): 2
Triple bottom line impact (3): 2
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 4
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 3
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 3"
299 2   "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 3
Strength or novelty of the idea (2): 2
Strength or novelty of the idea (3): 4
Use or provision of open data (1): 4
Use or provision of open data (2): 3
""Open by default"" (1): 3
""Open by default"" (2): 2
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 3
Market opportunity and timing (1): 4
Market opportunity and timing (2): 3
Triple bottom line impact (1): 3
Triple bottom line impact (2): 2
Triple bottom line impact (3): 1
Knowledge and skills of the team (1): 4
Knowledge and skills of the team (2): 4
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 4
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 2"

364 1   "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 4
Strength or novelty of the idea (2): 1
Strength or novelty of the idea (3): 3
Use or provision of open data (1): 3
Use or provision of open data (2): 3
""Open by default"" (1): 3
""Open by default"" (2): 3
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 4
Market opportunity and timing (1): 4
Market opportunity and timing (2): 4
Triple bottom line impact (1): 4
Triple bottom line impact (2): 4
Triple bottom line impact (3): 3
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 3
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 3
Capacity to realise the idea (3): 3
Appropriateness of the budget to realise the idea: 3"
364 2   "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 4
Strength or novelty of the idea (2): 3
Strength or novelty of the idea (3): 3
Use or provision of open data (1): 4
Use or provision of open data (2): 4
""Open by default"" (1): 4
""Open by default"" (2): 3
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 3
Market opportunity and timing (1): 2
Market opportunity and timing (2): 3
Triple bottom line impact (1): 4
Triple bottom line impact (2): 4
Triple bottom line impact (3): 1
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 3
Capacity to realise the idea (1): 2
Capacity to realise the idea (2): 4
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 2"

1 answer:

Answer 0 (score: 1)

This is what I would do. It is not very polished, but it gets the job done.

Also, 1.txt contains the same text as your data.

#!/usr/bin/python

# Read the whole file as a list of lines without trailing newlines.
f=open("1.txt",'r').read().splitlines()
head='0'
body=[]
for x in f:
    # Skip blank lines between records.
    if x.strip()=='':
        continue
    try:
        # A line starting with a digit opens a new review block.
        int(x[0])
        print(head +':'+'+'.join(body))
        tmp=x.split()
        head=tmp[0]+'-'+tmp[1]
        body=[tmp[4]]
    except ValueError as e:
        # Any other line is "label: value"; keep the value, minus any closing quote.
        body.append(x.split(':')[1].strip().strip('\"'))
print(head +':'+'+'.join(body))

The output will be:

0:
299-1:3+3+4+3+3+4+3+2+3+4+2+4+4+4+2+2+3+4+4+3+4+3
299-2:3+3+3+2+4+4+3+3+2+4+3+4+3+3+2+1+4+4+4+4+4+2
364-1:3+3+4+1+3+3+3+3+3+4+4+4+4+4+4+3+3+3+4+3+3+3
364-2:3+3+4+3+3+4+4+4+3+4+3+2+3+4+4+1+3+3+2+4+4+2

Now, you can skip the first print by adding a check on the length of the array, so that you don't print the 0: line.
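The length check mentioned above could look roughly like the sketch below. It is only one way to do it, not the answer's exact code: the function names summarise and group_by_record are made up for illustration, it tests line[0].isdigit() instead of the try/except around int(), and it assumes every non-blank line ends in ": <score>" as in the sample data. The second function regroups the rows by record identifier, which is closer to the layout the question asks for.

```python
def summarise(lines):
    """Return one 'record-review:v1+v2+...' string per review block."""
    results = []
    head, body = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line[0].isdigit():
            # A new review starts: flush the previous one only if it has
            # values, so no empty first row is ever emitted.
            if body:
                results.append(head + ':' + '+'.join(body))
            parts = line.split()
            head = parts[0] + '-' + parts[1]
            body = []
        # Assumption: every data line, header included, ends in ': <score>'.
        body.append(line.rsplit(':', 1)[1].strip().strip('"'))
    if body:
        results.append(head + ':' + '+'.join(body))
    return results


def group_by_record(rows):
    """Map each record id to its list of '=v1+v2+...' strings."""
    grouped = {}
    for row in rows:
        head, values = row.split(':', 1)
        record_id = head.split('-')[0]
        grouped.setdefault(record_id, []).append('=' + values)
    return grouped


# Shortened sample in the same shape as the data in the question.
sample = [
    '299 1   "Overall evaluation: 3',
    'Invite to interview: 3',
    '""Open by default"" (1): 2"',
    '',
    '299 2   "Overall evaluation: 3',
    'Appropriateness of the budget to realise the idea: 2"',
]

rows = summarise(sample)
for record_id, sums in group_by_record(rows).items():
    print(record_id)
    for s in sums:
        print(s)
```

To run it on the real file, pass open("1.txt").read().splitlines() to summarise instead of the sample list.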