I am trying to append a word to an existing string using Python. To do this, I wrote the code below.
import subprocess
from subprocess import Popen, PIPE
cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/cloudera/rank_t/*"], stdout=subprocess.PIPE)
dumpoff = Popen(["hadoop", "fs", "-put", "-", "/user/cloudera/DATA"], stdin=PIPE)
obrInd = "0"
line1 = ""
for line in cat.stdout:
    runnno = line.split('|')[0]
    code = line.split('|')[1]
    idval = line.split('|')[2]
    if (code == "OBR"):
        obrInd = runnno
    line = line + "|" + "OBR_" + obrInd
    dumpoff.stdin.write(line)
    print(line)
My sample data:
1|ORC||4002C3|4002C3||||||20141231|||1962
2|OBR|1||4002C3|197 HP, RX 16/L|||20141|20141||||||||196248||RJ||3711028|||||F
3|OBX|1|ST|2263||NEGATIVE FOR INTRAEPITHELIAL L.||||||F|||20141231|RJ @#L
4|NTE|1|L|NEGATIVE FOR INTRAEPITHELIAL LESION AND .
5|OBX|2|ST|1158||NIL||||||F|||20141231|RJ@#L
Expected output:
1|ORC||4002C3|4002C3||||||20141231|||1962|
2|OBR|1||4002C3|197 HP, RX 16/L|||20141|20141||||||||196248||RJ||3711028|||||F|OBR_1
3|OBX|1|ST|2263||NEGATIVE FOR INTRAEPITHELIAL L.||||||F|||20141231|RJ @#L|OBR_1
4|NTE|1|L|NEGATIVE FOR INTRAEPITHELIAL LESION AND .|OBR_1
5|OBX|2|ST|1158||NIL||||||F|||20141231|RJ@#L|OBR_1
Actual output:
1|ORC||4002C3|4002C3||||||20141231|||1962|
2|OBR|1||4002C3|197 HP, RX 16/L|||20141|20141||||||||196248||RJ||3711028|||||F
|OBR_1
3|OBX|1|ST|2263||NEGATIVE FOR INTRAEPITHELIAL L.||||||F|||20141231|RJ @#L
|OBR_1
4|NTE|1|L|NEGATIVE FOR INTRAEPITHELIAL LESION AND .
|OBR_1
5|OBX|2|ST|1158||NIL||||||F|||20141231|RJ@#L
|OBR_1
The word I want to add ends up on a new line, but I want it appended to the same line. What am I doing wrong?
Answer 0 (score: 4)
This is because each line ends with a \n. You can remove it with .strip():
line = line.strip() + "|"+"OBR_"+obrInd
or
line = line.strip('\n') + "|"+"OBR_"+obrInd
if you want to keep any other whitespace at the start or end of the line.
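As a rough sketch of how this fits into the loop from the question: the helper below (`tag_lines` is a hypothetical name, not part of the original code) strips only the trailing newline, appends the suffix, and then adds the newline back so that each record still ends its own line when written out. It assumes the input lines are text strings and uses shortened sample records rather than the full data from the question.

```python
def tag_lines(lines):
    """Yield each pipe-delimited line with an OBR_<n> suffix on the same line."""
    obr_ind = "0"  # same initial value as in the question
    for line in lines:
        body = line.rstrip("\n")  # drop only the trailing newline
        fields = body.split("|")
        if len(fields) > 1 and fields[1] == "OBR":
            obr_ind = fields[0]  # first field, as in the question's code
        # Re-add the newline so the output stays one record per line.
        yield body + "|OBR_" + obr_ind + "\n"


# Shortened, illustrative records (not the full sample data).
sample = [
    "1|ORC||4002C3\n",
    "2|OBR|1||4002C3\n",
    "3|OBX|1|ST|2263| \n",
]
result = list(tag_lines(sample))
for tagged in result:
    print(tagged, end="")
```

Using `rstrip("\n")` rather than `strip()` leaves other whitespace, such as the trailing space in the third record, untouched. In the original script, each yielded string could be passed to `dumpoff.stdin.write(...)` in place of `line`.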