Splitting a large text file using a keyword delimiter

Time: 2015-03-10 04:58:54

Tags: python r text split

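I'm trying to split a large text file into smaller text files, using a word as the delimiter. I've searched, but I only found posts about splitting a file after x lines. I'm very new to programming, but I've made a start. I want to go through all the lines, and when a line starts with hello, put that line and all the following lines into one file until the next hello is reached. hello is the first word in the file. Ultimately I'm trying to get the text into R, but I think it will be easier if I split it up first. Any help is appreciated. Thanks.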

text_file = open("myfile.txt", "r")
lines = text_file.readlines()
text_file.close()
print(len(lines))
for line in lines:
    print(line)
    if line[0:5] == "hello":
        pass  # stuck here: this line should start a new output file

2 Answers:

Answer 0: (score: 0)

If you are looking for very simple logic, try this.

# Copy the first "hello" block of myfile.txt into filename.txt.
with open("myfile.txt", "r") as text_file:
    lines = text_file.readlines()
print(len(lines))

hello_found = False
# "a" appends to filename.txt; use "w" to overwrite it instead
with open("filename.txt", "a") as target:
    for line in lines:
        if line.startswith("hello"):
            if hello_found:
                # Second "hello": the first block is complete, so stop copying.
                # (break is crude, but it is enough for this simple requirement)
                break
            hello_found = True  # first "hello": start copying from this line
        if hello_found:
            print(line)
            target.write(line)  # write the line to the new file
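
The code above stops at the second hello, so only the first block is saved. If the goal is one output file per block, as the question describes, a minimal sketch might look like the following. The function name split_on_keyword, the out_dir argument, and the part_001.txt naming scheme are assumptions made for illustration, not anything from the original post. It is written for Python 3 and reads the input one line at a time so that a large file does not have to fit in memory.

```python
import os


def split_on_keyword(src_path, out_dir, keyword="hello"):
    """Write each block that starts with `keyword` to its own file.

    Returns the list of file paths written, in order.
    (Hypothetical helper; names and file layout are assumptions.)
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(src_path, "r") as src:
            for line in src:  # stream line by line; the file may be large
                if line.startswith(keyword):
                    # A new block begins: close the previous part, open the next.
                    if out is not None:
                        out.close()
                    name = "part_%03d.txt" % (len(written) + 1)
                    path = os.path.join(out_dir, name)
                    out = open(path, "w")
                    written.append(path)
                if out is not None:  # lines before the first keyword are skipped
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling split_on_keyword("myfile.txt", "parts") would then create parts/part_001.txt, parts/part_002.txt, and so on, one per hello block. Each part can be read into R separately afterwards.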

Answer 1: (score: 0)

I'm new to Python. I just finished a Python for GIS course at Northeastern University. This is what I came up with.

let multiLineString         : Parser<string,unit> =
    optional newline >>. manyCharsTill multiLineStringContents (lookAhead (pstring "\"\"\""))
    |> between (pstring "\"\"\"") (pstring "\"\"\"")