Question

I could use some advice on how to write a for loop / if statement so that I can remove some unnecessary text from a file.

I have a txt file that is 153 MB. I know how to open it in Python, but I'm still not very good at removing the stuff I don't need (the unwanted text).

Here is an example of what the txt file looks like:

@xirwinshemmo thanks for the follow :)
hii... if u want to make a new friend just add me on facebook! :) xx      https:\/\/t.co\/RCYFVrmdDG
Just wanna say if you ever feel lonely or sad or bored, just come and talk to me. I'm   free anytime :)
I hope she not a spy for someone. I hope she real on neautral side. Because just her who   i trust. :-)
@dessdim @Bureemi not always but sometimes maybe :)
\u201c@EmilyKathryn_17: Funny how you get what you want and pray for when you want the    same thing God wants.  :) #newheart #newdesires\u201d
@PhilKomarny Thank you :) can you follow me on Twitter so I can DM you?
RT @emrekavcoglu: @Usher dj got us a fallin in love and yeah earth number one m\u00fcsic    listen thank you king :-)
@

What I want is to get rid of every @ + name, like the one in the first line:

@xirwinshemmo

so that only the text "thanks for the follow :)" is left.

There are also some links that I have no use for:

https:\/\/t.co\/RCYFVrmdDG

I'd like to remove those as well.

Hope someone can help.

Answer 1

First, I'm assuming you read the file line by line. So we can start by splitting each line into individual words (strings):

for line in infile:
    words = line.split()  # splits the line on whitespace into a list of words
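A self-contained sketch of this first step, assuming a file-like object that yields one tweet per line. Here `io.StringIO` stands in for the real file so the example runs on its own; `words_per_line` is a hypothetical helper name. Iterating over a file object reads one line at a time, so the 153 MB file is never loaded into memory all at once.

```python
import io

def words_per_line(infile):
    """Yield the list of whitespace-separated words for each line of infile."""
    for line in infile:
        # str.split() with no argument splits on any run of whitespace
        # and drops the trailing newline.
        yield line.split()

# io.StringIO stands in for the real file; with the actual data you would use
# open("tweets.txt", encoding="utf-8") instead (hypothetical filename).
sample = io.StringIO("@xirwinshemmo thanks for the follow :)\n"
                     "@dessdim @Bureemi not always but sometimes maybe :)\n")
result = list(words_per_line(sample))
print(result)
```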

Then I would loop over those words (still inside the for loop above):

for i, word in enumerate(words):
    if word.startswith('@'):
        print(words[i+1:])  # the words after this @username

This code prints only the words that come after a username (@abc).
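Slicing after a mention prints once per mention, so a line such as `@dessdim @Bureemi not always ...` is printed twice, and the first slice still contains `@Bureemi`. A minimal alternative sketch that drops every word starting with '@', wherever it appears in the line (`drop_mentions` is a hypothetical helper name):

```python
def drop_mentions(words):
    """Return the words that are not @mentions, i.e. do not start with '@'."""
    return [w for w in words if not w.startswith("@")]

line = "@dessdim @Bureemi not always but sometimes maybe :)"
print(" ".join(drop_mentions(line.split())))
# not always but sometimes maybe :)
```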

To drop the http links, you can use this if statement:

if not words[i].startswith('http'):
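Putting both conditions together, here is a sketch that cleans the whole file line by line and writes the result to a second file. `clean_line` and `clean_file` are hypothetical helper names, and `io.StringIO` again stands in for the real input and output files. `str.startswith` accepts a tuple of prefixes, so one call covers both '@' and 'http'; the escaped link `https:\/\/t.co\/...` is still caught because it begins with "https".

```python
import io

def clean_line(line):
    """Return the line without @mentions and http(s) links, rejoined with single spaces."""
    kept = [w for w in line.split() if not w.startswith(("@", "http"))]
    return " ".join(kept)

def clean_file(infile, outfile):
    """Stream infile one line at a time and write each cleaned line to outfile."""
    for line in infile:
        outfile.write(clean_line(line) + "\n")

# With real data you would pass open("tweets.txt", encoding="utf-8") and
# open("clean.txt", "w", encoding="utf-8") (hypothetical filenames).
lines = [
    "@xirwinshemmo thanks for the follow :)",
    r"hii... if u want to make a new friend just add me on facebook! :) xx      https:\/\/t.co\/RCYFVrmdDG",
]
src = io.StringIO("\n".join(lines) + "\n")
dst = io.StringIO()
clean_file(src, dst)
print(dst.getvalue())
```

Note that this keeps words such as "RT" and only looks at the start of each word, so a mention glued to other characters (for example `\u201c@EmilyKathryn_17:` in the sample) is not removed; cases like that would need a regular expression instead.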

Python sad/happy face machine learning (getting rid of text)
