Question

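I'm a novice Python programmer. I'm trying to use tweepy to extract the text of a series of tweets and save it to a text file (authentication and other details omitted), and I've run into trouble.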

search = api.search("hello", count=10)

textlist=[]

for i in range(0,len(search)):
    textlist.append( search[i].text.replace('\n', '' ) )

# Python 3: open the file as UTF-8 text and write one tweet per line
f = open('temp.txt', 'w', encoding='utf-8')
for i in range(0,len(textlist)):
    f.write(textlist[i] + '\n')
f.close()

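But in some long tweets the text is cut off at the end and each string ends with an ellipsis character ("..."), so I sometimes lose links or hashtags. How can I avoid this?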

Answer 1

With tweepy, you can get the full text by passing tweet_mode='extended' (not documented in the Tweepy documentation). For example:

(Not extended)


@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue d ... https://t.co/kALZ2ki9Vc

(Extended)

tweet_mode='extended'

@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue de match de foot et cela ferait un beau cadeau pour mon copain !!
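Applied to the code in the question, this could look like the sketch below. It assumes Python 3 and a Tweepy version whose search method is still called api.search, as in the question; in extended mode the text is read from full_text rather than text. The helper names fetch_full_texts and save_lines are made up for illustration, and newlines are replaced with spaces rather than removed.

```python
def fetch_full_texts(api, query, count=10):
    """Return the text of each search result, with newlines replaced by spaces."""
    # tweet_mode='extended' asks for untruncated tweets; the text then
    # arrives in `full_text` instead of `text`.
    results = api.search(query, count=count, tweet_mode='extended')
    return [status.full_text.replace('\n', ' ') for status in results]


def save_lines(lines, path):
    """Write one string per line to a UTF-8 text file."""
    with open(path, 'w', encoding='utf-8') as f:
        for line in lines:
            f.write(line + '\n')


# Usage, where `api` is the authenticated tweepy.API object from the question:
# save_lines(fetch_full_texts(api, 'hello'), 'temp.txt')
```

Retweets can still come back shortened in full_text, so this alone may not cover every case.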

Answer 2

The ... (ellipsis) is added when the tweet is part of a retweet (and is therefore truncated). This is mentioned in the documentation:

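Indicates whether the value of the text parameter was truncated, for example, as a result of a retweet exceeding the 140 character Tweet length. Truncated text will end in ellipsis, like this ...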

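There is no way to avoid this, unless you take each individual tweet, search for any retweets of it, and build the complete timeline. Obviously this is not practical for a simple search, but you could do it if you were fetching a particular handle's timeline.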

You can also simplify your code:

results = api.search('hello', count=10)

# Python 3: write each tweet's text attribute, one tweet per line
with open('temp.txt', 'w', encoding='utf-8') as f:
    for tweet in results:
        f.write('{}\n'.format(tweet.text))

Answer 3

This is the default behavior for retweets. You can access the full text under the retweeted_status object.

The Twitter API entities section on retweets:

https://dev.twitter.com/overview/api/entities-in-twitter-objects#retweets

Twitter API documentation (look for "truncated"):

https://dev.twitter.com/overview/api/tweets
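As a rough sketch of that idea, the hypothetical helper below (full_text_of is not part of tweepy) prefers the original tweet under retweeted_status when it exists, and falls back from full_text to text so it works whether or not extended mode was requested. The SimpleNamespace objects are stand-ins for tweepy Status objects, used only so the snippet runs without credentials.

```python
from types import SimpleNamespace


def full_text_of(status):
    """Return the most complete text available for a status-like object."""
    # For a retweet, the untruncated text lives on the original tweet.
    source = getattr(status, 'retweeted_status', status)
    # `full_text` exists only in extended mode; otherwise fall back to `text`.
    return getattr(source, 'full_text', None) or getattr(source, 'text', '')


# Stand-in objects mimicking the attributes discussed above
original = SimpleNamespace(full_text='a long tweet quoted in full')
retweet = SimpleNamespace(full_text='RT @someone: a long tw…',
                          retweeted_status=original)
assert full_text_of(retweet) == 'a long tweet quoted in full'
```

Note that the text taken from retweeted_status does not carry the "RT @user:" prefix, so add it back yourself if you need to tell retweets apart.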

Saving the full text of tweets with tweepy
