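For search, I use a Tweepy stream filter to track multiple search terms. The stream code ran for 5 hours during working hours today and produced an 80 MB results file. I loaded the 80 MB file, created an R data frame, and then ran grepl("yield") to search for a single word in that data frame (the stream file). I ran grepl() for the search term "yield" against every column, but the result is a data frame with 0 columns and 193510 rows. I also tried dplyr's select(contains()). Both approaches find zero results from the Tweepy filter.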
agfarm[,grepl("yield", colnames(agfarm$Value3))] #I tried all columns
agfarm %>% select(contains('yield'))
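The Tweepy filter results file does not seem to find and deliver even a single search term. Are search terms such as "nutrient yield", "crops", or "food yield" invalid? Or is the Tweepy filter unable to find such terms? Does Tweepy only work with tagged terms such as @ and #?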
my_stream_listener = PrintingStreamListener()
my_stream = tweepy.Stream(auth = api.auth, listener=my_stream_listener)
searchTermsFilter = '"soil yield" OR "nutrient yield" OR "managing crops" OR "food yield" OR "nutrient uptake" OR' \
'"high yielding crop" OR "fertilizer" OR "soil health" OR "crop yield" OR "acre yield" OR' \
'"nutrient management" OR "imbalance soil" OR "increase yield" OR "micronutrient" OR' \
'"sustainability" OR "corn yield" OR "farmers management practices"'
my_stream.filter(track=searchTermsFilter)
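For comparison, here is a minimal sketch of the same search terms written as a Python list with one entry per phrase, plus a small local check that can be run against saved tweet text. It rests on two assumptions that should be verified against the documentation: that Tweepy 3.x `Stream.filter` expects `track` to be a list of phrases, and that the streaming API treats the list entries as alternatives (OR) and the words inside one phrase as all required (AND, any order, case-insensitive), rather than interpreting a literal `OR` or quotation marks. The names `SEARCH_PHRASES`, `phrase_matches`, and `matching_phrases` are hypothetical helpers, not part of Tweepy.

```python
import re

# The phrases from the snippet above, one list entry per phrase
# (no literal "OR" and no embedded quotation marks).
SEARCH_PHRASES = [
    "soil yield", "nutrient yield", "managing crops", "food yield",
    "nutrient uptake", "high yielding crop", "fertilizer", "soil health",
    "crop yield", "acre yield", "nutrient management", "imbalance soil",
    "increase yield", "micronutrient", "sustainability", "corn yield",
    "farmers management practices",
]


def phrase_matches(phrase, text):
    """Local approximation of the assumed track rule: every word of the
    phrase must occur in the text, in any order, ignoring case."""
    words_in_text = set(re.findall(r"\w+", text.lower()))
    return all(word in words_in_text for word in phrase.lower().split())


def matching_phrases(text, phrases=SEARCH_PHRASES):
    """Return the phrases (in list order) whose words all occur in text."""
    return [phrase for phrase in phrases if phrase_matches(phrase, text)]


if __name__ == "__main__":
    sample = "Record corn YIELD per acre this year"
    print(matching_phrases(sample))

# Assumed usage with the stream object from the snippet above:
# my_stream.filter(track=SEARCH_PHRASES)
```

Running `matching_phrases` over the text field of a few saved tweets is one way to check whether the stream delivered anything related to the intended phrases, independently of the R column search.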