I used Twitter4j to crawl several million tweets, but to my surprise all of the tweets were stored in a raw JSON format. I have had a lot of problems parsing these JSON strings. The first problem was that some tweets span more than one line; I solved that by concatenating the lines that belong to the same tweet. Here is an example of a formatted line:
StatusJSONImpl{
createdAt=TueNov0119: 00: 04CET2016,
id=793512948027326464,
text='RT @DylanYamaha_: Et profitez vraiment des personnes qui sont près de vous, parce que sa arrive très très vite un malheur..',
source='<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
isTruncated=false,
inReplyToStatusId=-1,
inReplyToUserId=-1,
isFavorited=false,
isRetweeted=false,
favoriteCount=0,
inReplyToScreenName='null',
geoLocation=null,
place=null,
retweetCount=0,
isPossiblySensitive=false,
lang='fr',
contributorsIDs=[
],
retweetedStatus=StatusJSONImpl{
createdAt=TueNov0118: 38: 05CET2016,
id=793507418244313088,
text='Et profitez vraiment des personnes qui sont près de vous, parce que sa arrive très très vite un malheur..',
source='<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
isTruncated=false,
inReplyToStatusId=-1,
inReplyToUserId=-1,
isFavorited=false,
isRetweeted=false,
favoriteCount=56,
inReplyToScreenName='null',
geoLocation=null,
place=null,
retweetCount=175,
isPossiblySensitive=false,
lang='fr',
contributorsIDs=[
],
retweetedStatus=null,
userMentionEntities=[
],
urlEntities=[
],
hashtagEntities=[
],
mediaEntities=[
],
symbolEntities=[
],
currentUserRetweetId=-1,
user=UserJSONImpl{
id=2242998313,
name='_',
screenName='DylanYamaha_',
location='Marseille, France',
description='null',
isContributorsEnabled=false,
profileImageUrl='http://pbs.twimg.com/profile_images/793110090727424002/9bLOivem_normal.jpg',
profileImageUrlHttps='https://pbs.twimg.com/profile_images/793110090727424002/9bLOivem_normal.jpg',
isDefaultProfileImage=false,
url='null',
isProtected=false,
followersCount=12357,
status=null,
profileBackgroundColor='ABB8C2',
profileTextColor='333333',
profileLinkColor='89C9FA',
profileSidebarFillColor='DDEEF6',
profileSidebarBorderColor='FFFFFF',
profileUseBackgroundImage=false,
isDefaultProfile=false,
showAllInlineMedia=false,
friendsCount=87,
createdAt=ThuDec1223: 15: 28CET2013,
favouritesCount=6007,
utcOffset=3600,
timeZone='Amsterdam',
profileBackgroundImageUrl='http://abs.twimg.com/images/themes/theme1/bg.png',
profileBackgroundImageUrlHttps='https://abs.twimg.com/images/themes/theme1/bg.png',
profileBackgroundTiled=false,
lang='fr',
statusesCount=2049,
isGeoEnabled=false,
isVerified=false,
translator=false,
listedCount=43,
isFollowRequestSent=false,
withheldInCountries=null
},
withHeldInCountries=null,
quotedStatusId=-1,
quotedStatus=null
},
userMentionEntities=[
UserMentionEntityJSONImpl{
name='_',
screenName='DylanYamaha_',
id=2242998313
}
],
urlEntities=[
],
hashtagEntities=[
],
mediaEntities=[
],
symbolEntities=[
],
currentUserRetweetId=-1,
user=UserJSONImpl{
id=393519159,
name='Tiphaine.',
screenName='LehmannTiphaine',
location='France',
description='Snapchat : tiphainelehmann | IG : tiphainelmn',
isContributorsEnabled=false,
profileImageUrl='http://pbs.twimg.com/profile_images/777174096958332928/yoz2aPp2_normal.jpg',
profileImageUrlHttps='https://pbs.twimg.com/profile_images/777174096958332928/yoz2aPp2_normal.jpg',
isDefaultProfileImage=false,
url='null',
isProtected=false,
followersCount=145,
status=null,
profileBackgroundColor='000000',
profileTextColor='333333',
profileLinkColor='000000',
profileSidebarFillColor='F3F3F3',
profileSidebarBorderColor='000000',
profileUseBackgroundImage=true,
isDefaultProfile=false,
showAllInlineMedia=false,
friendsCount=200,
createdAt=TueOct1819: 17: 04CEST2011,
favouritesCount=3202,
utcOffset=3600,
timeZone='Paris',
profileBackgroundImageUrl='http://pbs.twimg.com/profile_background_images/753348262/9d241c29a193586d5dc519838bded4c9.jpeg',
profileBackgroundImageUrlHttps='https://pbs.twimg.com/profile_background_images/753348262/9d241c29a193586d5dc519838bded4c9.jpeg',
profileBackgroundTiled=true,
lang='fr',
statusesCount=3462,
isGeoEnabled=true,
isVerified=false,
translator=false,
listedCount=3,
isFollowRequestSent=false,
withheldInCountries=null
},
withHeldInCountries=null,
quotedStatusId=-1,
quotedStatus=null
}
Then I had to replace every occurrence of StatusJSONImpl and UserJSONImpl with "" in order to use the JSONObject constructor.
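For reference, the line-joining step could look roughly like the sketch below. It assumes that every record starts at the beginning of a line with the literal prefix StatusJSONImpl{ and that any line without that prefix is a continuation of the previous tweet (for example, tweet text that contained a newline). The class and method names are hypothetical, not part of Twitter4j.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups physical lines of the dump file into one string per tweet.
final class TweetDumpJoiner {

    // Assumption: each stored tweet begins with this prefix at the start of a line.
    private static final String RECORD_START = "StatusJSONImpl{";

    static List<String> joinRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.startsWith(RECORD_START)) {
                // A new tweet begins: flush the one collected so far.
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                // Continuation line: restore the newline that split the record.
                current.append('\n').append(line);
            }
            // Lines that appear before the first record start are skipped.
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }
}
```

Calling TweetDumpJoiner.joinRecords(Files.readAllLines(path)) would return one string per tweet. For a file with millions of tweets, the same loop can be run over a BufferedReader instead of loading every line into memory.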
The main problem is how to get all of the tweet attributes. I used:
JSONObject jsonObj = new JSONObject(jsonline); //from twitter4j.JSONObject;
but whenever the tweet text contains a character such as "'", I get:
twitter4j.JSONException: Expected a ',' or '}' at 117 [character 118 line 1]
Even when the text does not contain "'", I still cannot extract the creation time of the tweet (createdAt). So I built a String:
String statusFromRaw = "{\"filter_level\": \"low\",";
statusFromRaw+= "\"retweeted\":"+jsonObj.get("isRetweeted")+",";
.....
in order to recreate a cleaner status and use that instead.
But I still have problems parsing the many attributes that are null in most cases. Any suggestions?
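For what it is worth, the stored lines have the field=value shape of a Java toString() dump, which is why a JSON parser trips over unescaped quotes. A possible workaround is to read the leading fields directly with a regular expression instead of going through JSONObject. The sketch below is only an illustration under stated assumptions: the class name is hypothetical, the field order createdAt, id, text, source is taken from the sample above, and the text is assumed to end at the first "', source='" delimiter, so a tweet whose text contains that exact sequence would be cut short.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Twitter4j: pulls the first three fields out of one dump.
final class StatusDumpFields {

    /** Leading fields of one dump; createdAt is kept as the raw, unparsed string. */
    static final class Fields {
        final String createdAt;
        final long id;
        final String text;

        Fields(String createdAt, long id, String text) {
            this.createdAt = createdAt;
            this.id = id;
            this.text = text;
        }
    }

    // Assumption: the dump starts with createdAt, id, text, source in that order.
    // DOTALL lets the text group span newlines; lazy groups stop at the first delimiter.
    private static final Pattern HEAD = Pattern.compile(
            "^\\s*StatusJSONImpl\\{\\s*createdAt=(.*?),\\s*id=(-?\\d+),\\s*text='(.*?)',\\s*source='",
            Pattern.DOTALL);

    static Optional<Fields> parse(String dump) {
        Matcher m = HEAD.matcher(dump);
        if (!m.find()) {
            return Optional.empty(); // not in the expected shape
        }
        return Optional.of(new Fields(m.group(1), Long.parseLong(m.group(2)), m.group(3)));
    }
}
```

Because the pattern is anchored at the start of the dump, it reads the outer tweet's fields and not those of the nested retweetedStatus. Apostrophes inside the text are harmless here, since the text group only ends at the full delimiter rather than at the first quote.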
Answer 0 (score: 0)
Don't use Twitter4j. Just hit Twitter's API yourself and use Jackson to map the response to objects.
You shouldn't have to work around the framework, IMO.