Parsing Raw JSON Strings

Date: 2016-11-01 19:38:34

Tags: java json parsing twitter4j tweets

I used Twitter4j to crawl millions of tweets, but to my surprise all of them were stored in a raw JSON-like format. Here is an example of a formatted line:

StatusJSONImpl{ createdAt=Tue Nov 01 19:00:04 CET 2016, id=793512948027326464, text='RT @DylanYamaha_: Et profitez vraiment des personnes qui sont près de vous, parce que sa arrive très très vite un malheur..', source='<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', isTruncated=false, inReplyToStatusId=-1, inReplyToUserId=-1, isFavorited=false, isRetweeted=false, favoriteCount=0, inReplyToScreenName='null', geoLocation=null, place=null, retweetCount=0, isPossiblySensitive=false, lang='fr', contributorsIDs=[ ], retweetedStatus=StatusJSONImpl{ createdAt=Tue Nov 01 18:38:05 CET 2016, id=793507418244313088, text='Et profitez vraiment des personnes qui sont près de vous, parce que sa arrive très très vite un malheur..', source='<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', isTruncated=false, inReplyToStatusId=-1, inReplyToUserId=-1, isFavorited=false, isRetweeted=false, favoriteCount=56, inReplyToScreenName='null', geoLocation=null, place=null, retweetCount=175, isPossiblySensitive=false, lang='fr', contributorsIDs=[ ], retweetedStatus=null, userMentionEntities=[ ], urlEntities=[ ], hashtagEntities=[ ], mediaEntities=[ ], symbolEntities=[ ], currentUserRetweetId=-1, user=UserJSONImpl{ id=2242998313, name='_', screenName='DylanYamaha_', location='Marseille, France', description='null', isContributorsEnabled=false, profileImageUrl='http://pbs.twimg.com/profile_images/793110090727424002/9bLOivem_normal.jpg', profileImageUrlHttps='https://pbs.twimg.com/profile_images/793110090727424002/9bLOivem_normal.jpg', isDefaultProfileImage=false, url='null', isProtected=false, followersCount=12357, status=null, profileBackgroundColor='ABB8C2', profileTextColor='333333', profileLinkColor='89C9FA', profileSidebarFillColor='DDEEF6', profileSidebarBorderColor='FFFFFF', profileUseBackgroundImage=false, isDefaultProfile=false, showAllInlineMedia=false, 
friendsCount=87, createdAt=Thu Dec 12 23:15:28 CET 2013, favouritesCount=6007, utcOffset=3600, timeZone='Amsterdam', profileBackgroundImageUrl='http://abs.twimg.com/images/themes/theme1/bg.png', profileBackgroundImageUrlHttps='https://abs.twimg.com/images/themes/theme1/bg.png', profileBackgroundTiled=false, lang='fr', statusesCount=2049, isGeoEnabled=false, isVerified=false, translator=false, listedCount=43, isFollowRequestSent=false, withheldInCountries=null }, withHeldInCountries=null, quotedStatusId=-1, quotedStatus=null }, userMentionEntities=[ UserMentionEntityJSONImpl{ name='_', screenName='DylanYamaha_', id=2242998313 } ], urlEntities=[ ], hashtagEntities=[ ], mediaEntities=[ ], symbolEntities=[ ], currentUserRetweetId=-1, user=UserJSONImpl{ id=393519159, name='Tiphaine.', screenName='LehmannTiphaine', location='France', description='Snapchat : tiphainelehmann | IG : tiphainelmn', isContributorsEnabled=false, profileImageUrl='http://pbs.twimg.com/profile_images/777174096958332928/yoz2aPp2_normal.jpg', profileImageUrlHttps='https://pbs.twimg.com/profile_images/777174096958332928/yoz2aPp2_normal.jpg', isDefaultProfileImage=false, url='null', isProtected=false, followersCount=145, status=null, profileBackgroundColor='000000', profileTextColor='333333', profileLinkColor='000000', profileSidebarFillColor='F3F3F3', profileSidebarBorderColor='000000', profileUseBackgroundImage=true, isDefaultProfile=false, showAllInlineMedia=false, friendsCount=200, createdAt=Tue Oct 18 19:17:04 CEST 2011, favouritesCount=3202, utcOffset=3600, timeZone='Paris', profileBackgroundImageUrl='http://pbs.twimg.com/profile_background_images/753348262/9d241c29a193586d5dc519838bded4c9.jpeg', profileBackgroundImageUrlHttps='https://pbs.twimg.com/profile_background_images/753348262/9d241c29a193586d5dc519838bded4c9.jpeg', profileBackgroundTiled=true, lang='fr', statusesCount=3462, isGeoEnabled=true, isVerified=false, translator=false, listedCount=3, isFollowRequestSent=false, 
withheldInCountries=null }, withHeldInCountries=null, quotedStatusId=-1, quotedStatus=null }

I ran into many problems parsing these JSON strings. The first was that some tweets span more than one line; I solved that by concatenating the lines that belong to the same tweet. Then I had to replace every occurrence of StatusJSONImpl and UserJSONImpl with "" in order to use the JSONObject constructor. The main problem is how to get all of the tweet's attributes. I used:

JSONObject jsonObj = new JSONObject(jsonline); // from twitter4j.JSONObject
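The two preprocessing steps described above, joining the physical lines of one tweet and removing the StatusJSONImpl / UserJSONImpl prefixes, might be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it assumes every record starts at the beginning of a line with `StatusJSONImpl{` and that no continuation line starts that way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Twitter4j.
final class TweetDumpReader {
    // Assumption: each dumped tweet begins with this marker at the start of a line.
    static final String RECORD_START = "StatusJSONImpl{";

    // Glue continuation lines onto the record they belong to.
    static List<String> joinRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            // A new marker closes the record collected so far.
            if (line.startsWith(RECORD_START) && current.length() > 0) {
                records.add(current.toString());
                current.setLength(0);
            }
            current.append(line);
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    // Drop the two class-name prefixes named in the question.
    static String stripClassNames(String record) {
        return record.replace("StatusJSONImpl", "").replace("UserJSONImpl", "");
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "StatusJSONImpl{ id=1, ",
                "lang='fr' }",
                "StatusJSONImpl{ id=2 }");
        for (String record : joinRecords(lines)) {
            System.out.println(stripClassNames(record));
        }
    }
}
```

Note that other prefixes in the sample, such as UserMentionEntityJSONImpl, are left untouched by this sketch, and the result still uses `=` and unquoted keys, so it is not strict JSON.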

But whenever the tweet text contains a character such as " ' ", I get:

twitter4j.JSONException: Expected a ',' or '}' at 117 [character 118 line 1]
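The exception is consistent with an unescaped apostrophe ending a single-quoted value early. One way around it, sketched below under stated assumptions, is to pull a quoted field out with a regular expression instead of a JSON parser. The class name and the closing-quote heuristic are hypothetical: the value is taken to end at the first `'` that is followed by `, someKey=` or by a closing brace, so a tweet whose text itself contains such a sequence would be cut short.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Twitter4j.
final class QuotedField {
    // Returns the first single-quoted value stored under the given key, if any.
    static Optional<String> extract(String dump, String key) {
        // Lazy match up to a quote that is followed by ", nextKey=" or by "}".
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(key) + "='(.*?)'(?=, \\w+=|\\s*\\})",
                Pattern.DOTALL);
        Matcher m = p.matcher(dump);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String dump = "StatusJSONImpl{ id=1, text='l'homme qui rit', source='web', lang='fr' }";
        System.out.println(extract(dump, "text").orElse("<missing>"));
    }
}
```

Because the search stops at the first match, a key that also appears inside a nested retweetedStatus or user block resolves to whichever occurrence comes first in the string.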

Even when the text contains no " ' ", I still cannot extract the tweet's creation time (createdAt). So I built a String:

String statusFromRaw = "{\"filter_level\": \"low\",";
statusFromRaw += "\"retweeted\":" + jsonObj.get("isRetweeted") + ",";
.....

in order to recreate a cleaner status to parse.

But I still have problems parsing the many attributes that are null in most cases. Any suggestions?
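For the attributes that are usually null, one option is to normalise each raw value before using it. The sample above shows two spellings of a missing value (a bare null, as in geoLocation=null, and a quoted 'null', as in inReplyToScreenName='null') plus -1 for absent ids. A minimal sketch, with hypothetical class and method names and the assumption that those are the only "missing" markers:

```java
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical helper, not part of Twitter4j.
final class DumpValues {
    // Maps null and 'null' to empty; otherwise strips the surrounding quotes.
    static Optional<String> text(String raw) {
        String v = raw.trim();
        if (v.equals("null") || v.equals("'null'")) {
            return Optional.empty();
        }
        if (v.length() >= 2 && v.startsWith("'") && v.endsWith("'")) {
            v = v.substring(1, v.length() - 1);
        }
        return Optional.of(v);
    }

    // Maps null and the -1 sentinel to empty; otherwise parses the number.
    static OptionalLong id(String raw) {
        String v = raw.trim();
        if (v.equals("null") || v.equals("-1")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(v));
    }

    public static void main(String[] args) {
        System.out.println(text("'fr'"));   // present
        System.out.println(text("'null'")); // treated as missing
        System.out.println(id("-1"));       // treated as missing
    }
}
```

One limitation of this choice: a field whose real content is the literal word null cannot be told apart from a missing one.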

1 Answer:

Answer 0: (score: 0)

Don't use Twitter4j. Just hit Twitter's API yourself and use Jackson to map the responses to objects.

You shouldn't be trying to work around a framework, IMO.