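I'm having a problem reading a large dataset into SAS. The dataset includes a review_text column that holds very long text values. After the import, the first column, review_id, contains the observation number instead of the actual ID. The other columns also hold wrong values, with data shifted into the wrong variables.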
DATA review;
INFORMAT review_id $50. ;
INFORMAT review_text $5000. ;
INFORMAT business_name $100. ;
INFORMAT business_id $100. ;
INFORMAT review_date mmddyy10. ;
INFORMAT city $25. ;
INFORMAT state $20. ;
INFORMAT address $250. ;
INFORMAT user_id $100. ;
INFORMAT user_name $100. ;
INFORMAT friends $500. ;
INFORMAT yelping_since mmddyy10. ;
INFORMAT categories $100. ;
INFILE 'C:\users\scott\desktop\yelp_food_reviews.csv' DELIMITER= ',' dsd LRECL=32767 FIRSTOBS=2;
INPUT review_id $ review_text $ business_name $ business_id $ review_date review_rating business_rating num_biz_reviews city $ state $ address $ postal_code 8. latitude 8.10 longitude 8.10 mon_hours $ tues_hours $ wed_hours $ thurs_hours $ fri_hours $ sat_hours $ sun_hours $ user_id $ user_name $ user_reviews_given 8. ave_rating_given 4.1 friends $ yelping_since categories $ is_open 1.;
run;