I have seven different CSV training data files; details below:
1 chefmozaccepts.csv
Instances: 1314
Attributes: 2
placeID: Nominal
Rpayment: Nominal, 12 [cash,VISA,MasterCard-Eurocard,American_Express,bank_debit_cards,checks,Discover,Carte_Blanche,Diners_Club,Visa,Japan_Credit_Bureau,gift_certificates]
%---
2 chefmozcuisine.csv
Instances: 916
Attributes: 2
placeID: Nominal
Rcuisine: Nominal, 59 [Afghan,African,American,Armenian,Asian,Bagels,Bakery,Bar,Bar_Pub_Brewery,Barbecue,Brazilian,Breakfast-Brunch,Burgers,Cafe-Coffee_Shop,Cafeteria,California,Caribbean,Chinese,Contemporary,Continental-European,Deli-Sandwiches,Dessert-Ice_Cream,Diner,Dutch-Belgian,Eastern_European,Ethiopian,Family,Fast_Food,Fine_Dining,French,Game,German,Greek,Hot_Dogs,International,Italian,Japanese,Juice,Korean,Latin_American,Mediterranean,Mexican,Mongolian,Organic-Healthy,Persian,Pizzeria,Polish,Regional,Seafood,Soup,Southern,Southwestern,Spanish,Steaks,Sushi,Thai,Turkish,Vegetarian,Vietnamese]
%---
3 chefmozhours4.csv
Instances: 2339
Attributes: 3
placeID: Nominal
hours: Nominal, Range: 00:00-23:30
days: Nominal, 7 [Mon;Tue;Wed;Thu;Fri;Sat;Sun]
%---
4 chefmozparking.csv
Instances: 702
Attributes: 2
placeID: Nominal
parking_lot: Nominal, 7 [public,none,yes,valet_parking,free,street,validated_parking]
%---
5 geoplaces2.csv
Instances: 130
Attributes: 21
placeID: Nominal
latitude: Numeric
longitude: Numeric
the_geom_meter: Nominal (Geospatial)
name: Nominal
address: Nominal, Missing: 27
city: Nominal, Missing: 18
state: Nominal, Missing: 18
country: Nominal, Missing: 28
fax: Numeric, Missing: 130
zip: Nominal, Missing: 74
alcohol: Nominal, Values: 3 [No_Alcohol_Served,Wine_Beer,Full_Bar]
%---
6 rating_final.csv
Instances: 1161
Attributes: 5
userID: Nominal
placeID: Nominal
rating: Numeric, 3 [0,1,2]
food_rating: Numeric, 3 [0,1,2]
service_rating: Numeric, 3 [0,1,2]
%---
7 usercuisine.csv
Instances: 330
Attributes: 2
userID: Nominal
Rcuisine: Nominal, 103
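As you can see, all the files share the column placeID, but each file has a different number of instances.
I need to merge all the CSV files into one final CSV, using placeID as the unique key. For the files that have more instances per place, I want to split the data so that in the end every column is filled evenly, and the remaining metadata is duplicated on the rows where the instance counts don't match.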
File 1:
placeID Rpayment
135110 cash
135110 VISA
135110 MasterCard-Eurocard
135110 American_Express
135110 bank_debit_cards
135109 cash
135107 cash
135107 VISA
135107 MasterCard-Eurocard
135107 American_Express
135107 bank_debit_cards
135106 cash
135106 VISA
135106 MasterCard-Eurocard
135105 cash
File 2:
placeID Rcuisine
135110 Spanish
135109 Italian
135107 Latin_American
135106 Mexican
135105 Fast_Food
135104 Mexican
135103 Burgers
135103 Dessert-Ice_Cream
135103 Fast_Food
135103 Hot_Dogs
File 3:
placeID hours days
135110 08:00-19:00; Mon;Tue;Wed;Thu;Fri;
135110 00:00-00:00; Sat;
135110 00:00-00:00; Sun;
135109 08:00-21:00; Mon;Tue;Wed;Thu;Fri;
135109 08:00-21:00; Sat;
135109 08:00-21:00; Sun;
135108 00:00-23:30; Mon;Tue;Wed;Thu;Fri;
File 4:
placeID parking_lot
135110 public
135109 none
135108 none
135107 none
135106 none
135105 none
File 5:
placeID latitude longitude name address city state country fax zip alcohol smoking_area dress_code accessibility price url Rambience franchise area other_services
135109 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet
135107 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135106 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none
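In the File 5 sample above, missing fields appear as `?`. A minimal sketch of reading that file so pandas treats `?` as missing and keeps `zip` (listed as Nominal) as text. It assumes the real geoplaces2.csv is comma-separated with these column names; the two-row excerpt is hypothetical, built from the values shown above.

```python
import io

import pandas as pd

# Hypothetical comma-separated excerpt in the layout of geoplaces2.csv,
# built from the File 5 sample; '?' marks a missing value.
sample = io.StringIO(
    "placeID,name,city,fax,zip,alcohol\n"
    "135109,Paniroles,?,?,?,Wine-Beer\n"
    "135106,El Rincón de San Francisco,San Luis Potosi,?,78000,Wine-Beer\n"
)

# na_values='?' turns every '?' into NaN; zip is read as text so it is not
# converted to a float such as 78000.0.
geo = pd.read_csv(sample, na_values="?", dtype={"zip": str})
print(geo.isna().sum())
```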
Expected output:
placeID payment Cuisine parking_lot hours days latitude longitude name address city state country fax zip alcohol smoking_area dress_code accessibility price url ambience franchise area other_services
135110 cash Spanish public 08:00-19:00; Mon;Tue;Wed;Thu;Fri;
135110 VISA Spanish public 00:00-00:00; Sat;
135110 MasterCard-Eurocard Spanish public 00:00-00:00; Sun;
135110 American_Express Spanish public 08:00-19:00; Mon;Tue;Wed;Thu;Fri;
135110 bank_debit_cards Spanish public 00:00-00:00; Sat;
135110 bank_debit_cards Spanish public 00:00-00:00; Sun;
135109 cash Italian none 08:00-21:00; Mon;Tue;Wed;Thu;Fri; 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet
135109 cash Italian none 08:00-21:00; Sat; 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet
135109 cash Italian none 08:00-21:00; Sun; 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet
135107 cash Latin_American none 07:00-23:30; Mon;Tue;Wed;Thu;Fri; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 VISA Latin_American none 07:00-23:30; Sat; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 MasterCard-Eurocard Latin_American none 07:00-23:30; Sun; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 American_Express Latin_American none 07:00-23:30; Mon;Tue;Wed;Thu;Fri; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 bank_debit_cards Latin_American none 07:00-23:30; Sat; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 MasterCard-Eurocard Latin_American none 07:00-23:30; Sun; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135106 cash Mexican none 18:00-23:30; Mon;Tue;Wed;Thu;Fri; 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none
135106 VISA Mexican none 18:00-23:30; Sat; 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none
135106 MasterCard-Eurocard Mexican none 18:00-21:00; Sun; 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none
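One reading of the expected output above: for each placeID, emit as many rows as that place has in its largest file, and repeat shorter lists cyclically (the 135110 hours rows go Mon;...;Fri, Sat, Sun, then start over). The sample is not fully consistent with any single rule (135110 shows six rows for five payment methods), so this is an approximation. Below is a minimal pandas sketch of that rule, assuming each file is already loaded into a DataFrame with a `placeID` column. `align_by_place` and the `_pos`, `_row`, `_size` helper columns are hypothetical names.

```python
import pandas as pd


def align_by_place(frames, key="placeID"):
    """Hypothetical helper: place each frame's rows side by side per key.

    For every key value, the result has as many rows as the largest group
    for that key in any frame. Output row i takes row (i % group size) of
    each frame, so shorter groups repeat cyclically. Keys absent from a
    frame get NaN in that frame's columns.
    """
    numbered = []
    for f in frames:
        f = f.copy()
        f["_pos"] = f.groupby(key).cumcount()  # 0, 1, 2, ... within each key
        numbered.append(f)

    # Output rows per key = largest group size across all frames.
    sizes = pd.concat([f.groupby(key).size() for f in numbered], axis=1)
    n_rows = sizes.max(axis=1).astype(int)

    # Skeleton with one row per (key, output row index).
    out = pd.DataFrame({key: n_rows.index.repeat(n_rows.to_numpy())})
    out["_row"] = out.groupby(key).cumcount()

    for f in numbered:
        size = f.groupby(key).size().rename("_size").reset_index()
        out = out.merge(size, how="left", on=key)
        # Cycle shorter groups; -1 never matches, leaving NaN for missing keys.
        out["_pos"] = (out["_row"] % out["_size"]).fillna(-1).astype(int)
        out = out.merge(f, how="left", on=[key, "_pos"])
        out = out.drop(columns=["_pos", "_size"])

    return out.drop(columns="_row")


# Small excerpt of Files 1-3 from the question.
payments = pd.DataFrame({
    "placeID": [135110] * 5 + [135109],
    "Rpayment": ["cash", "VISA", "MasterCard-Eurocard", "American_Express",
                 "bank_debit_cards", "cash"],
})
cuisine = pd.DataFrame({"placeID": [135110, 135109],
                        "Rcuisine": ["Spanish", "Italian"]})
hours = pd.DataFrame({
    "placeID": [135110] * 3 + [135109] * 3,
    "hours": ["08:00-19:00;", "00:00-00:00;", "00:00-00:00;",
              "08:00-21:00;", "08:00-21:00;", "08:00-21:00;"],
    "days": ["Mon;Tue;Wed;Thu;Fri;", "Sat;", "Sun;"] * 2,
})

result = align_by_place([payments, cuisine, hours])
print(result)
```

With the real data, load each CSV with `pd.read_csv`, pass the DataFrames in the column order you want, and write the result with `to_csv(..., index=False)`. Because rows are aligned by position, the pairing of a given payment method with a given opening-hours row carries no meaning; it only fills the columns evenly, as the question asks.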
I know this is tedious, but any help would be greatly appreciated. I'm trying to do this with pandas, not csvreader.
Answer (score: 1)
Try something like:
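import pandas as pd

df_out = pd.read_csv('chefmozaccepts.csv')
# Merge the remaining files on placeID; how='left' keeps places missing from later files
for f in ('chefmozcuisine.csv', 'chefmozhours4.csv', 'chefmozparking.csv', 'geoplaces2.csv'):
    df_out = df_out.merge(pd.read_csv(f), how='left', on='placeID')
# Note: each place gets one row per combination of its rows across the files
df_out.to_csv('output.csv', index=False)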
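Merging on placeID alone pairs every row of one file with every row of another file for the same place. The merged output therefore has the product of the per-place row counts, not the evenly split layout shown in the question. A quick check using the placeID 135110 rows from Files 1 and 3:

```python
import pandas as pd

# The five payment rows and three hours rows for placeID 135110 from the question.
payments = pd.DataFrame({
    "placeID": [135110] * 5,
    "Rpayment": ["cash", "VISA", "MasterCard-Eurocard",
                 "American_Express", "bank_debit_cards"],
})
hours = pd.DataFrame({
    "placeID": [135110] * 3,
    "days": ["Mon;Tue;Wed;Thu;Fri;", "Sat;", "Sun;"],
})

merged = payments.merge(hours, on="placeID")
print(len(merged))  # 15: every payment method paired with every hours row
```

To keep the row count per place equal to the longest file instead, the rows have to be aligned by position within each placeID (for example with `groupby(...).cumcount()`) before merging.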