Pandas unstack MemoryError - is there a way to unstack in chunks?

Time: 2016-02-10 06:48:16

Tags: python pandas chunking

I have a DataFrame of shape (699394, 3). Here is a small sample, df:

import pandas as pd

df = pd.DataFrame({'name': ['Bullet', 'Gauge', 'MFG Brand Name', 'Material', 'Number of Pieces', 'Product Depth (in.)', 'Product Height (in.)', 'Product Weight (lb.)', 'Product Width (in.)', 'Application Method', 'Assembled Depth (in.)', 'Assembled Height (in.)', 'Assembled Width (in.)', 'Bullet', 'Cleanup', 'Color Family', 'Color/Finish', 'Concrete Use', 'Container Size', 'Coverage Area (sq. ft.)', 'Deck Use', 'Interior/Exterior', 'MFG Brand Name', 'Mildew Resistant', 'Opacity', 'Paint Product Type', 'Patching & Repair Product Type', 'Product Style', 'RGB Value', 'Sealer', 'Time before recoating (hours)', 'Tintable', 'Transparency', 'UV Resistant', 'Waterproof', 'Bath Faucet Type', 'Built-in Water Filter', 'Bullet', 'Certifications and Listings', 'Color Family', 'Color/Finish', 'Connection size (in.)', 'Faucet Features', 'Faucet Included Components', 'Faucet type', 'Flow rate (gallons per minute)', 'Handle type', 'MFG Brand Name', 'Number of Faucet Handles', 'Number of Spray Settings', 'Number of showerheads', 'Product Depth (in.)', 'Product Height (in.)', 'Product Width (in.)', 'Showerhead face diameter (in.)', 'Showerhead type', 'Spray Pattern', 'Appliance Type', 'Assembled Depth (in.)', 'Assembled Height (in.)', 'Assembled Width (in.)', 'Bullet', 'Capacity of Microwave (cu. 
ft.)', 'Certifications and Listings', 'Color/Finish', 'Color/Finish Family', 'Cut-Out Front to Back Width (in.)', 'Cut-Out Height (in.)', 'Cut-Out Left to Right Length (in.)', 'Door Swing/Style', 'Exhaust Fan Speeds', 'Exhaust Maximum CFM', 'MFG Brand Name', 'Microwave Door Release', 'Microwave Features', 'Microwave Size', 'Number of One-Touch Settings', 'Number of Power Levels', 'Oven Settings', 'Product Depth (in.)', 'Product Height (in.)', 'Product Weight (lb.)', 'Product Width (in.)', 'Safety Listing', 'Sensor Cook', 'Turntable', 'Turntable Diameter', 'Vent Type', 'Wattage (watts)', 'Battery Power Type', 'Battery Size', 'Bulb Type Included', 'Bullet', 'Certifications and Listings', 'Commercial Light Type', 'Connection Type', 'ENERGY STAR Certified', 'Emergency run time (min.)', 'Fixture Color/Finish', 'Fixture Color/Finish Family'], 'product_uid': [100001.0, 100001.0, 100001.0, 100001.0, 100001.0, 100001.0, 100001.0, 100001.0, 100001.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100002.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100005.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100006.0, 100007.0, 100007.0, 100007.0, 100007.0, 100007.0, 100007.0, 100007.0, 100007.0, 100007.0, 100007.0, 100007.0], 'value': ['Versatile connector for various 90° connections and home repair projects. 
Stronger than angled nailing or screw fastening alone. Help ensure joints are consistently straight and strong. Dimensions: 3 in. x 3 in. x 1-1/2 in.. Made from 12-Gauge steel. Galvanized for extra corrosion resistance. Install with 10d common nails or #9 x 1-1/2 in. Strong-Drive SD screws', '12', 'Simpson Strong-Tie', 'Galvanized Steel', '1', '1.5', '3', '0.26', '3', 'Brush,Roller,Spray', '6.63 in', '7.76 in', '6.63 in', 'Revives wood and composite decks, railings, porches and boat docks, also great for concrete pool decks, patios and sidewalks. 100% acrylic solid color coating. Resists cracking and peeling and conceals splinters and cracks up to 1/4 in.. Provides a durable, mildew resistant finish. Covers up to 75 sq. ft. in 2 coats per gallon. Creates a textured, slip-resistant finish. For best results, prepare with the appropriate BEHR product for your wood or concrete surface. Actual paint colors may vary from on-screen and printer representations. Colors available to be tinted in most stores. Online Price includes Paint Care fee in the following states: CA, CO, CT, ME, MN, OR, RI, VT', 'Soap and Water', 'Browns / Tans', 'Tugboat', 'Yes', '1 GA-Gallon', '75', 'Yes', 'Exterior', 'BEHR Premium Textured DeckOver', 'Yes', 'Solid', 'Exterior Paint/Stain', 'Restoration Coating', 'Cottage', '119:100:086', 'No', '6', 'No', 'Solid', 'Yes', 'No', 'Combo Tub and Shower', 'No', 'Includes the trim kit only, the rough-in kit (R10000-UNBX) is sold separately. Includes the handle. Maintains a balanced pressure of hot and cold water even when a valve is turned on or off elsewhere in the system. 
Due to WaterSense regulations in the state of New York, please confirm your shipping zip code is not restricted from use of items that do not meet WaterSense qualifications', 'ADA Compliant,CSA Certified,IAPMO Certified', 'Chrome', 'Chrome', '1/2 In.', 'No Additional Features', 'Handles,Pressure Balance/Scald Guard', 'Bath Faucet', '2.5', 'Lever', 'Delta', 'Single Handle', '1', '1', '15.28', '24', '7.09', '4.06', 'Fixed Mount', 'Rain', 'Over the Range Microwave', '18.5 in', '17.13 in', '29.94 in', "Spacious 1.9 cu. ft. capacity accommodates dinner plates and casserole dishes with ease. 1100 watts of cooking power and 10 cooking levels make cooking and reheating a snap. 400 CFM venting system whisks smoke, steam and odors away from the cooktop to keep your kitchen air clear. Single piece door with built-in touch-activated control console streamlines the exterior for a sleek, modern look and easy cleanup. Cook with confidence with the Sensor and Programmed cooking cycles and options. Sensor cycles include: Steam/Simmer, AccuPop and Potato for fast prep of family favorites. Kids' Menu: it's simple, it's fast. The Kids' Menu is preset with cooking times and power levels for a variety of favorites like pizza and chicken nuggets. Now after school snacks don't have to be an afternoon hassle. TimeSavor Plus True Convection cooking uses a 1600-watt element and a fan to circulate heat over, under and around food for fast cooking and even browning. Industry leading CleanRelease non-stick interior requires no special cleaners. A damp cloth or sponge is all thatâ\x80\x99s needed to remove cooked-on spills and splashes. Recessed turntable's on/off feature is especially helpful when cooking with plates that are larger than the turntable. Automatic interior incandescent light and large window help you track cooking progress. 4-speed fan with Auto Vent Fan function. 
To keep the microwave oven from overheating, the vent fan will automatically turn on at high speed if the temperature from the range or cooktop below the microwave oven gets too hot. Replaceable charcoal and dishwasher safe mesh filters takes grease and other impurities out of the air. 90° hinge. With this innovative hinge design you can install this model next to a wall and still open the door easily. Limited 1-year warranty. Convertible venting. Can be installed as vented or non-vented (recirculating) to fit a variety of installation needs. AccuPop cycle senses the perfect pop every time. It adapts cooking time using a sound sensor that measures the time between pops so you don't have to worry about bag size or excessive unpopped kernels. Now you can finally watch the movie, not the microwave. Included items: convection rack, SureMist steamer and cooking rack. Included cooking rack lets you microwave on two levels, so you can cook several items at once", '1.9', '1-UL Listed', 'Stainless Steel', 'Stainless', '12', '17.13', '30', 'Right to Left Swing', '4', '400', 'Whirlpool', 'Pull', 'Charcoal Filter,Clock,Convection,Cooktop Lighting,Interior Light,Microwave Rack,Nightlight,One Touch Cooking,Removable Filter,Steam Cook,Timer,Turntable,Turntable On/Off Option', '30 in.', '6', '10', 'Defrost,Keep Warm,Sensor Cook', '18.5', '17.13', '67.1', '29.94', 'UL', 'Yes', 'Yes', '14', 'Convertible', '1100', 'Ni-Cad', '.Built-In', 'LED', 'Advanced LED technology is dependable and energy efficient. 2 adjustable heads allow you to direct light where it is needed. Engineering-grade thermoplastic housing is impact-resistant, scratch-resistant and corrosion-proof. Integrated LEDs means no bulbs are required. Typical life of the LEDs is 10 years of maintenance-free operation. Black housing has a compact low-profile design. Sealed, maintenance-free Ni-cad battery delivers 90 minute capacity to the LEDs. Dual voltage input capability (120 to 277-volt). 
Easily installs to wall or ceiling. UL damp-location listed', '1-UL Listed,OSHA Compliant', 'Exit and Emergency', 'Hardwired', 'No', '90', 'Black', 'Black']}, columns=['product_uid', 'name', 'value'])

I set the index and then unstack the name level:

df.set_index(['product_uid', 'name']).unstack('name')

This produces the following, which is exactly what I want:

                                value                      \
name                   Appliance Type  Application Method   
product_uid                                                 
100001                            NaN                 NaN   
100002                            NaN  Brush,Roller,Spray   
100005                            NaN                 NaN   
100006       Over the Range Microwave                 NaN   
100007                            NaN                 NaN   

                                                          \
name        Assembled Depth (in.) Assembled Height (in.)   
product_uid                                                
100001                        NaN                    NaN   
100002                    6.63 in                7.76 in   
100005                        NaN                    NaN   
100006                    18.5 in               17.13 in   
100007                        NaN                    NaN   

                                                                            \
name        Assembled Width (in.)      Bath Faucet Type Battery Power Type   
product_uid                                                                  
100001                        NaN                   NaN                NaN   
100002                    6.63 in                   NaN                NaN   
100005                        NaN  Combo Tub and Shower                NaN   
100006                   29.94 in                   NaN                NaN   
100007                        NaN                   NaN             Ni-Cad   

                                                                   \
name        Battery Size Built-in Water Filter Bulb Type Included   
product_uid                                                         
100001               NaN                   NaN                NaN   
100002               NaN                   NaN                NaN   
100005               NaN                    No                NaN   
100006               NaN                   NaN                NaN   
100007         .Built-In                   NaN                LED   

                  ...                                                    \
name              ...       Spray Pattern Time before recoating (hours)   
product_uid       ...                                                     
100001            ...                 NaN                           NaN   
100002            ...                 NaN                             6   
100005            ...                Rain                           NaN   
100006            ...                 NaN                           NaN   
100007            ...                 NaN                           NaN   

                                                                             \
name        Tintable Transparency Turntable Turntable Diameter UV Resistant   
product_uid                                                                   
100001           NaN          NaN       NaN                NaN          NaN   
100002            No        Solid       NaN                NaN          Yes   
100005           NaN          NaN       NaN                NaN          NaN   
100006           NaN          NaN       Yes                 14          NaN   
100007           NaN          NaN       NaN                NaN          NaN   


name           Vent Type Waterproof Wattage (watts)  
product_uid                                          
100001               NaN        NaN             NaN  
100002               NaN         No             NaN  
100005               NaN        NaN             NaN  
100006       Convertible        NaN            1100  
100007               NaN        NaN             NaN 

However, when I try this on the actual dataset, I get a MemoryError:

Traceback (most recent call last):
  File "<pyshell#172>", line 1, in <module>
    attr_df.set_index(['product_uid', 'name']).unstack('name')
  File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 3801, in unstack
    return unstack(self, level)
  File "C:\Python34\lib\site-packages\pandas\core\reshape.py", line 404, in unstack
    return _unstack_frame(obj, level)
  File "C:\Python34\lib\site-packages\pandas\core\reshape.py", line 445, in _unstack_frame
    return unstacker.get_result()
  File "C:\Python34\lib\site-packages\pandas\core\reshape.py", line 147, in get_result
    values, value_mask = self.get_new_values()
  File "C:\Python34\lib\site-packages\pandas\core\reshape.py", line 184, in get_new_values
    new_values = np.empty(result_shape, dtype=dtype)
MemoryError
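
The last frame of the traceback is the `np.empty(result_shape, dtype=dtype)` call, so the failure is in allocating the dense result. A rough way to see how large that result would be is to count the distinct products and the distinct attribute names. The sketch below is only an estimate under stated assumptions: the helper name `estimate_unstacked_bytes` is hypothetical, and 8 bytes per cell assumes object pointers on a 64-bit build, not counting the strings themselves or any temporary copies.

```python
import pandas as pd


def estimate_unstacked_bytes(frame, index_col="product_uid",
                             column_col="name", bytes_per_cell=8):
    """Rough size of the dense (rows x columns) block that unstack allocates.

    Assumption: each cell is one 8-byte object pointer; the string
    payloads and intermediate copies are not included.
    """
    n_rows = frame[index_col].nunique()      # one output row per product
    n_cols = frame[column_col].nunique()     # one output column per attribute name
    return n_rows * n_cols * bytes_per_cell


# Tiny stand-in for the real data, using a few rows from the sample above.
sample = pd.DataFrame({
    "product_uid": [100001, 100001, 100002],
    "name": ["Gauge", "Material", "Color Family"],
    "value": ["12", "Galvanized Steel", "Browns / Tans"],
})

estimated = estimate_unstacked_bytes(sample)
print(estimated, "bytes for the dense block")
```

Running the same helper on the full frame would show whether the wide table could fit in memory at all, independent of how it is built.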

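My question is: is there a way to unstack in chunks? Would unstacking in chunks even work? Is there any other way to get around this memory error? I tried pivot_table, but it did not work because my values are not numeric data that can be aggregated.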
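
One possible shape for a chunked unstack is sketched below, assuming each (product_uid, name) pair occurs at most once. The function name `unstack_in_chunks` and the `n_chunks` parameter are hypothetical, not part of pandas. It splits the distinct product_uid values into groups, unstacks each group separately, and concatenates the pieces. Note that the concatenated result is still as large as the full unstack, so this can only help if the peak memory used during the unstack itself is the problem, or if each piece is written to disk instead of being concatenated.

```python
import numpy as np
import pandas as pd


def unstack_in_chunks(frame, n_chunks=4):
    """Unstack 'name' per group of product_uid values, then combine.

    Assumption: (product_uid, name) pairs are unique. Unlike the
    DataFrame.unstack call above, this unstacks the 'value' Series,
    so the columns are a flat index of names, not a MultiIndex.
    """
    uids = frame["product_uid"].unique()
    pieces = []
    for uid_chunk in np.array_split(uids, n_chunks):
        if len(uid_chunk) == 0:              # more chunks than products
            continue
        part = frame[frame["product_uid"].isin(uid_chunk)]
        wide = part.set_index(["product_uid", "name"])["value"].unstack("name")
        pieces.append(wide)
    # concat aligns on the union of column names and fills gaps with NaN
    return pd.concat(pieces).sort_index().sort_index(axis=1)


sample = pd.DataFrame({
    "product_uid": [100001, 100001, 100002, 100002, 100005],
    "name": ["Gauge", "Material", "Color Family", "Sealer", "Handle type"],
    "value": ["12", "Galvanized Steel", "Browns / Tans", "No", "Lever"],
})

result = unstack_in_chunks(sample, n_chunks=2)
expected = sample.set_index(["product_uid", "name"])["value"].unstack("name")
print(result)
```

On this small sample the chunked result matches the one-shot unstack; whether it avoids the MemoryError on the real data has not been verified here.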

0 answers:

No answers