How do I load TSV data and split the sample dataset into three columns?

Asked: 2016-06-29 14:06:53

Tags: python split whitespace multiple-columns

Sample dataset:

120GB Hard Disk Drive with 3 Years Warranty for Lenovo Essential B570 Laptop Notebook HDD Computer - Certified 3 Years Warranty from Seifelden  3950    8
"TOSHIBA SATELLITE L305-S5919 LAPTOP LCD SCREEN 15.4"" WXGA CCFL SINGLE SUBSTITUTE REPLACEMENT LCD SCREEN ONLY. NOT A LAPTOP"   35099   324
Hobby-Ace Pixhawk PX4 RGB External LED Indicator USB Module for Pixhawk Flight Controller   21822   510
Pelicans mousepad   44629   260
P4648-60029 Hewlett-Packard Tc2100 System Board 42835   68
Ectaco EI900 SD Card English - Italian  249 6
Zippered Pocket Black School Laptop Tablet Dual Straps Deluxe Backpack  4342    172

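I want to split each line into three columns. For the first line:

Column 1, Product_id: 120GB Hard Disk Drive with 3 Years Warranty for Lenovo Essential B570 Laptop Notebook HDD Computer - Certified 3 Years Warranty from Seifelden

Column 2, order_id: 3950

Column 3, item_id: 8

I need the same split for every row in my dataset.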
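If the columns are separated by runs of whitespace rather than real tabs (as they appear in the sample above), one option is to split each line from the right: the last two fields are numeric ids, so `str.rsplit` with `maxsplit=2` leaves the whole product name, spaces included, in the first field. This is a minimal sketch, assuming every line ends with two integer fields; `split_row` is a hypothetical helper name. Note that this approach does not remove the CSV-style quotes around the TOSHIBA row.

```python
def split_row(line):
    """Split one line into (product_id, order_id, item_id).

    Assumes the last two whitespace-separated fields are integers and
    everything before them belongs to the product name.
    """
    # Split at most twice, starting from the right, on any run of whitespace.
    product_id, order_id, item_id = line.rstrip("\n").rsplit(maxsplit=2)
    return product_id, int(order_id), int(item_id)


sample = [
    "Pelicans mousepad   44629   260",
    "P4648-60029 Hewlett-Packard Tc2100 System Board 42835   68",
    "Ectaco EI900 SD Card English - Italian  249 6",
]

for line in sample:
    print(split_row(line))
```

Because the split starts from the right, hyphens and spaces inside the product name never affect the result.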

2 answers:

Answer 0 (score: 1)

If you don't mind using a library, pandas can read both CSV and TSV files. You want:

import pandas
df = pandas.read_csv('<your file>', sep='\t', names=['Product_id', 'order_id', 'item_id'])

If you want to use vanilla Python, it's a bit more complicated, but this Stack Overflow question has code snippets that may be useful.
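For the standard-library route, a minimal sketch using `csv.reader` (which ships with Python) might look like the following. It assumes the file really is tab-separated. Converting `order_id` and `item_id` to `int` is an assumption about the data, and `Row` and `read_rows` are hypothetical names. `csv.reader` also undoes the doubled-quote escaping in the TOSHIBA row (`15.4""` becomes `15.4"`).

```python
import csv
import io
from typing import NamedTuple


class Row(NamedTuple):
    # Field names follow the question; int ids are an assumption.
    product_id: str
    order_id: int
    item_id: int


def read_rows(tsv_file):
    """Yield one Row per tab-separated line of an open text file."""
    # The default "excel" dialect treats "" inside a quoted field as a literal quote.
    for product_id, order_id, item_id in csv.reader(tsv_file, delimiter="\t"):
        yield Row(product_id, int(order_id), int(item_id))


# Two lines from the sample, assuming real tab characters between columns.
sample = io.StringIO(
    "Pelicans mousepad\t44629\t260\n"
    '"TOSHIBA SATELLITE L305-S5919 LAPTOP LCD SCREEN 15.4"" WXGA CCFL '
    'SINGLE SUBSTITUTE REPLACEMENT LCD SCREEN ONLY. NOT A LAPTOP"\t35099\t324\n'
)

for row in read_rows(sample):
    print(row)
```

To read a real file, open it in text mode with `newline=''`, for example `with open('data.tsv', newline='') as f: rows = list(read_rows(f))` (the filename is only an example). A line without exactly three fields raises `ValueError` at the unpacking step, which is a reasonable signal of malformed input.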

Answer 1 (score: 1)

You can use the csv module to read the file:

import csv
from pprint import pprint

columns = 'Product_id order_id item_id'.split()

# Python 3: the csv module expects a text-mode file opened with newline=''
with open('data.tsv', newline='') as tsv_file:
    # DictReader maps each tab-separated row onto the three column names
    for row in csv.DictReader(tsv_file, fieldnames=columns, delimiter='\t'):
        pprint(row)

Output:

{'Product_id': '120GB Hard Disk Drive with 3 Years Warranty for Lenovo Essential B570 Laptop Notebook HDD Computer - Certified 3 Years Warranty from Seifelden',
 'item_id': '8',
 'order_id': '3950'}
{'Product_id': 'TOSHIBA SATELLITE L305-S5919 LAPTOP LCD SCREEN 15.4" WXGA CCFL SINGLE SUBSTITUTE REPLACEMENT LCD SCREEN ONLY. NOT A LAPTOP',
 'item_id': '324',
 'order_id': '35099'}
{'Product_id': 'Hobby-Ace Pixhawk PX4 RGB External LED Indicator USB Module for Pixhawk Flight Controller',
 'item_id': '510',
 'order_id': '21822'}
{'Product_id': 'Pelicans mousepad', 'item_id': '260', 'order_id': '44629'}
{'Product_id': 'P4648-60029 Hewlett-Packard Tc2100 System Board',
 'item_id': '68',
 'order_id': '42835'}
{'Product_id': 'Ectaco EI900 SD Card English - Italian',
 'item_id': '6',
 'order_id': '249'}
{'Product_id': 'Zippered Pocket Black School Laptop Tablet Dual Straps Deluxe Backpack',
 'item_id': '172',
 'order_id': '4342'}