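Why do I get this "not in range" index error? Why doesn't it work?

Time: 2020-07-15 07:43:10

Tags: python pandas numpy jupyter-notebook

I get this error when trying to split one of my columns into several columns. The split only works for one or two columns. If I try to split into 3, 4, or 5 columns, it reports: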

ValueError                                Traceback (most recent call last)
/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    349             try:
--> 350                 return self._range.index(new_key)
    351             except ValueError:

ValueError: 2 is not in range

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-19-d4e6a4d03e69> in <module>
     22 data_old[Col_1_Label] = newz[0]
     23 data_old[Col_2_Label] = newz[1]
---> 24 data_old[Col_3_Label] = newz[2]
     25 #data_old[Col_4_Label] = newz[3]
     26 #data_old[Col_5_Label] = newz[4]

/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    350                 return self._range.index(new_key)
    351             except ValueError:
--> 352                 raise KeyError(key)
    353         return super().get_loc(key, method=method, tolerance=tolerance)
    354 

KeyError: 2

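Here is my code. I have a CSV file, and when pandas reads it, it creates a single column named 'Контракт'. I split that column into new columns, but it only splits into two. I want 7 columns! Please help me understand the logic here.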

import pandas as pd
from pandas import Series, DataFrame
import re

dframe1 = pd.read_csv('po.csv')
columns = ['Контракт']
data_old = pd.read_csv('po.csv', header=None, names=columns)
data_old
# The thing you want to split the column on
SplitOn = ':'

# Name of Column you want to split
Split_Col = 'Контракт'



newz = data_old[Split_Col].str.split(pat=SplitOn, n=-1, expand=True)

# Column Labels (you can add more if you will have more)
Col_1_Label = 'Номер телефону'
Col_2_Label = 'Тарифний пакет'
Col_3_Label = 'Вихідні дзвінки з України за кордон'
Col_4_Label = 'ВАРТІСТЬ ПАКЕТА/ЩОМІСЯЧНА ПЛАТА'
Col_5_Label = 'ЗАМОВЛЕНІ ДОДАТКОВІ ПОСЛУГИ ЗА МЕЖАМИ ПАКЕТА'
Col_6_Label = 'Вартість послуги "Корпоративна мережа'
Col_7_Label = 'ЗАГАЛОМ ЗА КОНТРАКТОМ (БЕЗ ПДВ ТА ПФ)'
data_old[Col_1_Label] = newz[0]
data_old[Col_2_Label] = newz[1]
data_old[Col_3_Label] = newz[2]
#data_old[Col_4_Label] = newz[3]
#data_old[Col_5_Label] = newz[4]
#data_old[Col_6_Label] = newz[5]
#data_old[Col_7_Label] = newz[6]


data_old

1 answer:

Answer 0 (score: 0)

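Pandas does not handle unstructured text. You should first convert it to a standard format or to Python objects, and then create a DataFrame from them.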
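As a side note on the KeyError in the question: `str.split(..., expand=True)` only creates as many columns as the longest split produces, so if no row contains more than one `:` the result has only columns 0 and 1, and `newz[2]` does not exist. The sketch below illustrates this with made-up rows (the real contents of `po.csv` are not shown, so the sample data and the target of 7 columns are assumptions) and pads the result with `reindex`.

```python
import pandas as pd

# Hypothetical sample rows; the real 'po.csv' contents are not shown in the question.
data = pd.DataFrame({"Контракт": ["a:b", "c:d", "e"]})

# No row has more than one ':', so only columns 0 and 1 are created.
parts = data["Контракт"].str.split(pat=":", expand=True)
print(parts.shape)

# Pad to a fixed number of columns (7 is the count the question asks for);
# positions that had no value are filled with NaN.
padded = parts.reindex(columns=range(7))
print(padded.shape)
```

Padding only avoids the KeyError. If the extra columns come out empty, the separator most likely does not occur often enough in the data, which is worth checking with `parts.shape` before assigning labels.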

Imagine you have a file named data.txt with the following content:

Contract № 12345679 Number of phone: +7984563774
Total price for month : 00.00000
Total price: 10.0000

You can load and process it in Python like this, and then use the resulting variables to create a DataFrame:

import re

# Read every line of the file into a list
with open('data.txt') as f:
    content = f.readlines()

# First line contains the contract number and phone information
contract, phone = content[0].split(':')
# Find the contract number using a regex
contract = re.findall(r'\d+', contract)[0]
# The phone is straightforward
phone = phone.strip()

# Second line is the monthly price, third line is the total price
total_month_price = float(content[1].split(':')[1].strip())
total_price = float(content[2].split(':')[1].strip())

Repeat the same for all files.
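A minimal sketch of the final step, assuming every contract follows the same three-line layout as the sample. The helper name `parse_contract` and the column names are hypothetical, and an in-memory string stands in for the file so the example is self-contained.

```python
import re

import pandas as pd


def parse_contract(text):
    """Parse one contract in the three-line layout of the sample (hypothetical helper)."""
    lines = text.strip().splitlines()
    contract_part, phone_part = lines[0].split(":")
    return {
        "contract": re.findall(r"\d+", contract_part)[0],
        "phone": phone_part.strip(),
        "total_month_price": float(lines[1].split(":")[1]),
        "total_price": float(lines[2].split(":")[1]),
    }


# Stand-in for the contents of data.txt
sample = """Contract № 12345679 Number of phone: +7984563774
Total price for month : 00.00000
Total price: 10.0000
"""

# One dict per contract; with several files, append one record per file.
records = [parse_contract(sample)]
df = pd.DataFrame(records)
print(df)
```

Collecting plain dicts first and building the DataFrame once at the end keeps the parsing logic separate from pandas and avoids growing a DataFrame row by row.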