Python pandas: displaying repeated values

Time: 2019-01-18 08:23:10

Tags: python pandas dataframe

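I am trying to read data from a txt file with pandas.read_csv, but it does not show the repeated (identical) values from the file. For example, 2043 appears on every row of the file, yet in the result it is shown once rather than on each row.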

Sample of my file:

[image]

Result:

[image]

All the cells I circled should also contain 2043, but they are empty.

My code is:

import pandas as pd

df = pd.read_csv('samplefile.txt', sep='\t', header=None,
                 names=["234", "235", "236"])

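2 answers:

Answer 0: (score: 2)

You get a MultiIndex, so a repeated first-level value is displayed only once, on the first row where it occurs. The values are still there; they are just not printed again.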
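To see that this is only a display effect, here is a minimal sketch that builds a small MultiIndex by hand (the level names `first` and `second` and the data are made up for illustration) and prints it with and without sparsified index output:

```python
import pandas as pd

# Hypothetical frame: 2043 repeated in the first index level.
idx = pd.MultiIndex.from_tuples([(2043, 1), (2043, 2), (2043, 3)],
                                names=["first", "second"])
df = pd.DataFrame({"value": [10, 20, 30]}, index=idx)

# Default display: the repeated first-level value is printed once.
sparse_text = df.to_string()
print(sparse_text)

# With display.multi_sparse turned off, every row shows its full index.
with pd.option_context("display.multi_sparse", False):
    full_text = df.to_string()
print(full_text)

# The underlying index holds 2043 for every row either way.
first_level = df.index.get_level_values("first").tolist()
print(first_level)
```

Setting the option only changes how the frame is printed; the data and the index are the same in both cases.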

You can convert the MultiIndex to columns with reset_index:

df = df.reset_index()

Or specify every column in the names parameter to avoid the MultiIndex:

df = pd.read_csv('samplefile.txt', sep='\t',
                 names=["one", "two", "next", "234", "235", "236"])
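Both fixes can be tried without the original file. The sketch below assumes the file has six tab-separated columns (the real layout is only visible in the missing screenshot) and uses an in-memory string as a stand-in for samplefile.txt:

```python
import io
import pandas as pd

# Hypothetical stand-in for samplefile.txt: six tab-separated columns,
# with 2043 repeated in the first one.
raw = "\n".join([
    "2043\t1\t10\t0.1\t0.2\t0.3",
    "2043\t1\t11\t0.4\t0.5\t0.6",
    "2043\t2\t12\t0.7\t0.8\t0.9",
])

# Three names for six columns: the three leftmost columns end up in the index.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                 names=["234", "235", "236"])
print(df)
print(df.index.nlevels)

# Fix 1: move the index levels back into ordinary columns.
flat = df.reset_index()
print(flat)

# Fix 2: name all six columns so that no index is inferred.
named = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                    names=["one", "two", "next", "234", "235", "236"])
print(named)
```

With reset_index the former index levels get default column names, so the second form is usually the tidier choice when the number of columns is known in advance.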

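Answer 1: (score: 0)

I got bitten by a MultiIndex yesterday and wasted time trying to solve a problem that did not exist.

If one of your index levels is of type float64, you may find that the index values are not displayed in full. I had a dataframe on which I was running df.groupby().describe(). The variable I was grouping by was originally a long int, but at some point it had been converted to float, so the index was rounded when printed. Many of the values were very close to each other, so on printing it appeared that groupby() had found multiple second-level entries for a single first-level value.

That's not very clear, so here is an illustrative example...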
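The effect can be reproduced in a few lines. This is a minimal sketch with two made-up keys and a hypothetical column name `key`: formatted to seven significant digits they look identical, yet groupby() keeps them apart.

```python
import pandas as pd

# Two hypothetical keys that differ only in the trailing digits.
keys = [89908893132833.12, 89908893132928.28]

# Rounded to seven significant digits they become the same string.
shown = [f"{k:.6e}" for k in keys]
print(shown)  # ['8.990889e+13', '8.990889e+13']

# groupby() compares the stored floats, not the printed text.
df = pd.DataFrame({"key": keys * 2, "obs": [1, 2, 3, 4]})
n_groups = df.groupby("key").ngroups
print(n_groups)  # 2
```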

import numpy as np
import pandas as pd

index = np.random.uniform(low=89908893132829,
                          high=89908893132929,
                          size=(50,))
df = pd.DataFrame({'obs': np.arange(100)},
                  index=np.append(index, index)).sort_index()
df.index.name = 'index1'
df['index2'] = [1, 2] * 50
df.reset_index(inplace=True)
df.set_index(['index1', 'index2'], inplace=True)

Looking at the dataframe, it appears that there is only one value of index1...

df.head(10)
                     obs
index1       index2     
8.990889e+13 1         4
             2        54
             1        61
             2        11
             1        89
             2        39
             1        65
             2        15
             1        60
             2        10

After groupby(['index1', 'index2']).describe(), it again looks as though there is only one value of index1...

summary = df.groupby(['index1', 'index2']).describe()
summary.head()
                      obs                                        
                    count  mean std   min   25%   50%   75%   max
index1       index2                                              
8.990889e+13 1        1.0   4.0 NaN   4.0   4.0   4.0   4.0   4.0
             2        1.0  54.0 NaN  54.0  54.0  54.0  54.0  54.0
             1        1.0  61.0 NaN  61.0  61.0  61.0  61.0  61.0
             2        1.0  11.0 NaN  11.0  11.0  11.0  11.0  11.0
             1        1.0  89.0 NaN  89.0  89.0  89.0  89.0  89.0

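However, if you look at the actual values of index1 in either frame, you will see that there are multiple unique values. In the original dataframe...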

df.index.get_level_values('index1')

Float64Index([89908893132833.12, 89908893132833.12, 89908893132834.08,
              89908893132834.08, 89908893132835.05, 89908893132835.05,
               89908893132836.3,  89908893132836.3, 89908893132837.95,
              89908893132837.95,  89908893132838.1,  89908893132838.1,
               89908893132838.6,  89908893132838.6, 89908893132841.89,
              89908893132841.89, 89908893132841.95, 89908893132841.95,
              89908893132845.81, 89908893132845.81, 89908893132845.83,
              89908893132845.83, 89908893132845.88, 89908893132845.88,
              89908893132846.02, 89908893132846.02,  89908893132847.2,
               89908893132847.2, 89908893132847.67, 89908893132847.67,
               89908893132848.5,  89908893132848.5,  89908893132848.5,
               89908893132848.5, 89908893132855.17, 89908893132855.17,
              89908893132855.45, 89908893132855.45, 89908893132864.62,
              89908893132864.62, 89908893132868.61, 89908893132868.61,
              89908893132873.16, 89908893132873.16,  89908893132875.6,
               89908893132875.6, 89908893132875.83, 89908893132875.83,
              89908893132878.73, 89908893132878.73,  89908893132879.9,
               89908893132879.9, 89908893132880.67, 89908893132880.67,
              89908893132880.69, 89908893132880.69, 89908893132881.31,
              89908893132881.31, 89908893132881.69, 89908893132881.69,
              89908893132884.45, 89908893132884.45, 89908893132887.27,
              89908893132887.27, 89908893132887.83, 89908893132887.83,
               89908893132892.8,  89908893132892.8, 89908893132894.34,
              89908893132894.34,  89908893132894.5,  89908893132894.5,
              89908893132901.88, 89908893132901.88, 89908893132903.27,
              89908893132903.27, 89908893132904.53, 89908893132904.53,
              89908893132909.27, 89908893132909.27, 89908893132910.38,
              89908893132910.38, 89908893132911.86, 89908893132911.86,
               89908893132913.4,  89908893132913.4, 89908893132915.73,
              89908893132915.73, 89908893132916.06, 89908893132916.06,
              89908893132922.48, 89908893132922.48, 89908893132923.44,
              89908893132923.44, 89908893132924.66, 89908893132924.66,
              89908893132925.14, 89908893132925.14, 89908893132928.28,
              89908893132928.28],
             dtype='float64', name='index1')

...and in the summarised dataframe...

summary.index.get_level_values('index1')

Float64Index([89908893132833.12, 89908893132833.12, 89908893132834.08,
              89908893132834.08, 89908893132835.05, 89908893132835.05,
               89908893132836.3,  89908893132836.3, 89908893132837.95,
              89908893132837.95,  89908893132838.1,  89908893132838.1,
               89908893132838.6,  89908893132838.6, 89908893132841.89,
              89908893132841.89, 89908893132841.95, 89908893132841.95,
              89908893132845.81, 89908893132845.81, 89908893132845.83,
              89908893132845.83, 89908893132845.88, 89908893132845.88,
              89908893132846.02, 89908893132846.02,  89908893132847.2,
               89908893132847.2, 89908893132847.67, 89908893132847.67,
               89908893132848.5,  89908893132848.5, 89908893132855.17,
              89908893132855.17, 89908893132855.45, 89908893132855.45,
              89908893132864.62, 89908893132864.62, 89908893132868.61,
              89908893132868.61, 89908893132873.16, 89908893132873.16,
               89908893132875.6,  89908893132875.6, 89908893132875.83,
              89908893132875.83, 89908893132878.73, 89908893132878.73,
               89908893132879.9,  89908893132879.9, 89908893132880.67,
              89908893132880.67, 89908893132880.69, 89908893132880.69,
              89908893132881.31, 89908893132881.31, 89908893132881.69,
              89908893132881.69, 89908893132884.45, 89908893132884.45,
              89908893132887.27, 89908893132887.27, 89908893132887.83,
              89908893132887.83,  89908893132892.8,  89908893132892.8,
              89908893132894.34, 89908893132894.34,  89908893132894.5,
               89908893132894.5, 89908893132901.88, 89908893132901.88,
              89908893132903.27, 89908893132903.27, 89908893132904.53,
              89908893132904.53, 89908893132909.27, 89908893132909.27,
              89908893132910.38, 89908893132910.38, 89908893132911.86,
              89908893132911.86,  89908893132913.4,  89908893132913.4,
              89908893132915.73, 89908893132915.73, 89908893132916.06,
              89908893132916.06, 89908893132922.48, 89908893132922.48,
              89908893132923.44, 89908893132923.44, 89908893132924.66,
              89908893132924.66, 89908893132925.14, 89908893132925.14,
              89908893132928.28, 89908893132928.28],
             dtype='float64', name='index1')

I wasted time scratching my head, wondering why my groupby(['index1', 'index2']) produced only one value of index1!
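If the keys really are whole numbers that were turned into floats along the way, as in the case described above, one possible remedy is to cast that level back to an integer type so it prints in full. This is only a sketch under that assumption, with made-up values; casting would truncate any fractional part, so it does not apply to keys like the random ones in the example.

```python
import pandas as pd

# Hypothetical whole-number keys that are stored as float64.
keys = [89908893132833.0, 89908893132833.0,
        89908893132834.0, 89908893132834.0]
df = pd.DataFrame({"index1": keys,
                   "index2": [1, 2, 1, 2],
                   "obs": [4, 54, 61, 11]}).set_index(["index1", "index2"])

# Cast the float level back to int64 (assumes no fractional parts),
# then rebuild the MultiIndex.
flat = df.reset_index()
flat["index1"] = flat["index1"].astype("int64")
df_int = flat.set_index(["index1", "index2"])

# The integer level is now printed digit for digit.
print(df_int)
```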