Question

I have a dataset like this, where data is missing for some years.

County Year Pop
12     1999 1.1
12     2001 1.2
13     1999 1.0
13     2000 1.1

I would like something like this:

County Year Pop
12     1999 1.1
12     2000 NaN
12     2001 1.2
13     1999 1.0
13     2000 1.1
13     2001 NaN

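I have tried setting the index to Year and then using reindex with another DataFrame that contains just the years (as mentioned in "Pandas: Add data for missing months"), but it gives me the error "cant reindex duplicate values". I also tried df.loc, but it has the same problem. I even tried a full outer join with a blank df containing just the years, but that did not work either.

How can I solve this?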

Answer 1

Make a MultiIndex so that you don't have duplicates:

df.set_index(['County', 'Year'], inplace=True)

Then build a full MultiIndex from all the combinations:

index = pd.MultiIndex.from_product(df.index.levels)

Then reindex:

df.reindex(index)

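The construction of the MultiIndex is untested and may need a little tweaking (for example, if a year is entirely absent from all counties), but I think you get the idea.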
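Put together on the question's sample data, the three steps above might look like the following sketch. It avoids inplace=True and passes the index names explicitly; both are choices made here for clarity, not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "County": [12, 12, 13, 13],
    "Year": [1999, 2001, 1999, 2000],
    "Pop": [1.1, 1.2, 1.0, 1.1],
})

# (County, Year) pairs are unique, so this index has no duplicates.
indexed = df.set_index(["County", "Year"])

# Every combination of the counties and years that occur somewhere in the data.
full_index = pd.MultiIndex.from_product(
    indexed.index.levels, names=indexed.index.names
)

# Rows that were absent from the original data come back with NaN in Pop.
result = indexed.reindex(full_index).reset_index()
print(result)
```

As the answer warns, a year that is missing from every county never shows up in the index levels, so this version would not add it.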

Answer 2

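I'll assume that you may want to add all years between the minimum and maximum year. It may be the case that 2000 is missing for both counties 12 and 13.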

I'll use pd.MultiIndex.from_product to build mux from the unique values in the 'County' column and all integer years between the minimum and maximum year in the 'Year' column.

mux = pd.MultiIndex.from_product([
    df.County.unique(),
    range(df.Year.min(), df.Year.max() + 1)
], names=['County', 'Year'])

df.set_index(['County', 'Year']).reindex(mux).reset_index()

   County  Year  Pop
0      12  1999  1.1
1      12  2000  NaN
2      12  2001  1.2
3      13  1999  1.0
4      13  2000  1.1
5      13  2001  NaN

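Note: this solution fills in every missing year between the minimum and maximum, even a year that does not appear for any county.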
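To make the note concrete, here is a small sketch on hypothetical data (not from the question) in which 2000 is missing for both counties. It compares an index built from the observed levels, as in df.index.levels, with one built from the full range of years.

```python
import pandas as pd

# Hypothetical data: the year 2000 is missing for BOTH counties.
df = pd.DataFrame({
    "County": [12, 12, 13, 13],
    "Year": [1999, 2001, 1999, 2001],
    "Pop": [1.1, 1.2, 1.0, 1.3],
})

indexed = df.set_index(["County", "Year"])

# Built from the observed levels only: 2000 never appears.
observed = pd.MultiIndex.from_product(
    indexed.index.levels, names=["County", "Year"]
)

# Built from the full min..max range: 2000 is included.
mux = pd.MultiIndex.from_product(
    [df["County"].unique(), range(df["Year"].min(), df["Year"].max() + 1)],
    names=["County", "Year"],
)

from_levels = indexed.reindex(observed).reset_index()
from_range = indexed.reindex(mux).reset_index()
print(len(from_levels), len(from_range))
```

Only the range-based index adds rows for 2000, which is the behaviour the note describes.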


Answer 3

You can use pivot_table and then stack the result (a Series is required).

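The pivot_table and stack idea in this answer could be sketched roughly as follows. This is a reconstruction of the idea, not the answer's own code, and the keyword arguments are assumptions: how stack keeps NaN cells differs between pandas versions, so the try/except below is one way to cope with that.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "County": [12, 12, 13, 13],
    "Year": [1999, 2001, 1999, 2000],
    "Pop": [1.1, 1.2, 1.0, 1.1],
})

# One row per county, one column per year; missing pairs become NaN.
wide = df.pivot_table(index="County", columns="Year", values="Pop")

# Back to long form while keeping the NaN cells.
try:
    stacked = wide.stack(future_stack=True)  # newer pandas
except TypeError:
    stacked = wide.stack(dropna=False)       # older pandas without future_stack

# The stacked Series is unnamed, so name it before turning it into a frame.
result = stacked.rename("Pop").sort_index().reset_index()
print(result)
```

Like the levels-based approach, this only produces years that occur for at least one county.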

Answer 4

Or you can try some black magic :P


Answer 5

You mentioned that you tried joining a blank df. That approach can actually work.

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'County': {0: 12, 1: 12, 2: 13, 3: 13},
 'Pop': {0: 1.1, 1: 1.2, 2: 1.0, 3: 1.1},
 'Year': {0: 1999, 1: 2001, 2: 1999, 3: 2000}})

Solution:

#create a new blank df with all the required Years for each County
df_2 = pd.DataFrame(np.r_[pd.tools.util.cartesian_product([df.County.unique(),np.arange(1999,2002)])].T, columns=['County','Year'])

#Left join the new dataframe to the existing dataframe to populate the Pop values.
pd.merge(df_2,df,on=['Year','County'],how='left')
Out[73]: 
   County  Year  Pop
0      12  1999  1.1
1      12  2000  NaN
2      12  2001  1.2
3      13  1999  1.0
4      13  2000  1.1
5      13  2001  NaN
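pd.tools.util.cartesian_product may not be available in recent pandas releases. A sketch of the same left-join idea that swaps in pd.MultiIndex.from_product(...).to_frame(index=False) to build the blank frame (a substitution made here, not the answer's original call) could look like this:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "County": [12, 12, 13, 13],
    "Year": [1999, 2001, 1999, 2000],
    "Pop": [1.1, 1.2, 1.0, 1.1],
})

# Blank frame holding every (County, Year) combination that should exist.
grid = pd.MultiIndex.from_product(
    [df["County"].unique(), range(1999, 2002)],
    names=["County", "Year"],
).to_frame(index=False)

# Left join onto the grid so that missing combinations keep NaN in Pop.
result = grid.merge(df, on=["County", "Year"], how="left")
print(result)
```

The year range is hard-coded as in the answer; range(df["Year"].min(), df["Year"].max() + 1) would derive it from the data instead.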

Pandas: add missing years in time series data with duplicate years
