Comparing and replacing DataFrames

Time: 2015-07-08 22:25:09

Tags: python python-3.x pandas dataframe

I have two DataFrames: a main one that serves as the "key", and another one that needs to be updated because it is missing information.

main_df:

+---------+--------+--------+--------+--------+
| ID      | value1 | value2 | value3 | value4 |
+=========+========+========+========+========+
| 9845213 | 1      | 11     | a      | aa     |
+---------+--------+--------+--------+--------+
| 545167  | 2      | 22     | b      | bb     |
+---------+--------+--------+--------+--------+
| 132498  | 3      | 33     | c      | cc     |
+---------+--------+--------+--------+--------+
| 89465   | 4      | 44     | d      | dd     |
+---------+--------+--------+--------+--------+
| 871564  | 5      | 55     | e      | ee     |
+---------+--------+--------+--------+--------+
| 646879  | 6      | 66     | f      | ff     |
+---------+--------+--------+--------+--------+
...

data_df:

+----------+--------+--------+--------+--------+--------+
| ID       | value1 | value2 | value3 | value4 | value5 |
+==========+========+========+========+========+========+
| 4968712  | NaN    | NaN    | a      | aa     | a1     |
+----------+--------+--------+--------+--------+--------+
| 21347987 | 2      | 22     | b      | bb     | b2     |
+----------+--------+--------+--------+--------+--------+
| 4168512  | NaN    | NaN    | c      | cc     | c3     |
+----------+--------+--------+--------+--------+--------+
| 31468612 | 4      | 44     | d      | dd     | d4     |
+----------+--------+--------+--------+--------+--------+
| 9543213  | 5      | 55     | e      | ee     | e5     |
+----------+--------+--------+--------+--------+--------+
| 324798   | NaN    | NaN    | f      | ff     | f6     |
+----------+--------+--------+--------+--------+--------+

What I want to do is use value3 and value4 from main_df to match rows, so that only value1 and value2 in data_df get updated.
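A minimal sketch of that goal, assuming (value3, value4) uniquely identifies a row in both frames and using only the rows visible in the tables above (main_df is truncated there). The technique is the same `DataFrame.update()` the answer below uses, but indexed on the question's key columns and with `overwrite=False`, so only the NaN cells in data_df are filled. The names `keys`, `lookup` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Rows copied from the tables above; main_df is truncated ("...") there,
# so only the six visible rows are used here
main_df = pd.DataFrame({
    'ID': [9845213, 545167, 132498, 89465, 871564, 646879],
    'value1': [1, 2, 3, 4, 5, 6],
    'value2': [11, 22, 33, 44, 55, 66],
    'value3': ['a', 'b', 'c', 'd', 'e', 'f'],
    'value4': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
})
data_df = pd.DataFrame({
    'ID': [4968712, 21347987, 4168512, 31468612, 9543213, 324798],
    'value1': [np.nan, 2, np.nan, 4, 5, np.nan],
    'value2': [np.nan, 22, np.nan, 44, 55, np.nan],
    'value3': ['a', 'b', 'c', 'd', 'e', 'f'],
    'value4': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
    'value5': ['a1', 'b2', 'c3', 'd4', 'e5', 'f6'],
})

keys = ['value3', 'value4']
# value1/value2 from main_df, indexed by the (value3, value4) key
lookup = main_df.set_index(keys)[['value1', 'value2']]

# Index data_df by the same key so update() aligns rows on it
filled = data_df.set_index(keys)
# overwrite=False: only cells that are NaN in `filled` take values from lookup;
# update() works in place on `filled` and does not touch main_df
filled.update(lookup, overwrite=False)

# Back to a plain frame with data_df's original column order
data_df = filled.reset_index()[data_df.columns]
```

The two frames are never merged: main_df is only read through `lookup`, and data_df keeps its own rows and its extra value5 column.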

None of the approaches in Merge, join, and concatenate worked for me, because I need to keep the two files separate.

I tried the methods in Working with missing data and .replace(), but I'm not sure how to correctly extract the values from main_df that I need to replace the NaN values in data_df.
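One way to "extract the values needed" is sketched below. Note that this swaps `.replace()` for `fillna()`: `.replace()` maps old values to new ones and has no natural way to pick a different replacement per row, whereas `fillna()` accepts a Series and aligns it on the index. The sketch assumes (value3, value4) is unique in main_df (required by `reindex`) and uses a trimmed copy of the question's data; `row_keys` and `looked_up` are illustrative names.

```python
import numpy as np
import pandas as pd

# Trimmed copies of the question's tables
main_df = pd.DataFrame({
    'value1': [1, 2, 3], 'value2': [11, 22, 33],
    'value3': ['a', 'b', 'c'], 'value4': ['aa', 'bb', 'cc'],
})
data_df = pd.DataFrame({
    'ID': [4968712, 21347987, 4168512],
    'value1': [np.nan, 2, np.nan], 'value2': [np.nan, 22, np.nan],
    'value3': ['a', 'b', 'c'], 'value4': ['aa', 'bb', 'cc'],
})

keys = ['value3', 'value4']
# The (value3, value4) key of every data_df row, in data_df's row order
row_keys = pd.MultiIndex.from_frame(data_df[keys])

for col in ['value1', 'value2']:
    # main_df's value for each data_df row, looked up by key (NaN if the key is absent)
    looked_up = main_df.set_index(keys)[col].reindex(row_keys)
    # Give it data_df's own index so fillna() lines it up row by row
    looked_up.index = data_df.index
    # Only NaN cells are filled; existing values and main_df are left alone
    data_df[col] = data_df[col].fillna(looked_up)
```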

1 Answer:

Answer 0 (score: 2)

Try the following code, which uses the DataFrame.update() function.

import numpy as np
import pandas as pd

# main.txt is pipe-delimited; data.csv uses the default comma separator
main_df = pd.read_csv('/home/Jian/Downloads/main.txt', sep='|')
data_df = pd.read_csv('/home/Jian/Downloads/data.csv')

Out[229]: 
      ID      LAT     LONG                        CITY STATE                 TIME
0  12345      NaN      NaN           Cape Hinchinbrook    AK  2015-06-27 21:03:19
1  12346      NaN      NaN              Delenia Island    AK  2015-06-27 21:03:19
2  12347  29.7401 -95.4636                     Houston    TX  2015-06-27 21:03:19
3  12348  41.7132 -83.7032                    Sylvania    OH  2015-06-27 21:03:19
4  12349      NaN      NaN                  Alaskaland    AK  2015-06-27 21:03:19
5  12350      NaN      NaN  Badger Road Baptist Church    AK  2015-06-27 21:03:19

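# Keep only the coordinate and place-name columns, renamed to match data_df
main_df_part = main_df[['PRIM_LAT_DEC', 'PRIM_LONG_DEC', 'FEATURE_NAME', 'STATE_ALPHA']]
main_df_part.columns = ['LAT', 'LONG', 'CITY', 'STATE']
# (CITY, STATE) becomes the key that update() aligns on
main_df_part = main_df_part.set_index(['CITY', 'STATE'])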

Out[230]: 
                                      LAT      LONG
CITY                       STATE                   
Pacific Ocean              CA     39.3103 -123.8447
Cape Hinchinbrook          AK     60.2347 -146.6417
Delenia Island             AK     60.3394 -148.1383
Alaskaland                 AK     64.8394 -147.7700
Badger Road Baptist Church AK     64.8167 -147.5661
Barnes Creek               AK     65.0014 -147.2939
Barnette Magnet School     AK     64.8383 -147.7300
Bentley Park               AK     64.8364 -147.6942
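`update()` aligns the other frame to data_df's index by reindexing it, and pandas raises a ValueError when reindexing an index that contains duplicate labels. Place names can plausibly repeat within a state (the sample shown does not demonstrate this), so a hedged precaution is to keep one row per (CITY, STATE) pair before calling `update()`. The second Cape Hinchinbrook row below, with its 0.0 coordinates, is a hypothetical duplicate invented for illustration; keeping the first occurrence is an arbitrary choice.

```python
import numpy as np
import pandas as pd

names = ['CITY', 'STATE']
idx = pd.MultiIndex.from_tuples(
    [('Pacific Ocean', 'CA'), ('Cape Hinchinbrook', 'AK'), ('Cape Hinchinbrook', 'AK')],
    names=names)
# Hypothetical duplicate: the last row is made up to show the problem
main_df_part = pd.DataFrame({'LAT': [39.3103, 60.2347, 0.0],
                             'LONG': [-123.8447, -146.6417, 0.0]}, index=idx)

data_df = pd.DataFrame(
    {'ID': [12345], 'LAT': [np.nan], 'LONG': [np.nan]},
    index=pd.MultiIndex.from_tuples([('Cape Hinchinbrook', 'AK')], names=names))

# Drop repeated (CITY, STATE) labels, keeping the first row of each,
# so that update() can reindex main_df_part onto data_df's index
main_df_part = main_df_part[~main_df_part.index.duplicated(keep='first')]
data_df.update(main_df_part)
```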

# Index data_df by the same (CITY, STATE) key
data_df = data_df.set_index(['CITY', 'STATE'])

Out[233]: 
                                     ID      LAT     LONG                 TIME
CITY                       STATE                                              
Cape Hinchinbrook          AK     12345      NaN      NaN  2015-06-27 21:03:19
Delenia Island             AK     12346      NaN      NaN  2015-06-27 21:03:19
Houston                    TX     12347  29.7401 -95.4636  2015-06-27 21:03:19
Sylvania                   OH     12348  41.7132 -83.7032  2015-06-27 21:03:19
Alaskaland                 AK     12349      NaN      NaN  2015-06-27 21:03:19
Badger Road Baptist Church AK     12350      NaN      NaN  2015-06-27 21:03:19


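# Modifies data_df in place (returns None): rows whose (CITY, STATE) matches
# get main_df_part's non-NaN LAT/LONG
data_df.update(main_df_part)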

Out[235]: 
                                     ID      LAT      LONG                 TIME
CITY                       STATE                                               
Cape Hinchinbrook          AK     12345  60.2347 -146.6417  2015-06-27 21:03:19
Delenia Island             AK     12346  60.3394 -148.1383  2015-06-27 21:03:19
Houston                    TX     12347  29.7401  -95.4636  2015-06-27 21:03:19
Sylvania                   OH     12348  41.7132  -83.7032  2015-06-27 21:03:19
Alaskaland                 AK     12349  64.8394 -147.7700  2015-06-27 21:03:19
Badger Road Baptist Church AK     12350  64.8167 -147.5661  2015-06-27 21:03:19
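Two details of `update()` are worth checking against the question's "only the NaN values" requirement. It works in place and returns None, so writing `data_df = data_df.update(...)` would lose the frame. And with the default `overwrite=True`, it also replaces values that were already present whenever the other frame has a non-NaN value for that cell; `overwrite=False` limits it to NaN cells. The sketch below shows the difference; the Houston values in `ref` (30.0, -95.0) are placeholders chosen only so the effect is visible, not real coordinates.

```python
import numpy as np
import pandas as pd

names = ['CITY', 'STATE']
idx = pd.MultiIndex.from_tuples([('Cape Hinchinbrook', 'AK'), ('Houston', 'TX')], names=names)

# Shaped like the answer's data_df after set_index(['CITY', 'STATE'])
data_df = pd.DataFrame({'ID': [12345, 12347],
                        'LAT': [np.nan, 29.7401],
                        'LONG': [np.nan, -95.4636]}, index=idx)

# Hypothetical reference frame: the Houston row holds placeholder values
ref = pd.DataFrame({'LAT': [60.2347, 30.0], 'LONG': [-146.6417, -95.0]}, index=idx)

default = data_df.copy()
result = default.update(ref)            # in place; Houston's existing LAT/LONG are replaced too
only_nan = data_df.copy()
only_nan.update(ref, overwrite=False)   # only the NaN cells (Cape Hinchinbrook) are filled
```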