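I have two DataFrames: a main one that serves as the "key" (lookup) table, and a second one that needs to be updated because it is missing information.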
main_df:
+---------+--------+--------+--------+--------+
| ID | value1 | value2 | value3 | value4 |
+=========+========+========+========+========+
| 9845213 | 1 | 11 | a | aa |
+---------+--------+--------+--------+--------+
| 545167 | 2 | 22 | b | bb |
+---------+--------+--------+--------+--------+
| 132498 | 3 | 33 | c | cc |
+---------+--------+--------+--------+--------+
| 89465 | 4 | 44 | d | dd |
+---------+--------+--------+--------+--------+
| 871564 | 5 | 55 | e | ee |
+---------+--------+--------+--------+--------+
| 646879 | 6 | 66 | f | ff |
+---------+--------+--------+--------+--------+
...
data_df:
+----------+--------+--------+--------+--------+--------+
| ID | value1 | value2 | value3 | value4 | value5 |
+==========+========+========+========+========+========+
| 4968712 | NaN | NaN | a | aa | a1 |
+----------+--------+--------+--------+--------+--------+
| 21347987 | 2 | 22 | b | bb | b2 |
+----------+--------+--------+--------+--------+--------+
| 4168512 | NaN | NaN | c | cc | c3 |
+----------+--------+--------+--------+--------+--------+
| 31468612 | 4 | 44 | d | dd | d4 |
+----------+--------+--------+--------+--------+--------+
| 9543213 | 5 | 55 | e | ee | e5 |
+----------+--------+--------+--------+--------+--------+
| 324798 | NaN | NaN | f | ff | f6 |
+----------+--------+--------+--------+--------+--------+
What I want to do is use value3 and value4 from main_df as the matching key, so that only value1 and value2 in data_df are updated.
None of the approaches in "Merge, join, and concatenate" worked for me, because I need to keep the two files separate.
I tried using "Working with missing data" and .replace(), but I'm not sure how to properly extract the required values from main_df in order to replace the NaN values in data_df.
Answer 0 (score: 2)
Try the following code, which uses the update() function.
import numpy as np
import pandas as pd
main_df = pd.read_csv('/home/Jian/Downloads/main.txt', sep='|')
data_df = pd.read_csv('/home/Jian/Downloads/data.csv')
Out[229]:
ID LAT LONG CITY STATE TIME
0 12345 NaN NaN Cape Hinchinbrook AK 2015-06-27 21:03:19
1 12346 NaN NaN Delenia Island AK 2015-06-27 21:03:19
2 12347 29.7401 -95.4636 Houston TX 2015-06-27 21:03:19
3 12348 41.7132 -83.7032 Sylvania OH 2015-06-27 21:03:19
4 12349 NaN NaN Alaskaland AK 2015-06-27 21:03:19
5 12350 NaN NaN Badger Road Baptist Church AK 2015-06-27 21:03:19
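# keep only the lookup columns and rename them to match data_df
main_df_part = main_df[['PRIM_LAT_DEC', 'PRIM_LONG_DEC','FEATURE_NAME', 'STATE_ALPHA']]
main_df_part.columns = ['LAT', 'LONG', 'CITY', 'STATE']
# index by the matching key so update() can align rows on (CITY, STATE)
main_df_part = main_df_part.set_index(['CITY', 'STATE'])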
Out[230]:
LAT LONG
CITY STATE
Pacific Ocean CA 39.3103 -123.8447
Cape Hinchinbrook AK 60.2347 -146.6417
Delenia Island AK 60.3394 -148.1383
Alaskaland AK 64.8394 -147.7700
Badger Road Baptist Church AK 64.8167 -147.5661
Barnes Creek AK 65.0014 -147.2939
Barnette Magnet School AK 64.8383 -147.7300
Bentley Park AK 64.8364 -147.6942
data_df = data_df.set_index(['CITY', 'STATE'])
Out[233]:
ID LAT LONG TIME
CITY STATE
Cape Hinchinbrook AK 12345 NaN NaN 2015-06-27 21:03:19
Delenia Island AK 12346 NaN NaN 2015-06-27 21:03:19
Houston TX 12347 29.7401 -95.4636 2015-06-27 21:03:19
Sylvania OH 12348 41.7132 -83.7032 2015-06-27 21:03:19
Alaskaland AK 12349 NaN NaN 2015-06-27 21:03:19
Badger Road Baptist Church AK 12350 NaN NaN 2015-06-27 21:03:19
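One thing worth checking before the update() call: as far as I know, update() reindexes the other frame onto the caller's index, and that fails when the lookup index contains duplicate labels. A place-name table can easily contain the same (CITY, STATE) pair more than once. The sketch below uses a small made-up lookup frame (the rows and the name lookup are illustrative, not taken from the real file) to show how to detect and drop repeated keys. Keeping the first occurrence is an assumption; pick whatever rule fits your data.

```python
import pandas as pd

# Hypothetical lookup table with a repeated (CITY, STATE) key
lookup = pd.DataFrame(
    {
        "CITY": ["Alaskaland", "Alaskaland", "Delenia Island"],
        "STATE": ["AK", "AK", "AK"],
        "LAT": [64.8394, 64.0000, 60.3394],
        "LONG": [-147.7700, -147.0000, -148.1383],
    }
).set_index(["CITY", "STATE"])

# True when at least one key appears more than once
has_duplicates = not lookup.index.is_unique

# Keep only the first row for each key (an arbitrary choice for this sketch)
lookup_unique = lookup[~lookup.index.duplicated(keep="first")]
```

With real data you would apply the same two lines to main_df_part after set_index().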
# update() modifies data_df in place (it returns None), matching rows on the index
data_df.update(main_df_part)
Out[235]:
ID LAT LONG TIME
CITY STATE
Cape Hinchinbrook AK 12345 60.2347 -146.6417 2015-06-27 21:03:19
Delenia Island AK 12346 60.3394 -148.1383 2015-06-27 21:03:19
Houston TX 12347 29.7401 -95.4636 2015-06-27 21:03:19
Sylvania OH 12348 41.7132 -83.7032 2015-06-27 21:03:19
Alaskaland AK 12349 64.8394 -147.7700 2015-06-27 21:03:19
Badger Road Baptist Church AK 12350 64.8167 -147.5661 2015-06-27 21:03:19
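Applied to the frames in the question, the same idea might look like the sketch below. The frames are rebuilt by hand from the first three rows of the tables shown in the question, and the names keys and lookup are only illustrative. Note that update() by default overwrites existing values too. Passing overwrite=False, a documented parameter of DataFrame.update, restricts it to cells that are NaN in data_df, which seems to be what the question asks for.

```python
import numpy as np
import pandas as pd

# First three rows of the question's tables, rebuilt by hand
main_df = pd.DataFrame({
    "ID": [9845213, 545167, 132498],
    "value1": [1, 2, 3],
    "value2": [11, 22, 33],
    "value3": ["a", "b", "c"],
    "value4": ["aa", "bb", "cc"],
})
data_df = pd.DataFrame({
    "ID": [4968712, 21347987, 4168512],
    "value1": [np.nan, 2, np.nan],
    "value2": [np.nan, 22, np.nan],
    "value3": ["a", "b", "c"],
    "value4": ["aa", "bb", "cc"],
    "value5": ["a1", "b2", "c3"],
})

keys = ["value3", "value4"]

# Only value1/value2 go into the lookup, so data_df keeps its own ID column
lookup = main_df.set_index(keys)[["value1", "value2"]]

data_df = data_df.set_index(keys)
# overwrite=False: fill only the cells that are NaN in data_df
data_df.update(lookup, overwrite=False)
data_df = data_df.reset_index()
```

main_df is never modified, so the two tables stay separate. After reset_index() the key columns value3 and value4 come first in data_df; reorder the columns if the original layout matters.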