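I have a pandas section of code inside a larger Python function that references some Django objects from my models. Most of the work is just importing a CSV with pandas and then adding a new column (evidence_id) to it, which should hold the same value as a string. That string value comes from another Django table. I am able to create a new pandas column named 'evidence_id' from the unicode string pulled from my Django model, but it is a Series of length 1 containing that string, while the rest of the CSV has about 430,000 observations. So when I bring in the new 'evidence_id' column, only the first row has the string value ('1'). I have tried back-filling, forward-filling, and replace to get the '1' to repeat for every row of the 'evidence_id' column. I can't hard-code the '1', though, because the 'evidence_id' will change. It will always be the same number for all rows in the final dataframe. I printed the object types of extract_properties['evidence_number'] and of the evidence object, hoping that helps pin down my issue. Any thoughts are much appreciated...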
'''
Processes the extract, with direct pulls from Django model
tables.
'''
evidence_obj, created = Evidence.objects.get_or_create(case=case_obj,
evidence_number=extract_properties['evidence_number'])
evidence = pd.Series(extract_properties['evidence_number'])
print type(extract_properties['evidence_number'])
# <type 'unicode'>
print type(evidence)
# dtype: object
print str(evidence)
# 0 1
cols = ['mft_entry', 'sequence_nbr', 'parent_mft', 'parent_sequence_nbr', 'SI_mdate', 'SI_mtime', 'SI_adate', 'SI_atime', 'SI_cdate', 'SI_ctime', 'SI_bdate', 'SI_btime', 'FN_mdate', 'FN_mtime', 'FN_adate', 'FN_atime',\
'FN_cdate', 'FN_ctime', 'FN_bdate', 'FN_btime', 'typeof', 'extension', 'size', 'nname', 'ppath', 'symbolic_link', 'object_id', 'ads_metadata', 'time_warning', 'shortfilename_mdate', 'shortfilename_mtime',\
'shortfilename_adate', 'shortfilename_atime', 'shortfilename_cdate', 'shortfilename_ctime', 'shortfilename_bdate', 'shortfilename_btime', 'extracted_filepath']
frame = pd.read_csv('media/tmp/file.csv', delimiter='|', skiprows=6, names=cols, na_values=['', ' ', 'na'], encoding = 'latin-1', parse_dates=True, iterator=True, dayfirst=True, chunksize=1000)
df = pd.concat(frame)
df.fillna('Null')
df['evidence_id'] = evidence
df['evidence_id'].replace('NaN', evidence)
print df.head()
# shortfilename_bdate shortfilename_btime extracted_filepath \
# 0 2014-03-20 17:58:44.625 0x002aceb4 -> 0x002aceb6
# 1 2014-03-20 17:58:44.688 0x0009a9c6 -> 0x0009a9c8
# 2 NaN 0x00002ca5 -> 0x00002ca7
# 3 NaN 0x00000640 -> 0x00000642
# 4 NaN 0x002a9b9f -> 0x002a9ba0
# evidence_id
# 0 1
# 1 NaN
# 2 NaN
# 3 NaN
# 4 NaN
Answer 0 (score: 0)
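import pandas as pd
import numpy as np
test = pd.DataFrame({'a':[1,np.nan,np.nan],'b':[1,5,2]})
# Fill the NaN rows with the value from the first row.
# .iloc selects by position; the older .ix indexer has been removed from pandas.
test['a'] = test['a'].fillna(test['a'].iloc[0])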
So in your case, the last block of code would look like this.
df = pd.concat(frame)
df['evidence_id'] = evidence
df['evidence_id'] = df['evidence_id'].fillna(df['evidence_id'].iloc[0])
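A possibly simpler route, offered as a sketch rather than a tested fix for the original function: the NaN values most likely appear because assigning a Series to a column aligns on the index, and a one-element Series only has index 0. Assigning the scalar string itself should broadcast it to every row, with no fill step. The small frame and the `evidence_number` variable below are hypothetical stand-ins for the 430,000-row CSV and for `extract_properties['evidence_number']`.

```python
import pandas as pd

# Hypothetical stand-ins: a tiny frame instead of the real CSV, and a plain
# string instead of extract_properties['evidence_number'].
df = pd.DataFrame({'mft_entry': [10, 11, 12]})
evidence_number = '1'

# Assigning a one-element Series aligns on the index, so only row 0 gets a
# value and the remaining rows are left as NaN (the behaviour in the question).
df['aligned'] = pd.Series(evidence_number)

# Assigning the scalar itself broadcasts the value to every row.
df['evidence_id'] = evidence_number

print(df)
```

If the Series wrapper is not needed elsewhere, this avoids both the `fillna` call and any dependence on the first row being populated.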