How can I force a returned pandas DataFrame to be a view so that I can apply a transformation?

Asked: 2019-03-09 15:07:02

Tags: python pandas dataframe

I have a pandas DataFrame named merge that looks like this:

filepath                        timestamp  label_x  label_y X   Y   W   H
S6/N11/N11_R1/S6_N11_R1_IMAG0274    -----   empty   NaN NaN NaN NaN NaN
S6/N11/N11_R1/S6_N11_R1_IMAG0275    -----   empty   NaN NaN NaN NaN NaN
S6/N11/N11_R1/S6_N11_R1_IMAG0276    -----   empty   NaN NaN NaN NaN NaN
S6/N11/N11_R1/S6_N11_R1_IMAG0277    -----   empty   NaN NaN NaN NaN NaN

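Some timestamps are missing, and I want to fill them in from the image metadata (the image location is given by the filepath column). As you can see, the file paths contain folders whose names start with S6. The folders should range from S1 to S6, but at the moment I only have folders S1 and S2. I want to slice out the rows for those folders and apply a transformation: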

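import PIL.Image

# path0 is not defined in this snippet; it is assumed to be a pathlib.Path base directory
def transformation(row):
    try:
        img = PIL.Image.open(path0 / row["filepath"])
        # EXIF tag 306 is DateTime
        row["timestamp"] = img._getexif()[306]
        return row
    except Exception:
        # image could not be opened or has no usable EXIF data
        return None

merge[(merge["timestamp"] == '-----') & (merge["filepath"].str.startswith("S1") | merge["filepath"].str.startswith("S2"))].apply(transformation, axis=1)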

But this does not work, because the slicing operation returns a copy rather than a view:

>>> merge[(merge["timestamp"] == '-----') & (merge["filepath"].str.startswith("S1") | merge["filepath"].str.startswith("S2"))]._is_view
False

How can I change pandas' behavior so that I get a view?

1 Answer:

Answer 0: (score: 0)

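You can apply your function and use update, but the function needs to return the new value for each row so that apply produces a Series: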

# sample df
# df = pd.read_clipboard()
# df.iloc[0:1, 1] = 'some value'

                           filepath   timestamp label_x  label_y   X   Y   W  \
0  S6/N11/N11_R1/S6_N11_R1_IMAG0274  some value   empty      NaN NaN NaN NaN   
1  S6/N11/N11_R1/S6_N11_R1_IMAG0275       -----   empty      NaN NaN NaN NaN   
2  S6/N11/N11_R1/S6_N11_R1_IMAG0276       -----   empty      NaN NaN NaN NaN   
3  S6/N11/N11_R1/S6_N11_R1_IMAG0277       -----   empty      NaN NaN NaN NaN   

    H  
0 NaN  
1 NaN  
2 NaN  
3 NaN  

Now use update, apply, and loc together:

# your function
def myFunc(row):
    row['timestamp'] = 'some new value' # set new value to timestamp
    return row['timestamp'] # return the new value; apply(axis=1) collects these into a Series

# use update and apply your function 
df['timestamp'].update(df.loc[2:3].apply(myFunc, axis=1))
# you would change df.loc[2:3] to your boolean mask, for example:
# df.loc[((df["timestamp"]=='-----') & (df['filepath'].str.startswith('S1') | df['filepath'].str.startswith('S2')))]

                           filepath       timestamp label_x  label_y   X   Y  \
0  S6/N11/N11_R1/S6_N11_R1_IMAG0274      some value   empty      NaN NaN NaN   
1  S6/N11/N11_R1/S6_N11_R1_IMAG0275           -----   empty      NaN NaN NaN   
2  S6/N11/N11_R1/S6_N11_R1_IMAG0276  some new value   empty      NaN NaN NaN   
3  S6/N11/N11_R1/S6_N11_R1_IMAG0277  some new value   empty      NaN NaN NaN   

    W   H  
0 NaN NaN  
1 NaN NaN  
2 NaN NaN  
3 NaN NaN
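
Here is a minimal sketch of how this could look with the boolean condition from the question. It builds the mask once, applies the function only to the selected rows, and writes the result back into the original frame through `.loc` instead of trying to obtain a view. The sample frame and the `lookup_timestamp` helper are made up for illustration (the helper stands in for the EXIF lookup). Direct `.loc` assignment is used here in place of `update`, so the write does not depend on a chained in-place call on `df['timestamp']`.

```python
import pandas as pd

# Hypothetical sample data: two S1/S2 rows with missing timestamps,
# one S1 row that already has a timestamp, and one S6 row.
df = pd.DataFrame({
    "filepath": ["S1/a/IMG1", "S2/b/IMG2", "S1/c/IMG3", "S6/d/IMG4"],
    "timestamp": ["-----", "-----", "2019:01:01 10:00:00", "-----"],
})

def lookup_timestamp(row):
    # Stand-in for reading the timestamp from the image metadata;
    # it returns a single value per row.
    return "from:" + row["filepath"]

# Rows with a missing timestamp whose path starts with S1 or S2
mask = (df["timestamp"] == "-----") & (
    df["filepath"].str.startswith("S1") | df["filepath"].str.startswith("S2")
)

# Skip the assignment when nothing matches, since apply on an empty
# selection does not yield a per-row Series.
if mask.any():
    # apply(axis=1) returns a Series indexed like the selected rows,
    # so the assignment aligns on the index and touches only those rows.
    df.loc[mask, "timestamp"] = df.loc[mask].apply(lookup_timestamp, axis=1)

print(df)
```

Only the rows selected by the mask are changed. The S1 row that already had a timestamp and the S6 row keep their original values.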