I have a DataFrame that looks like this:
df
A B C
0 2 5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A
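I want to produce the following, where C is cut into four 8-character substrings and Offset says which piece each row holds: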
A B C Offset
0 2 5A5A5A5A 0
0 2 5A5A5A5A 1
0 2 5A5A5A5A 2
0 2 5A5A5A5A 3
Here is my solution. It works, but it is slow and does not scale when applied to millions of rows:
import numpy as np
import pandas as pd

def splitequal(my_str):
    # Cut the string into consecutive 8-character pieces
    splits = [my_str[x:x+8] for x in range(0, len(my_str), 8)]
    return splits

def tondata(row):
    # Pick the piece that belongs to this row's offset
    offset = row['Offset']
    return row['Splits'][offset]

d = {'A': [0],
     'B': [2],
     'C': ["5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A"]}
df = pd.DataFrame(d, columns=['A', 'B', 'C'])
# Replicate the row 4 times (DataFrame.as_matrix() was removed in pandas 1.0; use to_numpy())
df2 = pd.DataFrame(np.repeat(df.to_numpy(), 4, 0), columns=['A', 'B', 'C'])
# Create the Offset column (0-3) used to select one of the 4 substrings
df2['Offset'] = df2.reset_index()['index'] % 4
# Split each string into a list of 4 substrings
df2['Splits'] = df2['C'].apply(splitequal)
# Assign each substring in the list to its offset
df2['C'] = df2.apply(tondata, axis=1)
del df2['Splits']
print(df2)
A B C Offset
0 0 2 5A5A5A5A 0
1 0 2 5A5A5A5A 1
2 0 2 5A5A5A5A 2
3 0 2 5A5A5A5A 3
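Most of the time in the code above goes into the two row-wise `apply` calls, each of which runs a Python function once per row. As a minimal sketch (not taken from the question or the answer), the same repeat-then-offset steps can be expressed with whole-column operations. It assumes every string in `C` is exactly 32 characters long, as in the example; `CHUNK` and `N_CHUNKS` are hypothetical names introduced here. The split relies on NumPy fixed-width unicode arrays: viewing an array of 32-character strings as 8-character strings yields the four pieces of each row in order.

```python
import numpy as np
import pandas as pd

CHUNK = 8      # width of each substring (from the question)
N_CHUNKS = 4   # assumption: every string in C is exactly CHUNK * N_CHUNKS characters

df = pd.DataFrame({"A": [0], "B": [2], "C": ["5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A"]})

# Repeat each original row N_CHUNKS times in a row, keeping the column dtypes
out = df.loc[df.index.repeat(N_CHUNKS)].reset_index(drop=True)

# Offsets 0..N_CHUNKS-1 for every original row
out["Offset"] = np.tile(np.arange(N_CHUNKS), len(df))

# Convert C to a fixed-width unicode array, then reinterpret each 32-char
# string as four consecutive 8-char strings (same memory, smaller item size)
out["C"] = df["C"].to_numpy().astype(f"U{CHUNK * N_CHUNKS}").view(f"U{CHUNK}")

print(out)
```

If some strings were shorter than 32 characters, `astype` would pad them with null characters and the trailing pieces would come out shorter or empty, so this trick only fits genuinely fixed-width data.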
Is there a faster way to do this?
Answer (score: 0)
You could try the following approach, which slices the whole column at once instead of working row by row:
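import pandas as pd

# Get a unique default index on the data frame (drop=True avoids an extra 'index' column)
df = df.reset_index(drop=True)
# Slice column C into four 8-character pieces, concatenate them side by side and rename the columns
splitted = pd.concat([
    df["C"].str.slice(i * 8, (i + 1) * 8) for i in range(4)
], axis=1)
splitted.columns = [0, 1, 2, 3]
# Unstack to get a single Series indexed by (offset, original row)
unstacked = splitted.unstack()
# Make the offset level an ordinary column and give both columns proper names
with_offset_col = unstacked.reset_index(level=0)
with_offset_col.columns = ["Offset", "C"]
# Merge the substrings back onto the original rows, replacing the full-length C
result = pd.merge(df.drop(columns="C"), with_offset_col,
                  left_index=True, right_index=True)
result = result[["A", "B", "C", "Offset"]]
print(result)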
This code runs in 4.1 s on my machine.
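A shorter variant of the same column-wise idea, offered as a sketch rather than as part of the answer, uses `Series.str.findall` to cut each string into 8-character pieces and `DataFrame.explode`-style reshaping (`Series.explode`, available since pandas 0.25) to give each piece its own row. It assumes string lengths are multiples of 8; the regex silently drops any shorter trailing remainder. `CHUNK` is a hypothetical name introduced here, and no timing is claimed for it.

```python
import pandas as pd

CHUNK = 8  # substring width from the question

df = pd.DataFrame({"A": [0], "B": [2], "C": ["5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A"]})

# One list of consecutive CHUNK-character pieces per row (regex ".{8}")
pieces = df["C"].str.findall(f".{{{CHUNK}}}")

# explode() gives one row per piece and repeats the original row label;
# joining on that label copies A and B onto every piece
out = df.drop(columns="C").join(pieces.explode().rename("C"))

# Offset = position of the piece within its original row
out["Offset"] = out.groupby(level=0).cumcount().to_numpy()
out = out.reset_index(drop=True)

print(out)
```

The `.to_numpy()` on the `cumcount()` result assigns the offsets by position, which sidesteps label alignment on the temporarily duplicated index.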