I have a DataFrame with two columns:
'String' -> a numpy array, for example [47, 0, 49, 12, 46]
'Is Isogram' -> 1 or 0
String Is Isogram
0 [47, 0, 49, 12, 46] 1
1 [43, 50, 22, 1, 13] 1
2 [10, 1, 24, 22, 16] 1
3 [2, 24, 3, 24, 51] 0
4 [40, 1, 41, 18, 3] 1
I want to create another column in which the 'Is Isogram' value is appended to the 'String' array, like this:
String Is Isogram IsoString
0 [47, 0, 49, 12, 46] 1 [47, 0, 49, 12, 46, 1]
1 [43, 50, 22, 1, 13] 1 [43, 50, 22, 1, 13, 1]
2 [10, 1, 24, 22, 16] 1 [10, 1, 24, 22, 16, 1]
3 [2, 24, 3, 24, 51] 0 [2, 24, 3, 24, 51, 0]
4 [40, 1, 41, 18, 3] 1 [40, 1, 41, 18, 3, 1]
I tried using the apply function with a lambda:
df['IsoString'] = df.apply(lambda x: np.append(x['String'].values, x['Is Isogram'].values, axis=1))
But it throws a KeyError that I don't really understand:
KeyError: ('String', 'occurred at index String')
How can I fix this?
Answer 0 (score: 3)
The problem is that axis=1 is passed to the np.append function instead of to .apply, so apply works column by column and x['String'] raises the KeyError. Move axis=1 to .apply and drop the .values calls:
df['IsoString'] = df.apply(lambda x: np.append(x['String'], x['Is Isogram']), axis=1)
If every list in 'String' has the same length, numpy.hstack is better/faster.
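The numpy.hstack remark could look roughly like the sketch below. It is only an illustration, not necessarily the code the answer originally had in mind: it assumes every array in 'String' has the same length, rebuilds the sample data from the question, and uses a helper name (stacked) that is made up here.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "String": [
        np.array([47, 0, 49, 12, 46]),
        np.array([43, 50, 22, 1, 13]),
        np.array([10, 1, 24, 22, 16]),
        np.array([2, 24, 3, 24, 51]),
        np.array([40, 1, 41, 18, 3]),
    ],
    "Is Isogram": [1, 1, 1, 0, 1],
})

# Row-wise version: axis=1 belongs to DataFrame.apply, not to np.append.
by_apply = df.apply(lambda x: np.append(x["String"], x["Is Isogram"]), axis=1)

# Vectorised version (assumes equal-length arrays): stack the arrays into a
# 2-D block, turn the flags into a single column, and join them side by side.
block = np.vstack(df["String"].tolist())        # shape (5, 5)
flags = df["Is Isogram"].to_numpy()[:, None]    # shape (5, 1)
stacked = np.hstack((block, flags))             # shape (5, 6)

# Store one row of the stacked block per DataFrame row.
df["IsoString"] = list(stacked)
```

Both versions produce the same values per row. The hstack version avoids calling a Python lambda for every row, which is why it tends to be faster on large frames, but it breaks down if the arrays have different lengths.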