Question

I have a DataFrame with two columns:

'String' -> a NumPy array, e.g. [47, 0, 49, 12, 46]
'Is Isogram' -> 1 or 0

    String              Is Isogram
0   [47, 0, 49, 12, 46] 1
1   [43, 50, 22, 1, 13] 1
2   [10, 1, 24, 22, 16] 1
3   [2, 24, 3, 24, 51]  0
4   [40, 1, 41, 18, 3]  1

I want to create another column in which the 'Is Isogram' value is appended to the 'String' array, like this:

    String              Is Isogram  IsoString
0   [47, 0, 49, 12, 46] 1           [47, 0, 49, 12, 46, 1]
1   [43, 50, 22, 1, 13] 1           [43, 50, 22, 1, 13, 1]
2   [10, 1, 24, 22, 16] 1           [10, 1, 24, 22, 16, 1]
3   [2, 24, 3, 24, 51]  0           [2, 24, 3, 24, 51, 0]
4   [40, 1, 41, 18, 3]  1           [40, 1, 41, 18, 3, 1]

I tried using the apply function with a lambda:

df['IsoString'] = df.apply(lambda x: np.append(x['String'].values, x['Is Isogram'].values, axis=1))

But it throws a KeyError that I don't really understand:

KeyError: ('String', 'occurred at index String')

How can I fix this?

Answer 1

The problem is that axis=1 belongs to .apply, not to the np.append function. The .values calls are dropped as well, because with axis=1 x['String'] is already an array and x['Is Isogram'] is a scalar:

df['IsoString'] = df.apply(lambda x: np.append(x['String'], x['Is Isogram']), axis=1)

If every list in 'String' has the same length, using numpy.hstack is better and faster.
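The numpy.hstack code itself is missing from this answer, so the following is only a sketch of how that approach could look, not the answerer's exact code. It assumes the sample data from the question, with every array five elements long. The names strings, flags and combined are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; every array has length 5.
df = pd.DataFrame({
    "String": [
        np.array([47, 0, 49, 12, 46]),
        np.array([43, 50, 22, 1, 13]),
        np.array([10, 1, 24, 22, 16]),
        np.array([2, 24, 3, 24, 51]),
        np.array([40, 1, 41, 18, 3]),
    ],
    "Is Isogram": [1, 1, 1, 0, 1],
})

# Stack the per-row arrays into one 2-D block of shape (5, 5).
strings = np.vstack(df["String"].tolist())
# Turn the flags into a column vector of shape (5, 1).
flags = df["Is Isogram"].to_numpy()[:, None]
# Join them side by side, giving shape (5, 6).
combined = np.hstack((strings, flags))
# Store one row per cell; .tolist() makes each cell a plain Python list.
df["IsoString"] = combined.tolist()

print(df["IsoString"].iloc[3])
```

This works on whole arrays instead of calling a Python function once per row, which is where the speed-up would come from. It only works when all arrays share the same length, since np.vstack cannot build a rectangular block otherwise.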

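To see where the KeyError comes from, it may help to compare the two axis settings on a small frame. This is a minimal sketch using two rows of the question's data; the variable raised is only there for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "String": [np.array([47, 0, 49, 12, 46]), np.array([2, 24, 3, 24, 51])],
    "Is Isogram": [1, 0],
})

# Default axis=0: the lambda receives each COLUMN as a Series indexed 0, 1,
# so looking up the label 'String' inside that Series fails.
try:
    df.apply(lambda x: x["String"])
    raised = False
except KeyError:
    raised = True

# axis=1: the lambda receives each ROW as a Series indexed by column name,
# so both labels can be looked up.
df["IsoString"] = df.apply(
    lambda row: np.append(row["String"], row["Is Isogram"]), axis=1
)

print(raised)
print(df["IsoString"].iloc[1])
```

In the original attempt, axis=1 sat inside the np.append call, so .apply ran with its default axis and passed whole columns to the lambda.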