Create a new pandas DataFrame column based on other columns of the DataFrame

Time: 2019-12-04 08:11:54

Tags: python pandas dataframe

I have a DataFrame with two columns:

  • 'String' -> a numpy array, e.g. [47, 0, 49, 12, 46]

  • 'Is Isogram' -> 1 or 0

    String              Is Isogram
0   [47, 0, 49, 12, 46] 1
1   [43, 50, 22, 1, 13] 1
2   [10, 1, 24, 22, 16] 1
3   [2, 24, 3, 24, 51]  0
4   [40, 1, 41, 18, 3]  1
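
For reference, the frame above can be rebuilt with something like the following. This is only a sketch for reproducing the problem: the values come from the table, while storing each row of 'String' as an integer NumPy array is an assumption based on the description.

```python
import numpy as np
import pandas as pd

# Values copied from the table above. Holding each row as a NumPy
# array is an assumption taken from the column description.
df = pd.DataFrame({
    "String": [
        np.array([47, 0, 49, 12, 46]),
        np.array([43, 50, 22, 1, 13]),
        np.array([10, 1, 24, 22, 16]),
        np.array([2, 24, 3, 24, 51]),
        np.array([40, 1, 41, 18, 3]),
    ],
    "Is Isogram": [1, 1, 1, 0, 1],
})

print(df)
```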

I want to create another column in which the 'Is Isogram' value is appended to the 'String' array, like this:

    String              Is Isogram  IsoString
0   [47, 0, 49, 12, 46] 1           [47, 0, 49, 12, 46, 1]
1   [43, 50, 22, 1, 13] 1           [43, 50, 22, 1, 13, 1]
2   [10, 1, 24, 22, 16] 1           [10, 1, 24, 22, 16, 1]
3   [2, 24, 3, 24, 51]  0           [2, 24, 3, 24, 51, 0]
4   [40, 1, 41, 18, 3]  1           [40, 1, 41, 18, 3, 1]

I tried using the apply function with a lambda:

df['IsoString'] = df.apply(lambda x: np.append(x['String'].values, x['Is Isogram'].values, axis=1))

But it throws a KeyError that I don't really understand:

KeyError: ('String', 'occurred at index String')

How can I fix this?

1 Answer:

Answer 0 (score: 3)

The problem is that axis=1 belongs to .apply, not to np.append. Without it, .apply passes each column to the lambda instead of each row, so x['String'] is looked up among the row labels and raises the KeyError. The .values calls also have to go, because inside the row-wise lambda x['String'] is already an array and x['Is Isogram'] is a scalar:

df['IsoString'] = df.apply(lambda x: np.append(x['String'], x['Is Isogram']), axis=1)

If every list in String has the same length, numpy.hstack is the better and faster option.
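
A minimal sketch of the numpy.hstack approach, assuming every row of 'String' has the same length. The two-row frame and the helper names `strings` and `flags` are illustrative only, and this is one possible way to write it rather than the only one.

```python
import numpy as np
import pandas as pd

# Small illustrative frame using two rows from the question.
df = pd.DataFrame({
    "String": [np.array([47, 0, 49, 12, 46]), np.array([2, 24, 3, 24, 51])],
    "Is Isogram": [1, 0],
})

# Stack the equal-length rows into one 2-D array of shape (n_rows, 5).
strings = np.array(df["String"].tolist())

# Reshape the flag column into a column vector of shape (n_rows, 1).
flags = df["Is Isogram"].to_numpy()[:, None]

# Join the two side by side, then store one list per row.
df["IsoString"] = np.hstack((strings, flags)).tolist()

print(df)
```

Note that each cell of 'IsoString' ends up as a plain Python list here. If the rows differ in length, the stacking step does not produce a 2-D array and the row-wise .apply version is the one to use.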