From a list of elements to a chemical formula

Asked: 2018-10-18 12:23:51

Tags: r dataframe data-manipulation chemistry

I have a list of elemental compositions, with each element in its own column. Sometimes the count for an element is zero.

   C H N O S
1  5 5 0 0 0
2  6 4 1 0 1
3  4 6 2 1 0

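I need to combine them so that they read, for example, C5H5, C6H4NS, C4H6N2O. This means that for any element with a value of 1 I should use only the column name, and for any element with a value of 0 the column should be skipped entirely.

I'm not really sure where to start. I could add new columns to make it easier to read, for example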

   c C h H n N o O s S
1  C 5 H 5 N 0 O 0 S 0
2  C 6 H 4 N 1 O 0 S 1
3  C 4 H 6 N 2 O 1 S 0

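That way I would only need to output each row as a single string, but I still need to ignore any zero values and drop the 1 after an element name.

5 answers:

Answer 0 (score: 5)

Here is a base R solution: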

df = read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header=T)

apply(df, 1, function(x){return(gsub('1', '', paste0(colnames(df)[x > 0], x[x > 0], collapse='')))})
[1] "C5H5"    "C6H4NS"  "C4H6N2O"

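paste0(colnames(df)[x > 0], x[x > 0], collapse='') pastes together the column names and counts for the columns whose value in that row is greater than zero. gsub then removes the 1s, and apply does this for every row of the data frame. Note that gsub('1', '', ...) removes every "1" character, so it is only safe when all counts are single digits, as in the sample data; a count such as 10 or 12 would be mangled.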
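Because gsub('1', '', ...) strips every "1" character, counts of 10 or more would be corrupted (C10 would become C0). A minimal sketch of one way around this, assuming the same apply/paste0 approach, is to remove a 1 only when it is not adjacent to another digit. The sample data with two-digit counts and the names raw and formulas are made up for illustration.

```r
# Illustrative data: the second row has two-digit counts (an assumption,
# not part of the original question).
df <- read.table(text = "
C H N O S
5 5 0 0 0
10 12 1 0 1
", header = TRUE)

# Build the raw strings, still containing explicit 1s (e.g. "N1").
raw <- apply(df, 1, function(x) {
  paste0(colnames(df)[x > 0], x[x > 0], collapse = "")
})

# Drop a "1" only when it has no digit before or after it, so that
# "N1" becomes "N" while "C10" and "H12" are left untouched.
formulas <- gsub("(?<![0-9])1(?![0-9])", "", raw, perl = TRUE)
formulas
# [1] "C5H5"     "C10H12NS"
```

The lookbehind and lookahead need perl = TRUE, since R's default regex engine does not support them.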

Answer 1 (score: 2)

Here is a tidyverse solution, which requires some reshaping:

df = read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header=T)

library(tidyverse)

df %>%
  mutate(id = row_number()) %>%                      # add row id
  gather(key, value, -id) %>%                        # reshape data
  filter(value != 0) %>%                             # remove any zero rows
  mutate(value = ifelse(value == 1, "", value)) %>%  # replace 1 with ""
  group_by(id) %>%                                   # for each row
  summarise(v = paste0(key, value, collapse = ""))   # create the string value

# # A tibble: 3 x 2
#      id v      
#   <int> <chr>  
# 1     1 C5H5   
# 2     2 C6H4NS 
# 3     3 C4H6N2O

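Answer 2 (score: 2)

Assume the input matrix m shown in reproducible form in the Note at the end. If you have a data frame instead, convert it to a matrix with as.matrix.

Now create a matrix with the same shape as m that contains only the letters, so that lets holds the letters and m holds the numbers. Then paste the letters and numbers together and replace with the empty string those cells whose number is zero. Also replace with just the letter any cell whose number is 1. Finally, paste each row together. No packages are used, nor any loops or *apply functions.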

lets <-  t(replace(t(m), TRUE, colnames(m)))
mm <- paste0(lets, m)
mm <- replace(mm, m == 0, "")
mm <- ifelse(m == 1, lets, mm)
do.call("paste0", as.data.frame(mm))
## [1] "C5H5"    "C6H4NS"  "C4H6N2O"

Note

The input matrix m, in reproducible form, is assumed to be:

m <- matrix(c(5, 6, 4, 5, 4, 6, 0, 1, 2, 0, 0, 1, 0, 1, 0), 3, 5,
  dimnames = list(NULL, c("C", "H", "N", "O", "S")))
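
If the counts start out in a data frame rather than a matrix, the steps above can be wrapped in a small function that does the as.matrix conversion first. The following is only a sketch under that assumption; the function name formula_from_counts is hypothetical, and the letter matrix is built with matrix(..., byrow = TRUE) instead of the double transpose.

```r
# Hypothetical wrapper around the vectorised steps; accepts a data frame
# or a matrix of non-negative counts with element symbols as column names.
formula_from_counts <- function(x) {
  m <- as.matrix(x)
  # Matrix of the same shape as m holding the element symbol of each column.
  lets <- matrix(colnames(m), nrow(m), ncol(m), byrow = TRUE)
  # "" for a zero count, the bare symbol for a count of 1,
  # symbol followed by the count otherwise.
  cells <- ifelse(m == 0, "", ifelse(m == 1, lets, paste0(lets, m)))
  # Paste the columns together, giving one string per row.
  do.call(paste0, as.data.frame(cells, stringsAsFactors = FALSE))
}

df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

formula_from_counts(df)
# [1] "C5H5"    "C6H4NS"  "C4H6N2O"
```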

Answer 3 (score: 1)

Another idea, using apply with a margin of 1.
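
As a hedged illustration only (this is not the answerer's original code), a row-wise apply with MARGIN = 1 that avoids regular expressions might look like the sketch below. The helper name row_to_formula is hypothetical.

```r
df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

# Hypothetical helper: turn one named vector of counts into a formula string.
row_to_formula <- function(counts) {
  present <- counts > 0
  # A count of 1 contributes no suffix; any other count is written out.
  suffix <- ifelse(counts[present] == 1, "", as.character(counts[present]))
  paste0(names(counts)[present], suffix, collapse = "")
}

# MARGIN = 1 applies the helper to each row in turn.
apply(df, 1, row_to_formula)
# [1] "C5H5"    "C6H4NS"  "C4H6N2O"
```

Because the 1 is dropped before pasting rather than stripped afterwards, counts of 10 or more are left intact.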

Answer 4 (score: 0)

Another option

library(dplyr)
#Get indices of all non-zero numbers in the dataframe
inds <- which(df!=0, arr.ind = TRUE)

#Create a dataframe with row index, column index and value at that position
vals <- data.frame(inds, val = df[inds])

#For each row paste the name of the column and value together and then replace 1
vals %>%
  group_by(row) %>%
  summarise(chemical = paste0(names(df)[col], val,collapse = "")) %>%
  mutate(chemical = gsub("[1]", "", chemical))

#   row chemical
#  <int> <chr>   
#1     1 C5H5    
#2     2 C6H4NS  
#3     3 C4H6N2O