Question

I've only just started programming, so apologies for asking such a simple question, but I'm stuck.

I have a data.table called s3:

s3:

ClaimID           dx      dxgroup
15nhbfcgcda       113.8   NA
15nhbfcgcda       156.8   NA
15nhbfcgcda       110.8   059
15nhbfcfssa       135.8   NA
15nhb4dfgda       V70.3   NA
15nhbf644da       118.8   042

s3 has 30,000 rows.

I want to apply this logic:

If dxgroup is NA:
    If the first 4 characters of dx match one of (2024, 2967, 9786, 9788, 8263)
        then dxgroup = first 4 characters of dx
    else if the first 3 characters of dx match one of (V70, 042, 897)
        then dxgroup = first 3 characters of dx
    else dxgroup = dx

The result should be:

ClaimID           dx      dxgroup
15nhbfcgcda       113.8   113.8
15nhbfcgcda       156.8   156.8
15nhbfcgcda       110.8   059
15nhbfcfssa       135.8   135.8
15nhb4dfgda       V70.3   V70
15nhbf644da       118.8   042

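Any advice?
Apologies, this is my first time asking something here, so I'm not used to it yet. I tried something like this (I'm not sure it is the right approach, and it also throws an error):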
sample4 <- sample3[, dxgroup := {if (dxgroup == NA)
  {if (substring(sample3$dx, 1, 4) == list(2501, 2780, 4151, 5301, 5751, 6860, 7807, 7890, 9898, 9955, 9970)) substring(sample3$dx, 1, 4)
  else if (substring(sample3$dx, 1, 3) == list(042, 493, 682, 850, V72)) substring(sample3$dx, 1, 3)
  else if (substring(sample3$dx, 1, 4) == list(8540, 8541)) substring(sample3$dx, 1, 3)
  else if (substring(sample3$dx, 1, 3) == list(043, 044)) 042
  else if (substring(sample3$dx, 1, 3) == list(789) & substring(sample3$dx, 1, 3) != list(7891, 7893, 78930)) 7890
  else if (substring(sample3$dx, 1, 4) == list(7865) & substring(sample3$dx, 1, 4) != list(78651, 78652, 78659)) 78650}
  else sample3$dx}]

Error in if (dxgroup == NA) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In if (dxgroup == NA) { : the condition has length > 1 and only the first element will be used

Answer 1

You have the logic all set up.

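Note that in data.table (as in almost all of R) you can wrap j in {curly brackets}, and the value of the last statement inside the brackets is what gets assigned. For example: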

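DT[, dxgroup := {              # clause1, foo, beebar, bar, chewybar are placeholders
       if (clause1) {
         if (foo) beebar else bar
       } else chewybar         # whatever this block evaluates to is assigned to dxgroup
     }]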
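A small base R sketch of the two rules at play here, assuming nothing beyond base R: a braced block evaluates to its last expression (which is what := receives inside j), and if () only accepts a single TRUE/FALSE. Comparing with NA via == yields NA rather than TRUE/FALSE, which is what produces "missing value where TRUE/FALSE needed" in the question; is.na() together with the element-wise ifelse() avoids that. The vectors x and dx are hypothetical stand-ins for the dxgroup and dx columns.

```r
# A braced block evaluates to its last expression; inside DT[, col := { ... }]
# that value is what gets assigned to col.
v <- {
  tmp <- 2
  tmp * 10
}
print(v)            # 20

# Hypothetical stand-ins for the dxgroup and dx columns
x  <- c(NA, "059", NA)
dx <- c("113.8", "110.8", "V70.3")

# == with NA returns NA, never TRUE/FALSE, so if (x == NA) cannot work
print(x == NA)      # NA NA NA
print(is.na(x))     # TRUE FALSE TRUE

# if () wants one TRUE/FALSE; ifelse() decides element by element
filled <- ifelse(is.na(x), dx, x)
print(filled)       # "113.8" "059" "V70.3"
```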

Answer 2

Here is a more data.table-friendly solution:

library(data.table)
s3 <- data.table(s3)
# Most specific rule first: 4-character prefixes of dx
s3[is.na(dxgroup) & substring(dx, 1, 4) %in% c("2024", "2967", "9786", "9788", "8263"), dxgroup := substring(dx, 1, 4)]
# Next: 3-character prefixes, only for rows that are still NA
s3[is.na(dxgroup) & substring(dx, 1, 3) %in% c("V70", "042", "897"), dxgroup := substring(dx, 1, 3)]
# Default: copy dx into whatever is still NA
s3[is.na(dxgroup), dxgroup := dx]

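Basically, you work from the most specific condition to the most general one. Each line only touches rows where dxgroup is still NA, so a row matched by an earlier line is left alone by the later ones, and the last line fills in the default.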
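For readers without data.table installed, here is a minimal base R sketch of the same cascade on the six sample rows from the question. It is a translation, not the answer's own code: data.table's DT[i, col := value] is replaced by ordinary logical-index assignment on a data.frame, and the helper vectors codes4 and codes3 are names introduced here for illustration. The order of the three blocks is what gives the earlier rules precedence.

```r
# Sample data from the question (dx and dxgroup kept as character)
s3 <- data.frame(
  ClaimID = c("15nhbfcgcda", "15nhbfcgcda", "15nhbfcgcda",
              "15nhbfcfssa", "15nhb4dfgda", "15nhbf644da"),
  dx      = c("113.8", "156.8", "110.8", "135.8", "V70.3", "118.8"),
  dxgroup = c(NA, NA, "059", NA, NA, "042"),
  stringsAsFactors = FALSE
)

codes4 <- c("2024", "2967", "9786", "9788", "8263")  # hypothetical helper name
codes3 <- c("V70", "042", "897")                     # hypothetical helper name

# Rule 1 (most specific): 4-character prefix match on rows still NA
i <- is.na(s3$dxgroup) & substring(s3$dx, 1, 4) %in% codes4
s3$dxgroup[i] <- substring(s3$dx[i], 1, 4)

# Rule 2: 3-character prefix match on rows still NA
i <- is.na(s3$dxgroup) & substring(s3$dx, 1, 3) %in% codes3
s3$dxgroup[i] <- substring(s3$dx[i], 1, 3)

# Default: copy dx into any row that is still NA
i <- is.na(s3$dxgroup)
s3$dxgroup[i] <- s3$dx[i]

print(s3)
```

The resulting dxgroup column is 113.8, 156.8, 059, 135.8, V70, 042, which is the expected output in the question.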

(I'm assuming you are using the data.table package. data.tables are different from data.frames and, in my opinion, better.)

How to update existing column values in a data.table?
