I have an R data frame:
df <- data.frame("a" = c("123-wave-hi", "234-boo-low", "563-hi-dsa", "897-op-ghhs"),
                 "b" = runif(4, 2, 10),
                 "c" = runif(4, 5, 20))
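I want to split "a" on "-" and extract the segment containing the number (element [[i]][1] of the split) as a new variable "idkey". I was able to solve this by (1) converting 'a' to character, (2) splitting on "-", (3) setting up an empty vector and filling it in a loop, and then (4) cbinding it to the data frame, as shown here: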
df$a <- as.character(df$a)
df$split <- strsplit(df$a, "-")
idkey <- vector()
for (i in seq(nrow(df))) {
  idkey[i] <- df$split[[i]][1]
}
df <- cbind(df, idkey)
Is there a less clumsy way to achieve this result? And why doesn't the following work?
df$rownum <- 1:nrow(df)
df$id <- df$split[[df$rownum]][1]
Below is the Python code. It is not very cumbersome, but I still wonder whether there is a way to do it without a loop:
import pandas as pd
import numpy as np
df = pd.DataFrame({"a":["123-wave-hi","234-boo-low","563-hi-dsa","897-op-ghhs"],
"b": range(2,6),
"c": range(7,11)})
df['idkey']=[entry.split('-')[0] for entry in df['a']]
Answer 0: (score: 3)
Do you just want to extract the number from df$a?
df$idkey <- gsub("(\\d+).*", "\\1", df$a)
            a        b        c idkey
1 123-wave-hi 6.050167 12.22999   123
2 234-boo-low 5.919546 17.62619   234
3  563-hi-dsa 7.193291 12.70553   563
4 897-op-ghhs 8.646451 12.94666   897
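For comparison with the pandas code in the question, the same regular-expression idea can be sketched with Series.str.extract. This is only an illustrative sketch: it assumes every value in column a starts with digits, and it anchors the pattern with ^, which the gsub call above does not do.

```python
import pandas as pd

df = pd.DataFrame({"a": ["123-wave-hi", "234-boo-low", "563-hi-dsa", "897-op-ghhs"]})

# Capture the leading run of digits from each string.
# expand=False returns a Series instead of a one-column DataFrame.
df["idkey"] = df["a"].str.extract(r"^(\d+)", expand=False)
print(df)
```

The extracted values are still strings; a row that does not start with digits would get a missing value rather than raising an error.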
Answer 1: (score: 1)
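df$id <- sapply(strsplit(as.character(df$a), '-'), `[`, 1)
You can avoid using as.character if you include an extra option when defining df (presumably stringsAsFactors = FALSE), or set it globally as an option.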
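The sapply call above splits every string and then applies the indexing function `[` to pick the first piece of each result. A rough pandas counterpart, offered as a sketch rather than as part of this answer, uses the .str accessor twice: once to split and once to index into each resulting list.

```python
import pandas as pd

df = pd.DataFrame({"a": ["123-wave-hi", "234-boo-low", "563-hi-dsa", "897-op-ghhs"]})

# str.split yields one list of pieces per row;
# .str[0] then takes the first element of each list (R's `[`, 1 is 1-based, Python is 0-based).
df["id"] = df["a"].str.split("-").str[0]
print(df)
```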
Answer 2: (score: 1)
In pandas you can do this:
import pandas as pd
import numpy as np
df = pd.DataFrame({"a":["123-wave-hi","234-boo-low","563-hi-dsa","897-op-ghhs"],
"b": range(2,6),
"c": range(7,11)})
df['idkey']= df['a'].str.split("-", expand=True)[0]
print( df )
Output:
             a  b   c idkey
0  123-wave-hi  2   7   123
1  234-boo-low  3   8   234
2   563-hi-dsa  4   9   563
3  897-op-ghhs  5  10   897
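With expand=True, pandas builds one column per segment even though only the first is kept. If that matters, one possible variation (a sketch, not a benchmarked recommendation) is to limit the number of splits with the n argument so each string is cut only at the first "-":

```python
import pandas as pd

df = pd.DataFrame({"a": ["123-wave-hi", "234-boo-low", "563-hi-dsa", "897-op-ghhs"],
                   "b": range(2, 6),
                   "c": range(7, 11)})

# n=1 stops after the first "-", so the expanded frame has only two columns:
# column 0 holds the leading segment, column 1 holds the rest of the string.
parts = df["a"].str.split("-", n=1, expand=True)
df["idkey"] = parts[0]
print(df)
```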