How to extract based on index value

Time: 2018-06-05 14:09:17

Tags: r python-3.x

I have an R data frame:

df <- data.frame("a"= c("123-wave-hi","234-boo-low","563-hi-dsa","897-op-ghhs"),
                 "b"= runif(4,2,10),
                 "c"= runif(4,5,20))

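I would like to split "a" on "-" and extract the numeric segment, i.e. element [[i]][1] of each split, into a new variable "idkey". I was able to solve this by (1) converting 'a' to character, (2) splitting on "-", (3) setting up an empty vector and filling it in a loop, and then (4) cbinding it to the data frame, as shown below: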

df$a <- as.character(df$a)
df$split <- strsplit(df$a , "-")
idkey<- vector()
for (i in seq(nrow(df))) {
  idkey[i]<- df$split[[i]][1]
}

df <- cbind(df,idkey)

Is there a less clunky way to achieve this result? Why doesn't:

df$rownum <- 1:nrow(df)
df$id <- df$split[[df$rownum]][1]

work?

Below is the Python code, which is less cumbersome, but I still wonder whether there is a way to do it without a loop.

import pandas as pd
import numpy as np
df = pd.DataFrame({"a":["123-wave-hi","234-boo-low","563-hi-dsa","897-op-ghhs"],
                 "b": range(2,6),
                 "c": range(7,11)})


df['idkey']=[entry.split('-')[0] for entry in df['a']]

3 Answers:

Answer 0: (score: 3)

Do you just want to extract the number from df$a?

df$idkey <- gsub("(\\d+).*", "\\1", df$a)

            a        b        c idkey
1 123-wave-hi 6.050167 12.22999   123
2 234-boo-low 5.919546 17.62619   234
3  563-hi-dsa 7.193291 12.70553   563
4 897-op-ghhs 8.646451 12.94666   897
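
For comparison, the same regex idea can be sketched in Python with the standard re module. This is only an illustration, assuming every value starts with a run of digits as in the sample data; the names values and leading_digits are hypothetical and not part of the answer.

```python
import re

# Sample values copied from column "a" in the question.
values = ["123-wave-hi", "234-boo-low", "563-hi-dsa", "897-op-ghhs"]

# Same pattern as gsub("(\\d+).*", "\\1", x): match the first digit run plus
# everything after it, and replace the whole match with the digit run alone.
pattern = re.compile(r"(\d+).*")

def leading_digits(text):
    """Replace the first digit run and all text after it with the digit run."""
    return pattern.sub(r"\1", text, count=1)

idkey = [leading_digits(v) for v in values]
print(idkey)  # ['123', '234', '563', '897']
```

Like the gsub call, this leaves any text that precedes the first digit untouched, so it relies on the id being at the start of each value.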

Answer 1: (score: 1)

df$id <- sapply(strsplit(as.character(df$a), '-'), `[`, 1)

If you include an extra option when defining df (or set it globally as an option), you can avoid using as.character to coerce the column to character.
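
The sapply(..., `[`, 1) idiom applies the indexing function to each split result in turn. A rough Python analogue, offered as a sketch rather than a translation of this answer, maps operator.itemgetter(0) over the split lists; the variable names values, parts, and first are illustrative.

```python
from operator import itemgetter

# Sample values copied from column "a" in the question.
values = ["123-wave-hi", "234-boo-low", "563-hi-dsa", "897-op-ghhs"]

# Counterpart of strsplit(x, "-"): one list of pieces per input string.
parts = [v.split("-") for v in values]

# itemgetter(0) plays the role of `[` with index 1 (R is 1-based, Python 0-based).
first = itemgetter(0)
idkey = list(map(first, parts))
print(idkey)  # ['123', '234', '563', '897']
```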

Answer 2: (score: 1)

In pandas you can do this:

import pandas as pd
import numpy as np
df = pd.DataFrame({"a":["123-wave-hi","234-boo-low","563-hi-dsa","897-op-ghhs"],
                 "b": range(2,6),
                 "c": range(7,11)})


df['idkey']=  df['a'].str.split("-",  expand=True)[0]
print( df )

Output:

             a  b   c idkey
0  123-wave-hi  2   7   123
1  234-boo-low  3   8   234
2   563-hi-dsa  4   9   563
3  897-op-ghhs  5  10   897
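
Since only the first piece is needed, one possible variant skips expand=True, limits the split to a single cut with n=1, and picks the first element through the .str accessor. This is a sketch that assumes a reasonably recent pandas and the same sample data as above.

```python
import pandas as pd

df = pd.DataFrame({"a": ["123-wave-hi", "234-boo-low", "563-hi-dsa", "897-op-ghhs"],
                   "b": range(2, 6),
                   "c": range(7, 11)})

# n=1 stops splitting after the first "-", so each row holds at most two pieces;
# .str[0] then selects the first piece of every row without an explicit loop.
df["idkey"] = df["a"].str.split("-", n=1).str[0]
print(df["idkey"].tolist())  # ['123', '234', '563', '897']
```

The resulting idkey values are strings; converting them with astype(int) would be a separate step if numeric keys are wanted.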