Question

I'm new to Python and am trying to get something done with it.

While doing so, I got stuck here.

I have data in .csv format, which I imported into Python using:

import numpy as np  # used in the attempt further down
import pandas

data = pandas.read_csv("data.csv")
data.head()

   user  rating      id
0     1     3.5  1_1193
1     1     3.5   1_661
2     1     3.5   1_914
3     1     3.5  1_3408
4     1     3.5  1_2355

What I need is the number after the '_' in each value of the 'id' column.

What I tried:

data.id.split('_')

which gave me the error: "'DataFrame' object has no attribute 'split'"

So, after reading a solution on Stack Overflow, I converted the 'id' column to an np.array.

s1 = data.id.values
s2 = np.array2string(s1, separator=',',suppress_small=True)
s2.split('_')

This gave me the output:

["['1",
 "1193','1",
 "661','1",
 "914',..., '6040",
 "161','6040",
 "2725','6040",
 "1784']"]
s2.split('_')[1]

which gave me:

"1193','1"

What should I do to get the string after the '_'?

Answer 1

You need the vectorized str.split, followed by str[1] to select the second element of each resulting list. You can also check the docs:

data['a'] = data.id.str.split('_').str[1]
print (data)
   user  rating      id     a
0     1     3.5  1_1193  1193
1     1     3.5   1_661   661
2     1     3.5   1_914   914
3     1     3.5  1_3408  3408
4     1     3.5  1_2355  2355

print (data.dtypes)
user        int64
rating    float64
id         object
a          object <- format is object (obviously string)
dtype: object
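
To see why the .str[1] step is needed, it can help to look at the intermediate result of the split. The following is only a minimal sketch: it assumes a small hand-built frame that mirrors the first rows of the question's data, and the names parts and after_underscore are illustrative, not part of the original answer.

```python
import pandas as pd

# Small stand-in for the question's data (same first rows as data.head()).
data = pd.DataFrame({
    "user": [1, 1, 1],
    "rating": [3.5, 3.5, 3.5],
    "id": ["1_1193", "1_661", "1_914"],
})

# .str.split works element-wise and returns a Series whose values are Python lists.
parts = data["id"].str.split("_")
print(parts.tolist())

# .str[1] then picks element 1 of every list, row by row.
after_underscore = parts.str[1]
print(after_underscore.tolist())
```

The plain .split from the question fails because it is a method of a single Python string, while the .str accessor applies the same operation to every value in the column.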

#split and cast column to int
data['a'] = data.id.str.split('_').str[1].astype(int)
print (data)
   user  rating      id     a
0     1     3.5  1_1193  1193
1     1     3.5   1_661   661
2     1     3.5   1_914   914
3     1     3.5  1_3408  3408
4     1     3.5  1_2355  2355

print (data.dtypes)
user        int64
rating    float64
id         object
a           int32 <- format is int
dtype: object
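
The int32 in the output above most likely reflects the platform the answer was run on; on other systems astype(int) typically gives int64. If a fixed width matters, one option is to name the dtype explicitly. This is a sketch under that assumption, using a hand-built frame and an extra column b that are not part of the original answer.

```python
import pandas as pd

# Hand-built stand-in for the question's id column.
data = pd.DataFrame({"id": ["1_1193", "1_661", "1_914"]})

# Name the integer width explicitly instead of relying on the platform default.
data["a"] = data["id"].str.split("_").str[1].astype("int64")

# pd.to_numeric is an alternative that infers a numeric dtype from the strings.
data["b"] = pd.to_numeric(data["id"].str.split("_").str[1])

print(data.dtypes)
```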

Also, if you need to replace the id column with the new values:

data.id = data.id.str.split('_').str[1]
print (data)
   user  rating    id
0     1     3.5  1193
1     1     3.5   661
2     1     3.5   914
3     1     3.5  3408
4     1     3.5  2355

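# alternative: str.get(1) does the same as str[1]
# (run this instead of the assignment above, on the original id column)
data.id = data.id.str.split('_').str.get(1)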
print (data)
   user  rating    id
0     1     3.5  1193
1     1     3.5   661
2     1     3.5   914
3     1     3.5  3408
4     1     3.5  2355
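
One thing worth knowing about both str[1] and str.get(1) is how they behave when a value contains no '_'. The sketch below assumes a hypothetical id without an underscore, which does not occur in the question's sample data.

```python
import pandas as pd

# The second value has no underscore; this case is hypothetical.
ids = pd.Series(["1_1193", "661"])

# Position 1 does not exist in the list ['661'], so the result is missing (NaN)
# for that row instead of raising an error.
second = ids.str.split("_").str.get(1)
print(second.isna().tolist())

# Taking the last element instead keeps the whole value when there is no '_'.
last = ids.str.split("_").str[-1]
print(last.tolist())
```

Which of the two is preferable depends on whether an id without '_' should count as missing or be kept as it is.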

Answer 2

There are more options...

Option 1

str.extract

df.id.str.extract('.*_(.*)', expand=False)

Option 2

str.replace

# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df.id.str.replace('.*_', '', regex=True)

Both yield the part of each id after the '_'.
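
The two regex-based options can be compared side by side. This is a minimal sketch assuming a hand-built df with the same id values as the question; the variable names extracted and replaced are illustrative.

```python
import pandas as pd

# Hand-built stand-in for the question's id column.
df = pd.DataFrame({"id": ["1_1193", "1_661", "1_914"]})

# Capture everything after the last '_' (the leading .* is greedy).
extracted = df["id"].str.extract(r".*_(.*)", expand=False)

# Delete everything up to and including the last '_'.
# regex=True is passed explicitly so the pattern is not treated as a literal string.
replaced = df["id"].str.replace(r".*_", "", regex=True)

print(extracted.tolist())
print(replaced.tolist())
```

With expand=False and a single capture group, str.extract returns a Series rather than a one-column DataFrame, which makes it easy to assign back to a column.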
