Question

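From a DataFrame, I want to split col1 so that the number before the first | goes into a list a, the second number (after it) goes into a list b, and the text from col1, text1, text2 and text3 goes into a list text.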

col1       | text1     | text2           | text3
1|6|Show   | us the    | straight way    | null
109|2|I    | worship   | not that        | which ye worship

My expected output:

a = [1, 109]
b = [6, 2]
text = ['Show us the straight way', 'I worship not that which ye worship']

What is the best way to do this?
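For anyone who wants to try the answer below, the sample table can be rebuilt as a DataFrame. This is a sketch that assumes "null" in text3 means a real missing value (NaN), not the literal string "null"; if it were the string, dropna() would not skip it.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; "null" is assumed to be a missing value (NaN)
df = pd.DataFrame({
    "col1": ["1|6|Show", "109|2|I"],
    "text1": ["us the", "worship"],
    "text2": ["straight way", "not that"],
    "text3": [np.nan, "which ye worship"],
})
print(df)
```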

Answer 1

This is simple, assuming col1 always has exactly 3 pipe-separated elements:

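import pandas as pd

# Split col1 on '|' and transpose the per-row lists into three tuples
a, b, C = zip(*df.col1.str.split('|'))
# Join the remaining text columns per row, skipping missing values
D = df.drop(columns='col1').agg(lambda x: ' '.join(x.dropna()), axis=1)

c = [x + ' ' + y for x, y in zip(C, D)]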

print(a)
('1', '109')

print(b)
('6', '2')

print(c)
['Show us the straight way', 'I worship not that which ye worship']
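The first line relies on zip(*...) acting as a transpose: str.split('|') yields one list per row, and unpacking those lists into zip regroups them by position. A minimal pure-Python illustration on the question's col1 values (the name rows is only illustrative):

```python
# Each row of col1 after splitting on '|'
rows = [s.split("|") for s in ["1|6|Show", "109|2|I"]]
# rows == [['1', '6', 'Show'], ['109', '2', 'I']]

# zip(*rows) groups the first elements together, then the second, then the third
a, b, C = zip(*rows)
print(a)  # ('1', '109')
print(b)  # ('6', '2')
print(C)  # ('Show', 'I')
```

This is also why the 3-element assumption matters: zip stops at the shortest row, so a row with only 2 parts makes the three-name unpacking raise ValueError, while a row with 4 parts has its extra element silently dropped.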

Note that a and b are tuples of strings. You can convert them to numbers with

a, b = map(pd.to_numeric, (a, b))

...which gives you NumPy integer arrays.
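If you want plain Python lists of ints exactly like the expected output (rather than NumPy arrays), one option is to call .tolist() on the result. A small sketch, starting from the string tuples printed above:

```python
import pandas as pd

a, b = ("1", "109"), ("6", "2")

# pd.to_numeric on a tuple returns a NumPy array; .tolist() converts to Python ints
a = pd.to_numeric(a).tolist()
b = pd.to_numeric(b).tolist()
print(a)  # [1, 109]
print(b)  # [6, 2]
```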

To handle the general case where col1 has an arbitrary number of values, you'll need:

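v = df.col1.str.split('|', expand=True)
# Mark all-digit cells; fillna('') avoids calling isdigit on the None padding
# that expand=True adds for shorter rows (applymap is deprecated in pandas 2.1+)
m = v.fillna('').apply(lambda s: s.str.isdigit())
a, b, *_ = v[m].T.agg(lambda x: x.dropna().tolist(), axis=1)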

print(a)
['1', '109']

print(b)
['6', '2']
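To see what the mask does, it helps to look at the intermediate frames. A sketch on the question's col1 values (per_position is only an illustrative name): v holds one column per pipe-separated position, m marks the all-digit cells, and v[m].T turns each position into a row so the digits can be collected per position.

```python
import pandas as pd

col1 = pd.Series(["1|6|Show", "109|2|I"])

# One column per '|'-separated position: columns 0, 1, 2
v = col1.str.split("|", expand=True)

# True where the whole cell consists of digits
m = v.fillna("").apply(lambda s: s.str.isdigit())
print(m.values.tolist())  # [[True, True, False], [True, True, False]]

# v[m] keeps digit cells and sets the rest to NaN; after .T each row is one
# position of col1, so dropna().tolist() collects that position's numbers
per_position = v[m].T.agg(lambda x: x.dropna().tolist(), axis=1)
print(per_position.tolist())  # [['1', '109'], ['6', '2'], []]
```

The trailing empty list comes from the text position, which is why the answer unpacks with a, b, *_.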

C is computed similarly:

C = v[~m].agg(lambda x: x.dropna().str.cat(sep=' '), axis=1).tolist()

Lowercase c can then be computed as before.
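Putting the general-case pieces together end to end, here is a sketch that assumes the sample frame from the question with "null" as NaN. The final list is named text to match the question's expected output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["1|6|Show", "109|2|I"],
    "text1": ["us the", "worship"],
    "text2": ["straight way", "not that"],
    "text3": [np.nan, "which ye worship"],
})

# Split col1 into one column per position and mark the all-digit cells
v = df["col1"].str.split("|", expand=True)
m = v.fillna("").apply(lambda s: s.str.isdigit())

# Digits collected per position: first position -> a, second -> b
a, b, *_ = v[m].T.agg(lambda x: x.dropna().tolist(), axis=1)

# Non-digit parts of col1, joined per row
C = v[~m].agg(lambda x: x.dropna().str.cat(sep=" "), axis=1).tolist()
# Remaining text columns, joined per row with missing values skipped
D = df.drop(columns="col1").agg(lambda x: " ".join(x.dropna()), axis=1)

text = [x + " " + y for x, y in zip(C, D)]
a = pd.to_numeric(a).tolist()
b = pd.to_numeric(b).tolist()
print(a, b, text)
```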
