I have a log DataFrame df with a column named Description. It looks like this:
Description
Machine x : Turn off
Another action here
Another action here
Machine y : Turn off
Machine x : Turn on
Another action here
I only need to split the rows that contain ":".
Like this:
Description           Machine    Action
Machine x : Turn off  Machine x  Turn off
Another action here
Another action here
Machine y : Turn off  Machine y  Turn off
Machine x : Turn on   Machine x  Turn on
Another action here
I have already tried:
s = df["Description"].apply(lambda x:x.split(":"))
df["Action"] = s.apply(lambda x: x[1])
df["Machine"] = s.apply(lambda x: x[0])
and also something with "startswith".
Answer 0 (score: 2)
Given a dataframe
>>> df
            Description
0  Machine x : Turn off
1   Another action here
2   Another action here
3  Machine y : Turn off
4   Machine x : Turn on
5   Another action here
I would solve this with Series.str.split(splitter, expand=True).
>>> has_colon = df['Description'].str.contains(':')
>>> df[['Machine', 'Action']] = df.loc[has_colon, 'Description'].str.split(r'\s*:\s*', expand=True)
>>> df
            Description    Machine    Action
0  Machine x : Turn off  Machine x  Turn off
1   Another action here        NaN       NaN
2   Another action here        NaN       NaN
3  Machine y : Turn off  Machine y  Turn off
4   Machine x : Turn on  Machine x   Turn on
5   Another action here        NaN       NaN
If you prefer empty strings, you can replace the NaN cells with
>>> df.fillna('')
            Description    Machine    Action
0  Machine x : Turn off  Machine x  Turn off
1   Another action here
2   Another action here
3  Machine y : Turn off  Machine y  Turn off
4   Machine x : Turn on  Machine x   Turn on
5   Another action here
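For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The names `parts` and `result`, the `n=1` argument, and the use of `join` instead of direct column assignment are my own additions, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Description": [
        "Machine x : Turn off",
        "Another action here",
        "Another action here",
        "Machine y : Turn off",
        "Machine x : Turn on",
        "Another action here",
    ]
})

# Rows that actually contain the separator (literal match, no regex).
has_colon = df["Description"].str.contains(":", regex=False)

# Split only those rows. n=1 caps the result at two columns even if
# the action text itself happens to contain another colon.
parts = df.loc[has_colon, "Description"].str.split(r"\s*:\s*", n=1, expand=True)
parts.columns = ["Machine", "Action"]

# join aligns on the index, so rows without a colon end up with NaN.
result = df.join(parts)
print(result.fillna(""))
```

Using `join` keeps the original `df` untouched, which can be handy while experimenting.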
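Answer 1 (score: 2)
You can use str.extract with a suitable regex. This captures the values on either side of the : (the pattern also drops the single space on each side of the colon):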
df[['Machine', 'Action']] = df.Description.str.extract('(.*) : (.*)',expand=True)
>>> df
            Description    Machine    Action
0  Machine x : Turn off  Machine x  Turn off
1   Another action here        NaN       NaN
2   Another action here        NaN       NaN
3  Machine y : Turn off  Machine y  Turn off
4   Machine x : Turn on  Machine x   Turn on
5   Another action here        NaN       NaN
# df[['Machine', 'Action']] = df.Description.str.extract('(.*) : (.*)',expand=True).fillna('')
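One thing to be aware of: the pattern '(.*) : (.*)' expects exactly one space on each side of the colon. The sketch below compares it with a more tolerant pattern on some made-up rows. The sample strings and the names `strict` and `tolerant` are hypothetical, chosen only to illustrate the difference.

```python
import pandas as pd

s = pd.Series([
    "Machine x : Turn off",      # single spaces around the colon
    "Machine y:Turn on",         # no spaces at all
    "Machine z   :   Turn off",  # several spaces
    "Another action here",       # no colon
])

# Pattern from the answer: needs the literal " : " sequence.
strict = s.str.extract(r"(.*) : (.*)", expand=True)

# Tolerant variant: a lazy first group plus \s* on both sides of the
# colon, so any amount of surrounding whitespace is consumed.
tolerant = s.str.extract(r"(.*?)\s*:\s*(.*)", expand=True)
tolerant.columns = ["Machine", "Action"]

print(strict)
print(tolerant)
```

With the strict pattern the "Machine y:Turn on" row does not match at all, and the row with several spaces keeps the extra padding in both groups.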
Answer 2 (score: 2)
Just use split with expand=True
df[['Machine', 'Action']] = df.Description.str.split(':', expand=True).dropna()
df
            Description    Machine    Action
0  Machine x : Turn off  Machine x  Turn off
1   Another action here        NaN       NaN
2   Another action here        NaN       NaN
3  Machine y : Turn off  Machine y  Turn off
4   Machine x : Turn on  Machine x   Turn on
5   Another action here        NaN       NaN
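Note that splitting on a bare ':' leaves the spaces that surrounded the colon inside the values. A small sketch of one way to trim them follows. The `.str.strip()` step, the `n=1` argument, and the names `parts`, `raw_machine` and `result` are my additions rather than part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Description": [
        "Machine x : Turn off",
        "Another action here",
        "Another action here",
        "Machine y : Turn off",
        "Machine x : Turn on",
        "Another action here",
    ]
})

# Plain split on ":"; rows without a colon get None in the second column.
parts = df["Description"].str.split(":", n=1, expand=True)

# The raw pieces still carry the padding around the colon.
raw_machine = parts.loc[0, 0]
print(repr(raw_machine))

# Drop the rows that had no colon, then strip each column.
parts = parts.dropna().apply(lambda col: col.str.strip())
parts.columns = ["Machine", "Action"]

result = df.join(parts)
print(result)
```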
Answer 3 (score: 1)
Using the pd.Series.str.extract function with a specific regex pattern (allowing for potentially multiple spaces around the : separator):
In [491]: df
Out[491]:
            Description
0  Machine x : Turn off
1   Another action here
2   Another action here
3  Machine y : Turn off
4   Machine x : Turn on
5   Another action here
In [492]: pd.concat([df, df.Description.str.extract(r'(?P<Machine>[^:]+)\s+:\s+(?P<Action>[^:]+)').fillna('')], axis=1)
Out[492]:
            Description    Machine    Action
0  Machine x : Turn off  Machine x  Turn off
1   Another action here
2   Another action here
3  Machine y : Turn off  Machine y  Turn off
4   Machine x : Turn on  Machine x   Turn on
5   Another action here
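Named groups are convenient because str.extract uses the group names as column names. Here is a small sketch along the same lines, with a slightly different pattern of my own (a lazy first group and \s* instead of \s+) so that irregular spacing is trimmed from the Machine value. The sample rows with odd spacing are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Description": [
        "Machine x : Turn off",
        "Another action here",
        "Machine y   :   Turn off",  # several spaces around the colon
        "Machine x: Turn on",        # no space before the colon
    ]
})

# Lazy [^:]+? stops as early as possible, so trailing spaces before
# the colon are left to \s* instead of ending up in the Machine group.
pattern = r"(?P<Machine>[^:]+?)\s*:\s*(?P<Action>.+)"

extracted = df["Description"].str.extract(pattern)

# Column names come straight from the group names.
result = pd.concat([df, extracted.fillna("")], axis=1)
print(result)
```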
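Answer 4 (score: 1)
StringMethods are useful and convenient, but they often perform poorly.
I suggest using the default constructor and pure Python string processing
df[['Machine', 'Action']] = pd.DataFrame([x.split(':') for x in df.Description]).dropna()
The timings are better than those of the .str accessor options.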
df = pd.concat([df]*1000)
%timeit pd.DataFrame([x.split(':') for x in df.Description]).dropna()
4.47 ms ± 252 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.Description.str.split(':',expand=True).dropna()
14.9 ms ± 323 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.Description.str.extract('(.*) : (.*)',expand=True)
16.6 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.concat([df, df.Description.str.extract(r'(?P<Machine>[^:]+)\s+:\s+(?P<Action>[^:]+)').fillna('')], axis=1)
22.5 ms ± 448 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
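A caveat with the list-comprehension approach: the new DataFrame gets a fresh 0..n-1 index, and column assignment aligns on index labels, so it only lines up when df itself has the default index. A minimal sketch that passes the original index along is below. The non-default index labels, the `strip()` call, the `maxsplit` of 1 and the name `parts` are my own assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Description": [
            "Machine x : Turn off",
            "Another action here",
            "Machine y : Turn on",
        ]
    },
    index=[10, 20, 30],  # hypothetical non-default labels
)

# Pure-Python split; at most one split per row, pieces stripped.
rows = [[piece.strip() for piece in text.split(":", 1)]
        for text in df["Description"]]

# Reuse df's index so the assignment below aligns row by row.
# Rows without a colon are shorter and get padded with a missing
# value, which dropna() then removes.
parts = pd.DataFrame(rows, index=df.index, columns=["Machine", "Action"]).dropna()

df[["Machine", "Action"]] = parts
print(df)
```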
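Answer 5 (score: 0)
My proposal is:
msk = df.Description.str.contains(':')
df[['Machine', 'Action']] = df.Description.str.split(':', n=1, expand=True).where(msk, '')
First, create a mask marking the rows that can receive non-empty values.
Then the split result is kept only for rows where the mask is True. In every other row, both new columns receive an empty string.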
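A runnable sketch of the mask-and-where idea, assuming a recent pandas where n is passed by keyword. The extra row with a second colon and the `.str.strip()` step are my own additions, there to show what n=1 does and to trim the padding around the colon.

```python
import pandas as pd

df = pd.DataFrame({
    "Description": [
        "Machine x : Turn off",
        "Another action here",
        "Machine y : Restart : forced",  # hypothetical row with two colons
    ]
})

# True for rows that contain the separator.
msk = df["Description"].str.contains(":", regex=False)

# n=1 splits at the first colon only, so there are always two columns.
parts = df["Description"].str.split(":", n=1, expand=True)

# Keep the split values where the mask is True, otherwise use "",
# then strip the spaces left over from around the colon.
parts = parts.where(msk, "").apply(lambda col: col.str.strip())

df[["Machine", "Action"]] = parts
print(df)
```

Because of n=1, everything after the first colon stays together in the Action column.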