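I ran the regular expression below against my data in an online regex tester and it works fine. However, when I run it in Python 3 with Pandas 0.18, I get NaN in the new 'r' column.
The regular expression is: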
\(\(\d+,\s\d+\],\s\(\d+,\s(\d+)\]\)
The sample data is:
Azmuth_25   Range_25  WT_g  r_25_text                 r
(0, 5]      (0, 25]   1     ((0, 5], (0, 25])         NaN
(25, 30]    (25, 50]  1     ((25, 30], (25, 50])      NaN
(35, 40]    (25, 50]  1     ((35, 40], (25, 50])      NaN
(65, 70]    (50, 75]  1     ((65, 70], (50, 75])      NaN
(85, 90]    (50, 75]  1     ((85, 90], (50, 75])      NaN
(95, 100]   (25, 50]  1     ((95, 100], (25, 50])     NaN
(100, 105]  (50, 75]  1     ((100, 105], (50, 75])    NaN
(110, 115]  (50, 75]  1     ((110, 115], (50, 75])    NaN
(115, 120]  (0, 25]   1     ((115, 120], (0, 25])     NaN
My code:
df_25_sum['r'] = df_25_sum['r_25_text'].str.extract('\(\(\d+,\s\d+\],\s\(\d+,\s(\d+)\]\)')
df_25_sum
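The output is the sample data shown above: the new column created from the extraction contains only NaN.
Answer 0: (score: 0)
If you are actually trying to extract the last number from r_25_text (per your comment), the following regex pattern should do it: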
pattern = r'(\d+)(?=\]\))'  # digits immediately followed by '])'; a single capture group
df_25_sum['r'] = df_25_sum['r_25_text'].str.extract(pattern, expand=False)
df_25_sum
The r column should then hold the last number in each row of r_25_text, i.e. 25, 50, 50, 75, 75, and so on.
See the regex link.
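To check the lookahead idea without pandas, here is a minimal sketch using only the standard re module. The sample strings are copied from the question; the helper name last_number is made up for illustration.

```python
import re

# Sample strings copied from the r_25_text column in the question.
samples = [
    "((0, 5], (0, 25])",
    "((25, 30], (25, 50])",
    "((65, 70], (50, 75])",
]

# One capture group for the digits; the lookahead (?=\]\)) only asserts
# that '])' follows and does not consume or capture anything.
pattern = re.compile(r"(\d+)(?=\]\))")


def last_number(text):
    """Return the digits right before '])', or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


results = [last_number(s) for s in samples]
print(results)
```

Note that the values come back as strings, so a conversion such as int() is still needed if numbers are wanted.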
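Answer 1: (score: 0)
I got this working. It is essentially the same as the answer pylang came up with, but I could not get the regex to work with the '=' sign in it. My final code and regex are: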
pattern = r'(\d+)?\]' # find digits next to ']'
df_25_sum['r'] = df_25_sum['r_25_text'].str.extract(pattern)
Azmuth_25   Range_25  WT_g  r_25_text  r
(0, 5]      (0, 25]   1     (0, 25]    25
(25, 30]    (25, 50]  1     (25, 50]   50
(35, 40]    (25, 50]  1     (25, 50]   50
(65, 70]    (50, 75]  1     (50, 75]   75
(85, 90]    (50, 75]  1     (50, 75]   75
I can only assume that Pandas 0.18 does not support '=' in regular expressions. Thanks again, pylang.
Answer 2: (score: 0)
Have you tried:
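For what it is worth, str.extract relies on Python's re module, which does support lookahead written as (?=...). Something that can get in the way is a second capture group, for example one nested inside the lookahead, because str.extract returns one column per group. The sketch below illustrates this on a small made-up Series; whether this is what went wrong in the case above is only a guess.

```python
import pandas as pd

s = pd.Series(["((0, 5], (0, 25])", "((25, 30], (25, 50])"])

# One capture group: with expand=False the result is a Series that can be
# assigned directly to a single column.
one_group = s.str.extract(r"(\d+)(?=\]\))", expand=False)

# Two capture groups (the second sits inside the lookahead): the result is
# a DataFrame with one column per group, which does not fit a single column.
two_groups = s.str.extract(r"(\d+)(?=(\]\)))", expand=False)

print(one_group.tolist())
print(two_groups.shape)
```

Dropping the inner parentheses, or making them non-capturing with (?:...), keeps the lookahead while leaving a single group.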
import pandas as pd
df_25_sum = pd.DataFrame([
'((0, 5], (0, 25])',
'((25, 30], (25, 50])',
'((35, 40], (25, 50])'
], columns=['r_25_text'])
pattern = r'\(\(\d+,\s\d+\],\s+\(\d+,\s(\d+)\]\)'
df_25_sum['r'] = df_25_sum['r_25_text'].str.extract(pattern)
df_25_sum
>>>> r_25_text r
0 ((0, 5], (0, 25]) 25
1 ((25, 30], (25, 50]) 50
2 ((35, 40], (25, 50]) 50
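The pattern works here because the column holds plain strings. One possible explanation for the NaN in the question, and this is an assumption the poster never confirmed, is that r_25_text holds tuples (for example, built from the two index levels) rather than strings. pandas prints tuples of strings without quotes, so they look the same on screen, but the .str methods return NaN for elements that are not strings. A minimal sketch with made-up data:

```python
import pandas as pd

pattern = r"\(\(\d+,\s\d+\],\s\(\d+,\s(\d+)\]\)"

# Hypothetical column of tuples; when printed, each one looks like
# ((0, 5], (0, 25]) even though it is not a string.
as_tuples = pd.Series([("(0, 5]", "(0, 25]"), ("(25, 30]", "(25, 50]")])
from_tuples = as_tuples.str.extract(pattern, expand=False)
print(from_tuples.isna().all())

# Build the text explicitly, then extract from real strings.
as_text = as_tuples.map(lambda pair: "(" + ", ".join(pair) + ")")
from_text = as_text.str.extract(pattern, expand=False)
print(from_text.tolist())
```

A quick way to check a real column is to look at type(df_25_sum['r_25_text'].iloc[0]) before reaching for the regex.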