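A really simple question: is there a Python package similar to the bpa package in R?
Link describing what bpa does: Basic Pattern Analysis
I have a column of mixed data and want to better understand the formats it contains. BPA produces output like the following (copied from the link I attached):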
    messy$Date %>%
      get_pattern %>%    # extract patterns
      table %>%          # tabulate frequencies
      as.data.frame      # display as a data frame

    ##                    . Freq
    ##  1        99/99/9999  262
    ##  2        9999-99-99  259
    ##  3         99Aaa9999  241
    ##  4 Aaaaaaaaaw99w9999   19
    ##  5  Aaaaaaaaw99w9999   56
    ##  6   Aaaaaaaw99w9999   45
    ##  7    Aaaaaaw99w9999   24
    ##  8     Aaaaaw99w9999   36
    ##  9      Aaaaw99w9999   42
    ## 10       Aaaw99w9999   16
Answer 0 (score: 0)
I wrote a Python function that works like BPA's get_pattern function:
    import re

    import numpy as np
    import pandas as pd

    def get_pattern(x, show_ws=True, ws_char='<>'):
        # Missing values have no pattern; pass them through as NaN.
        if pd.isnull(x):
            return np.nan
        # Convert non-string values (numbers, dates, ...) to text first.
        x = str(x)
        x = re.sub("[a-z]", "a", x)  # lowercase letters -> "a"
        x = re.sub("[A-Z]", "A", x)  # uppercase letters -> "A"
        x = re.sub("[0-9]", "9", x)  # digits -> "9"
        if show_ws:
            x = re.sub(r"\s", ws_char, x)  # make whitespace visible
        return x
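To get a frequency table like the R output in the question, the function can be mapped over a pandas column and the results counted. The snippet below is only a sketch: the sample values and the names `dates` and `pattern_counts` are made up for illustration, and the function is repeated so the snippet runs on its own. Passing ws_char="w" is my assumption about how to mimic the "w" whitespace marker seen in the bpa output.

```python
import re

import numpy as np
import pandas as pd


def get_pattern(x, show_ws=True, ws_char="<>"):
    """Same idea as the function above, repeated so this snippet is self-contained."""
    if pd.isnull(x):
        return np.nan
    x = str(x)
    x = re.sub("[a-z]", "a", x)
    x = re.sub("[A-Z]", "A", x)
    x = re.sub("[0-9]", "9", x)
    if show_ws:
        x = re.sub(r"\s", ws_char, x)
    return x


# Made-up sample values, for illustration only (not the `messy` data from bpa).
dates = pd.Series(["01/02/2015", "2015-01-02", "02Jan2015", "Jan 02 2015", None])

pattern_counts = (
    dates.map(lambda v: get_pattern(v, ws_char="w"))  # extract patterns
    .value_counts()                                   # tabulate frequencies (NaN is dropped by default)
    .rename_axis("pattern")
    .reset_index(name="Freq")                         # display as a data frame
)
print(pattern_counts)
```

Missing values are left out of the table because value_counts drops NaN by default; pass dropna=False if you want them counted as well.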