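Basically, I want to remove all the dots from an abbreviation such as "L.L.C." so that it becomes "LLC". I do not have a list of all the abbreviations, so I want to convert whatever is found. This step runs before sentence tokenization.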
text = """
Proligo L.L.C. is a limited liability company.
S.A. is a place.
She works for AAA L.P. in somewhere.
"""
text = re.sub(r"(?:([A-Z])\.){2,}", "\1", text)
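This does not work.
I want to remove the dots from abbreviations so that they do not break the sentence tokenizer.
Thanks!
P.S. Sorry for not being clear. I have edited the sample text.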
Answer 0 (score: 1)
Try using a callback function with re.sub:
import re

def callback(matched_text):
    # Remove every dot from the matched abbreviation.
    return matched_text.replace('.', '')

text = "L.L.C., S.A., L.P."
text = re.sub(r"(?:[A-Z]\.)+", lambda m: callback(m.group()), text)
print(text)  # LLC, SA, LP
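The regex pattern (?:[A-Z]\.)+ matches any run of one or more capital letters that are each followed by a dot. For each match, the callback function then removes the dots.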
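For the sample text in the question, a minimal sketch along the same lines might look like this. It assumes an abbreviation is at least two "capital letter + dot" pairs, so a lone initial such as "A." is left alone. The names ABBREVIATION and strip_abbreviation_dots are made up for illustration. The last lines also show why the attempt in the question misbehaves: "\1" in a non-raw string is the control character \x01, and a repeated capture group keeps only its last repetition.

```python
import re

# Assumption: an abbreviation is two or more "capital letter + dot" pairs.
# \b keeps the match from starting in the middle of a longer word.
ABBREVIATION = re.compile(r"\b(?:[A-Z]\.){2,}")

def strip_abbreviation_dots(text):
    """Remove the dots inside each matched abbreviation (hypothetical helper)."""
    return ABBREVIATION.sub(lambda m: m.group().replace(".", ""), text)

text = """
Proligo L.L.C. is a limited liability company.
S.A. is a place.
She works for AAA L.P. in somewhere.
"""
cleaned = strip_abbreviation_dots(text)
print(cleaned)

# Why the attempt in the question fails:
broken_replacement = "\1"  # non-raw string: this is chr(1), not a backreference
# Even with a raw r"\1", the repeated group keeps only the last captured letter.
last_group_only = re.sub(r"(?:([A-Z])\.){2,}", r"\1", "L.L.C.")
```

An abbreviation that ends a sentence also loses its final dot with this approach, so the sentence boundary at that spot disappears and would need separate handling.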
Answer 1 (score: 0)
import re
string = 'ha.f.d.s.a.s.d.f'
# Raw string so that \. is passed to the regex engine unchanged.
print(re.sub(r'\.', '', string))
# output
# hafdsasdf
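Note that this only works properly if your text does not contain multiple sentences. If it does, the result is one long sentence, because every "." is replaced.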
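The caveat can be seen on a shortened version of the question's text. This is only an illustrative sketch: the variable names are hypothetical, and the second pattern assumes that an abbreviation is two or more "capital letter + dot" pairs.

```python
import re

text = "Proligo L.L.C. is a limited liability company. S.A. is a place."

# Removing every dot also removes the sentence-ending periods.
all_dots_removed = re.sub(r"\.", "", text)

# Restricting the match to abbreviation-like runs keeps the periods
# that end the two sentences.
only_abbreviations = re.sub(
    r"\b(?:[A-Z]\.){2,}",
    lambda m: m.group().replace(".", ""),
    text,
)

print(all_dots_removed)
print(only_abbreviations)
```

After the first substitution no "." remains, so a sentence tokenizer has nothing to split on. The second keeps the periods after "company" and "place".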
Answer 2 (score: 0)
Use this regular expression: