Question


I have a data file that looks like this (except that it has a few million lines):

((20091023,http://geocities.com/EnchantedForest/4217/index.html,http://rd.yahoo.com/footer/?http://alerts.yahoo.com/),1)
((20091023,http://geocities.com/EnchantedForest/Mountain/6235/mj.htm,http://rd.yahoo.com/footer/?http://paydirect.yahoo.com/),1)
((20090821,http://geocities.com/EnchantedForest/Cottage/6317/where_you_go.mid,http://geocities.com/EnchantedForest/Cottage/9999/index.html),1)

My end goal is a data file that looks like this:

http://geocities.com/EnchantedForest/4217,http://rd.yahoo.com/footer/?http://alerts.yahoo.com/,1
http://geocities.com/EnchantedForest/Mountain/6235,http://rd.yahoo.com/footer/?http://alerts.yahoo.com/,1
http://geocities.com/EnchantedForest/Cottage/6317,http://geocities.com/EnchantedForest/Cottage/9999,1

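Note the different pattern in line 3 above: there both URLs get truncated.

Basically, it comes down to this:

Step 1: Find every URL that contains a run of four digits and cut it off right after those digits, so any URL string ending in XXXX stops there (that way we don't keep the individual files). This should be global.
Step 2: Clean it up so that each line is FIRST URL, SECOND URL, NUMBER.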

What I've done so far

My current solution is:

sed -E 's/([0-9]{8}),(http.+?[0-9]{4})(.+?,)/\2,/g'

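In theory, that creates three groups: the first is the leading eight digits (which I don't care about), the second is the URL up to and including the four digits (which I do care about), and the third is the rest of the string.

However, my result right now looks like this: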

((http://geocities.com/EnchantedForest/Dell/3883,7)

That is close, but it drops the destination URL.

Any help or tips?

Answer 1

Use sed with multiple commands:

sed
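As far as I know, sed's regular expressions have no lazy quantifiers, so `.+?` in the question's attempt still matches greedily. That is probably why group 3 swallows everything up to the last comma, destination included. Below is a minimal sketch of the multi-command approach, assuming a sed that supports -E (GNU or BSD) and that every line has exactly the shape shown in the question. The function name clean_urls and the three expressions are my own illustration, not necessarily the command originally posted here.

```shell
# Hypothetical helper: reads raw lines on stdin, writes cleaned lines on stdout.
# Expression 1: drop the leading "((" plus the eight-digit date and its comma.
# Expression 2: turn the trailing "),N)" into ",N".
# Expression 3: wherever a path segment is exactly four digits, cut the URL
#               right after it (everything up to the next comma is removed).
clean_urls() {
  sed -E \
    -e 's/^\(\([0-9]{8},//' \
    -e 's/\),([0-9]+)\)$/,\1/' \
    -e 's#(/[0-9]{4})/[^,]*#\1#g'
}

# Try it on the three sample lines from the question.
clean_urls <<'EOF'
((20091023,http://geocities.com/EnchantedForest/4217/index.html,http://rd.yahoo.com/footer/?http://alerts.yahoo.com/),1)
((20091023,http://geocities.com/EnchantedForest/Mountain/6235/mj.htm,http://rd.yahoo.com/footer/?http://paydirect.yahoo.com/),1)
((20090821,http://geocities.com/EnchantedForest/Cottage/6317/where_you_go.mid,http://geocities.com/EnchantedForest/Cottage/9999/index.html),1)
EOF
```

For the real file, something like `clean_urls < input.txt > output.txt` should work (both file names are placeholders). Anchoring the cut on a `/NNNN/` path segment means only segments of exactly four digits trigger it, so longer digit runs are not cut in the middle.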
