Android: extracting a link from a String value

Time: 2016-09-13 17:36:40

Tags: java android regex

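I want to get the link out of a share intent. When I receive a link from Chrome it is formatted correctly, but other apps sometimes add text around it.

Examples:

Chrome: "www.recode.net/2016/7/21/12243560/google-machine-learning-comics-play"

Twitter: "Guys check out this link it's so cool https://www.recode.net/2016/7/21/12243560/google-machine-learning-comics-play"

So in the Twitter case I want to strip all the surrounding text and keep only the link, i.e. www.recode.net/2016/7/21/12243560/google-machine-learning-comics-play

Note: the link can be in any of these formats: https://..., www...., or recode.net/... (no leading www).

Is there a regex that can solve this?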

@Override
protected void onCreate(Bundle savedInstanceState) 
{
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_shareintent);

    // Get intent, action and MIME type
    Intent intent = getIntent();
    String action = intent.getAction();
    String type = intent.getType();

    if (Intent.ACTION_SEND.equals(action) && type != null) 
    {
        if ("text/plain".equals(type)) 
        {
            // Handle text being sent
            handleSendText(intent); 
        }
    }
}

void handleSendText(Intent intent)
{
    String sharedText = intent.getStringExtra(Intent.EXTRA_TEXT);
    if (sharedText != null) 
    {
        // Update UI to reflect text being shared
        TextView tvShare = (TextView) findViewById(R.id.tvShare);
        tvShare.setText(sharedText);
    }
}

2 answers:

Answer 0 (score: 2)

The following method does the trick:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pull all links from the body for easy retrieval
public ArrayList<String> pullLinks(String text)
{
    ArrayList<String> links = new ArrayList<String>();

    // Earlier variant without https and ftp support:
    //String regex = "\\(?\\b(http://|www[.])[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]";
    // Matches links starting with http://, https://, ftp:// or www., optionally preceded by "("
    String regex = "\\(?\\b(https?://|www[.]|ftp://)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]";

    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(text);

    while (m.find())
    {
        String urlStr = m.group();

        // Strip the parentheses when the whole link is wrapped in them
        if (urlStr.startsWith("(") && urlStr.endsWith(")"))
        {
            urlStr = urlStr.substring(1, urlStr.length() - 1);
        }

        links.add(urlStr);
    }

    return links;
}
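To see how this behaves on the inputs from the question, here is a minimal, self-contained sketch in plain Java (no Android classes). The class name `SharedLinkDemo` and the helpers `firstLink` and `stripScheme` are hypothetical names introduced only for illustration; `pullLinks` uses the same expression as above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SharedLinkDemo {
    // Same expression as in the answer above, compiled once and reused.
    private static final Pattern LINK = Pattern.compile(
            "\\(?\\b(https?://|www[.]|ftp://)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    static List<String> pullLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            String url = m.group();
            // Strip the parentheses when the whole link is wrapped in them.
            if (url.startsWith("(") && url.endsWith(")")) {
                url = url.substring(1, url.length() - 1);
            }
            links.add(url);
        }
        return links;
    }

    // Hypothetical helper: the first link in the text, or null if there is none.
    static String firstLink(String text) {
        List<String> links = pullLinks(text);
        return links.isEmpty() ? null : links.get(0);
    }

    // Hypothetical helper: drop a leading http://, https:// or ftp:// scheme.
    static String stripScheme(String url) {
        return url.replaceFirst("(?i)^(?:https?|ftp)://", "");
    }

    public static void main(String[] args) {
        String tweet = "Guys check out this link it's so cool "
                + "https://www.recode.net/2016/7/21/12243560/google-machine-learning-comics-play";
        System.out.println(stripScheme(firstLink(tweet)));
        // A bare domain has no scheme and no "www." prefix, so nothing is found.
        System.out.println(pullLinks("recode.net/2016/7/21"));
    }
}
```

In `handleSendText`, the result of `firstLink(sharedText)` could be passed to `tvShare.setText(...)` in place of the raw shared text. Note that this expression only finds links that start with a scheme or with `www.`, so the bare `recode.net/...` form from the question yields an empty list.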

Answer 1 (score: 0)

You can recognize a specific pattern in the string and extract it.

// Pattern for recognizing a URL, based off RFC 3986 
private static final Pattern urlPattern = Pattern.compile(
        "(?:^|[\\W])((ht|f)tp(s?):\\/\\/|www\\.)" 
                + "(([\\w\\-]+\\.){1,}?([\\w\\-.~]+\\/?)*" 
                + "[\\p{Alnum}.,%_=?&#\\-+()\\[\\]\\*$~@!:/{};']*)", 
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

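Example:

String text = "foo bar http://example.com baz";
Matcher matcher = urlPattern.matcher(text);
while (matcher.find()) {
    // Group 1 is the scheme or "www." prefix, so start(1) skips the
    // separator character that the pattern may match in front of it
    int matchStart = matcher.start(1);
    int matchEnd = matcher.end();
    // now you have the offsets of a URL match
    String url = text.substring(matchStart, matchEnd); // "http://example.com"
}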

Reference: the other answers there may be useful to you as well.
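Both patterns above require a scheme or a `www.` prefix, so neither picks up the bare `recode.net/...` form that the question lists. The following is a rough sketch of a looser expression that also accepts bare domains. The pattern, the class name `LooseLinkExtractor`, and the method `extractLinks` are assumptions made for illustration and do not come from either answer. Because it accepts anything shaped like `name.tld`, it will also match strings such as file names (`notes.txt`), so treat it as a starting point rather than a validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LooseLinkExtractor {
    // Assumed pattern (not from either answer): an optional scheme, one or more
    // dot-terminated host labels, an alphabetic top-level domain, then an
    // optional path that runs up to the next whitespace character.
    private static final Pattern LOOSE_URL = Pattern.compile(
            "\\b(?:(?:https?|ftp)://)?"
                    + "(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z]{2,}\\b"
                    + "(?:/\\S*)?",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = LOOSE_URL.matcher(text);
        while (m.find()) {
            // Drop sentence punctuation that the path part may have swallowed.
            links.add(m.group().replaceAll("[.,;:!?]+$", ""));
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(extractLinks(
                "read recode.net/2016/7/21/12243560/google-machine-learning-comics-play, then reply."));
    }
}
```

The host part is deliberately strict (labels must start and end with a letter or digit) while the path part is permissive, which keeps ordinary words without a dot from matching at the cost of the false positives mentioned above.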