General URL format

Time: 2013-04-13 23:03:14

Tags: c++ url format

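I wrote a C++ program that lets you post URLs on YouTube. It works by taking a URL as input, either passed into the program or typed in directly, and then replacing every '/' and '.' in the string with '*'. The modified string is then placed on the clipboard (this only works for Windows users).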
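For reference, the string transformation described above fits in a few lines of C++. This is a minimal sketch, assuming the URL is already held in a `std::string`; `mangle_url` is a hypothetical name, not the asker's actual code, and the Windows clipboard step is left out.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not the asker's code): replaces every '/' and '.'
// in the URL with '*'. The mapping is lossy: a '*' that was already in
// the URL can no longer be told apart from a replaced '/' or '.'.
std::string mangle_url(std::string url) {
    std::replace_if(url.begin(), url.end(),
                    [](char c) { return c == '/' || c == '.'; },
                    '*');
    return url;
}
```

For example, `mangle_url("http://en.wikipedia.org/wiki")` yields `http:**en*wikipedia*org*wiki`.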

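Of course, before I can call the program finished, I need to know when '.' and '/' are used in a URL. I looked at this article: http://en.wikipedia.org/wiki/Uniform_Resource_Locator, and learned that '.' is used in the "main website" part (in the case of this URL, "en.wikipedia.org") and '/' is used after it. But I have visited other sites, such as http://msdn.microsoft.com/en-us/library/windows/desktop/ms649048%28v=vs.85%29.aspx, where this is not the case at all (it even replaces '(' and ')' with "%28" and "%29" respectively!)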

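It also seems to request an .aspx file, whatever that is. And there is another '.' inside the parentheses of that URL. I even tried looking at regular expressions for URLs (which I don't fully understand yet...). Can someone tell me (or link me to) the rules for using '.' and '/' in a URL?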

1 answer:

Answer 0 (score: 2)

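Can you explain why you want to do this convoluted thing? What are you trying to achieve? Once you answer that, you may find you don't need to know as much as you think.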

Meanwhile, here is some information. A URL is actually made up of a number of parts:

http:     - the "scheme" or protocol used to access the resource. "HTTP", "HTTPS",
            "FTP", etc. are all examples of a scheme. There are many others.

//        - separates the protocol from the host (server) address

myserver.org - the host. The host name is looked up in the DNS (Domain Name System)
            and resolved to an IP address - the "phone number" of the machine
            which can serve up the resource (like "98.139.183.24" for www.yahoo.com)

www.myserver.org - the host with a prefix. Sometimes the same domain (`myserver.org`)
            connects multiple servers (or ports) and you can be sent straight to the
            right server with the prefix (mail., www., ftp., ... up to the
            administrators of the domain). Conventionally, a server that serves content
            intended for viewing with a browser has a `www.` prefix, but there's no rule
            that says this must be the case. 

:8080/    - sometimes you see a colon followed by up to five digits after the host.
            This indicates the PORT on the server through which you are accessing data.
            Some servers offer certain services only on a particular port: they might
            have a "public access" website on port 80 and another one on 8080. The
            http:// protocol defaults to port 80 and https:// to port 443; there are
            also standard ports for telnet, ftp, etc. Add a port only if you REALLY
            know what you are doing.

/the/pa.th/ - this is the path relative to DOCUMENTROOT on the server where the
            resource is located. `.` characters are legal here, just as they are in
            directory structures.

file.html
file.php
file.asp
etc       - usually the resource being fetched is a file. The file may have
            any of a great number of extensions; some of these tell the server that,
            instead of sending the file straight to the requester, it has to execute
            the program or other instructions in this file and send the result.
            Examples of extensions that indicate "active" pages include
            (this is not nearly exhaustive - just "for instance"):
            .php  = contains a PHP program
            .py   = contains a Python program
            .asp  = "Active Server Page", associated with Microsoft Internet
                    Information Server (IIS)
            .aspx = an ASP.NET page, the successor to .asp (also served by IIS)
            By contrast, .js contains a JavaScript program that is usually loaded
            from an .htm or .html page and run by the browser, not by the server.

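?something=value&somethingElse=%23othervalue%23
          - parameters passed to the server can appear in the URL (the "query
            string", which starts after the '?'). This can be used to pass arguments,
            the entries of a form, etc. Any character can be passed here - including
            '.', '&', '/', ... - but you can't just write these characters into the string...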
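To tie the parts listed above together, here is a rough C++ sketch that splits a URL into scheme, host, port, path and query. It is a simplification, not an RFC 3986 parser: `UrlParts` and `split_url` are hypothetical names, and user info (`user@host`), IPv6 host literals and case-insensitive scheme matching are not handled. It illustrates the point relevant to the question: only the first '/' after the host is structural; every later '.' or '/' is simply part of the path.

```cpp
#include <string>

// Hypothetical result type; the field names are assumptions for this sketch.
struct UrlParts {
    std::string scheme;  // e.g. "http"
    std::string host;    // e.g. "www.myserver.org"
    int port = 0;        // explicit port, else 80/443 for http/https, else 0
    std::string path;    // e.g. "/the/pa.th/file.html" ('.' is legal here)
    std::string query;   // text after '?', still percent-encoded
};

// Simplified splitter, not a full RFC 3986 parser. std::stoi throws
// std::invalid_argument if the port text does not start with a digit.
UrlParts split_url(const std::string& url) {
    UrlParts p;
    std::string rest = url;

    // A fragment ("#...") is not sent to the server, so drop it first.
    auto hash = rest.find('#');
    if (hash != std::string::npos) rest.erase(hash);

    // The query string starts at the first '?'.
    auto q = rest.find('?');
    if (q != std::string::npos) {
        p.query = rest.substr(q + 1);
        rest.erase(q);
    }

    // The scheme ends at "://".
    auto sep = rest.find("://");
    if (sep != std::string::npos) {
        p.scheme = rest.substr(0, sep);
        rest.erase(0, sep + 3);
    }

    // The path starts at the first '/' after the host.
    auto slash = rest.find('/');
    std::string authority = rest.substr(0, slash);  // whole string if no '/'
    if (slash == std::string::npos) {
        p.path = "/";
    } else {
        p.path = rest.substr(slash);
    }

    // An optional ":port" ends the host part.
    std::string port_text;
    auto colon = authority.rfind(':');
    if (colon != std::string::npos) {
        p.host = authority.substr(0, colon);
        port_text = authority.substr(colon + 1);
    } else {
        p.host = authority;
    }

    if (!port_text.empty()) {
        p.port = std::stoi(port_text);
    } else if (p.scheme == "http") {
        p.port = 80;   // default HTTP port
    } else if (p.scheme == "https") {
        p.port = 443;  // default HTTPS port
    }
    return p;
}
```

Run on the MSDN URL from the question, this gives host `msdn.microsoft.com`, port 80 and path `/en-us/library/windows/desktop/ms649048%28v=vs.85%29.aspx`: the '.' characters there are ordinary path characters.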

Now for the fun part.

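URLs cannot contain certain characters (quite a lot of them, in fact). To get around this, there is a mechanism called "escaping" characters. Usually this means replacing the character with the hexadecimal value of its character code, prefixed with a % sign. So you will often see a space character written as %20 (0x20 is the ASCII code for a space). You can find a simple list here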
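A sketch of the escaping mechanism, assuming the strictest rule: keep only the RFC 3986 "unreserved" characters (letters, digits, `-`, `.`, `_`, `~`) and escape every other byte. `percent_encode` is a hypothetical helper; standard C++ has no URL-encoding function. Real encoders are often less strict; in a path, characters such as '=' may be left as they are, which is what the MSDN URL in the question does.

```cpp
#include <string>

// Hypothetical helper: percent-encodes every byte except RFC 3986
// "unreserved" characters (A-Z a-z 0-9 - . _ ~). Each other byte becomes
// '%' followed by two uppercase hex digits.
std::string percent_encode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                          c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

Apply it to a single component, such as one query value, not to a whole URL: encoding `http://x` would also escape the ':' and '/' characters that give the URL its structure.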

There are many functions that can automatically convert "illegal" characters in a URL into "legal" values.
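Going the other way, decoding undoes the escaping; it is what turns the %28 and %29 in the MSDN URL from the question back into '(' and ')'. Again this is a hypothetical helper (`percent_decode`, with an assumed `hex_value` utility), not a standard library function. Note that HTML form submissions also use '+' for a space, which this sketch does not translate.

```cpp
#include <cstddef>
#include <string>

// Hypothetical utility: value of one hex digit, or -1 if c is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: turns each "%XX" escape back into the byte it
// stands for. Malformed escapes (e.g. "%G1" or a trailing "%") are
// copied through unchanged.
std::string percent_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

For example, `percent_decode("%23othervalue%23")` gives `#othervalue#`.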

To know exactly what is and isn't allowed, you need to go back to the original specifications. See, for example:

http://www.ietf.org/rfc/rfc1738.txt

http://www.ietf.org/rfc/rfc2396.txt

http://www.ietf.org/rfc/rfc3986.txt

I have listed them in chronological order - the last one is the most recent.

But I repeat my question - what are you really trying to do here, and why?