Oxford University Press Text Capture Instructions

 

Web URL links

Capture links to external URLs and references to websites in the url element, with the full URL, including the http://, https:// or ftp:// prefix, in the webUrl attribute value.

Typecode

URL

URL of a web address

//url

Before capturing the URL, check that it works. If it does, capture it as #PCDATA conforming to the URL specification within the url element. Set the value of the webUrl attribute to the element content and make sure it includes a prefix. The prefix might not be present in the element content itself.
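The relationship between the element content and the webUrl attribute can be sketched in Python. This is a minimal illustration using the standard library's xml.etree.ElementTree, not an OUP tool; the function name make_url_element is hypothetical. The serializer escapes ampersands itself, so the values passed in should contain a raw & rather than an entity.

```python
import xml.etree.ElementTree as ET


def make_url_element(content: str, web_url: str) -> str:
    """Serialize a url element (hypothetical helper, not part of any OUP tooling).

    content is the URL as it appears in the text; web_url is the same URL
    with its prefix, already percent-encoded where needed.
    """
    element = ET.Element("url")
    element.set("webUrl", web_url)  # attribute carries the prefixed URL
    element.text = content          # element content is the URL as printed
    return ET.tostring(element, encoding="unicode")


print(make_url_element("www.thecommonwealth.org", "http://www.thecommonwealth.org"))
# prints: <url webUrl="http://www.thecommonwealth.org">www.thecommonwealth.org</url>
```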

If the reference is to a website (without a URL in the text), or to a URL that does not work, ask your contact at OUP for clarification.

Please note that the use of the canonical attribute is now deprecated for all content.

If the URL does not contain a prefix (e.g. http:// or https://), prefix the webUrl attribute value with http://.
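The prefix rule can be sketched as a small Python function. The name add_prefix and the case-insensitive comparison are assumptions made for illustration; the instructions only name the http://, https:// and ftp:// prefixes.

```python
# Prefixes named in these instructions.
KNOWN_PREFIXES = ("http://", "https://", "ftp://")


def add_prefix(url_text: str) -> str:
    """Return url_text unchanged if it has a known prefix, else prepend http://."""
    # Assumption: the prefix check ignores letter case.
    if url_text.lower().startswith(KNOWN_PREFIXES):
        return url_text
    return "http://" + url_text


print(add_prefix("www.philosophersimprint.org"))
# prints: http://www.philosophersimprint.org
```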

When capturing the webUrl attribute:

  • Capture uppercase and lowercase letters, decimal digits, hyphen, period, apostrophe, underscore, tilde, question mark, hash, plus, and equals characters as plain text.
  • Capture ampersand as &amp;.
  • Capture any other character (including space) using Percent Encoding.
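The character rules above can be approximated with Python's urllib.parse.quote. This is a sketch under stated assumptions, not the definitive capture procedure: the function name encode_web_url is hypothetical, the input is assumed to carry its prefix already, and slash and colon are treated as plain text because the worked examples leave them unencoded even though the list does not name them.

```python
from urllib.parse import quote

# Characters kept as plain text in addition to letters, digits, hyphen,
# period, underscore and tilde, which quote() never encodes.
# Assumption: "/" and ":" are kept too, as the worked examples suggest.
# "&" is kept at this stage and turned into an XML entity afterwards.
SAFE_CHARACTERS = "/:'?#+=&"


def encode_web_url(url: str) -> str:
    """Return a webUrl attribute value for a URL that already has a prefix."""
    percent_encoded = quote(url, safe=SAFE_CHARACTERS)  # UTF-8 percent encoding
    return percent_encoded.replace("&", "&amp;")


print(encode_web_url("http://www.epic.tvu.ac.uk/PDF Files/epic2/epic2-final.pdf"))
# prints: http://www.epic.tvu.ac.uk/PDF%20Files/epic2/epic2-final.pdf
```

Because the sketch follows the list literally, a percent sign already present in the input would itself be encoded as %25.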

Example: Plain URL without http

Input

www.thecommonwealth.org

Output


<url id="law-ildc-11gm01-url-1" webUrl="http://www.thecommonwealth.org">www.thecommonwealth.org</url>

Input

www.philosophersimprint.org

Output


<url webUrl="http://www.philosophersimprint.org">www.philosophersimprint.org</url>

Example: Whitespace in URL

Input

www.epic.tvu.ac.uk/PDF Files/epic2/epic2-final.pdf

Output


<url webUrl="http://www.epic.tvu.ac.uk/PDF%20Files/epic2/epic2-final.pdf">www.epic.tvu.ac.uk/PDF Files/epic2/epic2-final.pdf</url>

Example: Special characters in URL

Input

http://www.bcn.cl/pags/legislación/leyes/constitución_politica.htm

Output


<url webUrl="http://www.bcn.cl/pags/legislaci%C3%B3n/leyes/constituci%C3%B3n_politica.htm">http://www.bcn.cl/pags/legislación/leyes/constitución_politica.htm</url>

Input

www.somosdefensores.org/attachments/article/412/informesomosdefensoresespañolFINAL2012.pdf

Output


<url webUrl="http://www.somosdefensores.org/attachments/article/412/informesomosdefensoresespa%C3%B1olFINAL2012.pdf">www.somosdefensores.org/attachments/article/412/informesomosdefensoresespañolFINAL2012.pdf</url>

Input

http://www.law.duke.edu/shell/cite.pl?9+Duke+J.+Gender+L.+&+Pol'y+237#B8

Output


<url webUrl="http://www.law.duke.edu/shell/cite.pl?9+Duke+J.+Gender+L.+&amp;+Pol'y+237#B8">http://www.law.duke.edu/shell/cite.pl?9+Duke+J.+Gender+L.+&#x0026;+Pol&#x0027;y+237#B8</url>
Release ID: 20261202
ID: OUP_A-Z_Reference_Works_OxEncyclML_TCI_topic_3_11
Author: dunnm
Last changed: Wed, 04 Jun 2025
Modified by: buckmasm
Revision#: 4400