Oxford University Press Text Capture Instructions

Untitled preliminary paragraphs

Where a section of text begins with untitled paragraphs that are followed by titled subsections, capture those paragraphs in a div[1-7,N] element with the attribute role="prelim".
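
The wrapping step can be sketched in Python with the standard xml.etree.ElementTree module. This is a hypothetical helper, not part of any OUP tooling: the names wrap_prelim and is_div are illustrative, element names are assumed to be un-namespaced, only p elements that precede the first subsection are moved, and the caller chooses the division name for the wrapper (for example div2 inside a div1), which is itself an assumption.

```python
import xml.etree.ElementTree as ET


def is_div(elem):
    """True for div1 ... div7 and divN (any tag starting with 'div').

    Assumes un-namespaced tags; the isinstance check skips non-element nodes.
    """
    return isinstance(elem.tag, str) and elem.tag.startswith("div")


def wrap_prelim(section, prelim_tag):
    """Move the untitled p elements that come before the first subsection
    into a new <prelim_tag role="prelim"> element (hypothetical helper).

    Returns the new wrapper, or None if there is nothing to wrap.
    """
    children = list(section)
    first_div = next((i for i, c in enumerate(children) if is_div(c)), None)
    if first_div is None:
        return None  # no subsections, so the rule does not apply
    leading = [c for c in children[:first_div] if c.tag == "p"]
    if not leading:
        return None  # no preliminary paragraphs to capture
    prelim = ET.Element(prelim_tag, {"role": "prelim"})
    # Put the wrapper where the first preliminary paragraph used to be.
    insert_at = children.index(leading[0])
    for p in leading:
        section.remove(p)
        prelim.append(p)
    section.insert(insert_at, prelim)
    return prelim
```

Applied to a div1 whose title is followed by two loose paragraphs and then a titled div2, the paragraphs end up inside a new div2 with role="prelim", placed directly after the div1 titleGroup.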

A p element must not be a sibling of div[1-7,N] elements unless it contains a paragraph number (an enumerator element with role="paraNum").
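
To find paragraphs that break this rule, a minimal Python sketch using the standard xml.etree.ElementTree module could look like the following. The function names are hypothetical, tags are assumed to be un-namespaced, and the enumerator is assumed to be a direct child of p.

```python
import xml.etree.ElementTree as ET


def is_div(elem):
    """True for div1 ... div7 and divN; assumes un-namespaced tags."""
    return isinstance(elem.tag, str) and elem.tag.startswith("div")


def has_para_num(p):
    """True if p has a direct enumerator child with role="paraNum"."""
    return any(e.tag == "enumerator" and e.get("role") == "paraNum" for e in p)


def misplaced_paragraphs(root):
    """Return p elements that sit alongside div[1-7,N] siblings but carry
    no paragraph number (hypothetical helper)."""
    bad = []
    for parent in root.iter():  # includes root itself
        kids = list(parent)
        if not any(is_div(k) for k in kids):
            continue  # no division siblings at this level
        bad.extend(k for k in kids if k.tag == "p" and not has_para_num(k))
    return bad
```

Each paragraph returned is a candidate for capture in a div[1-7,N] element with role="prelim"; numbered paragraphs are left alone.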

Use the following XPath expression to identify div[1-7,N] elements with any of these issues:

  • A missing role="prelim" attribute
  • Redundant div[1-7,N] elements (some siblings may need to be merged)
  • Missing paragraph numbers

XPath for identifying division hierarchy issues

//*[starts-with(name(),'div')]
[ 
  not(
    child::titleGroup 
    or @role='prelim' 
    or child::p/child::enumerator
  ) 
  and not(
    (
       self::div1 
       or count(child::*[starts-with(name(),'div')]) > 0 
    ) 
    and count(../*[starts-with(name(),'div')]) = 1 
  )
]
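
For scripted batch checks where an XPath-aware tool is not to hand, the expression's logic can be mirrored in Python with the standard xml.etree.ElementTree module. This is a sketch, not an official OUP script: flag_divisions and is_div are hypothetical names, and tags are assumed to be un-namespaced so that the tag equals XPath's name().

```python
import xml.etree.ElementTree as ET


def is_div(elem):
    # Mirrors starts-with(name(), 'div'); assumes un-namespaced tags.
    return isinstance(elem.tag, str) and elem.tag.startswith("div")


def flag_divisions(root):
    """Return the div[1-7,N] elements the XPath expression would select."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    flagged = []
    for div in root.iter():  # //* includes the root element
        if not is_div(div):
            continue
        # child::titleGroup or @role='prelim' or child::p/child::enumerator
        identified = (
            any(c.tag == "titleGroup" for c in div)
            or div.get("role") == "prelim"
            or any(c.tag == "p" and any(e.tag == "enumerator" for e in c)
                   for c in div)
        )
        if identified:
            continue
        # count(../*[starts-with(name(),'div')]) = 1; the root element is
        # the only element child of the document node, so it counts as 1.
        parent = parent_of.get(div)
        siblings = list(parent) if parent is not None else [div]
        only_div = sum(1 for s in siblings if is_div(s)) == 1
        # self::div1 or count(child::*[starts-with(name(),'div')]) > 0
        wrapper = div.tag == "div1" or any(is_div(c) for c in div)
        if wrapper and only_div:
            continue  # a sole structural wrapper is acceptable
        flagged.append(div)
    return flagged
```

Like the expression it mirrors, the enumerator test accepts any enumerator child of p without checking role="paraNum".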

Release ID: 20261202
ID: OUP_Structured_Text_TCI_topic_3_5_2
Author: dunnm
Last changed: Wed, 04 Jun 2025
Modified by: buckmasm
Revision#: 4400