READ: PRHLT indexing and search engine.
Guidelines for the Bentham collection
- Confidence level
- Maximum results
- Viewing search results
- Starting a new search
- Advanced searching
- Example search queries
- Additional help
This interface allows users to search over 90,000 images
comprising the main collections of manuscripts written by
the English philosopher Jeremy Bentham (1748-1832), which
are held by University College London Library and
The British Library. This interface is in its testing stage.
Feedback is welcome.
The PRHLT research center has processed the Bentham Papers with
cutting-edge handwritten text recognition and probabilistic word
indexing technologies. The result is that this vast collection
of Bentham's papers can now be efficiently searched with a fair
level of accuracy, including those papers that have not yet been
transcribed.
This work is the result of a collaboration between the
PRHLT research center
at the Universitat Politècnica de València
and the Bentham Project
at University College London,
as part of the READ project,
which has received funding from the European Union's
Horizon 2020 research and innovation programme under grant agreement
At the top of the page, a text box is provided for you to
enter particular words and phrases that you wish to find
among the manuscripts.
Confidence level
Beneath the search box at the top of the page, a confidence
box and a confidence slider are provided. These allow you to
specify, as a number between 1 and 100, the confidence level
at which you wish to search.
If the confidence level is set at a high number, the platform
will return fewer results, but the retrieved words are more likely
to be correct. If the confidence level is set at a lower number,
the platform will return more results, but the retrieved words are
less likely to be correct.
Maximum results
A maximum results box is also provided to allow you to specify
the maximum number of search results you wish to see.
To begin searching, set your desired confidence level and
maximum number of results. Type your query into the text box
and click 'Search'. The default confidence level is 50%.
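The interplay between the confidence level and the maximum number of results can be illustrated with a short Python sketch. It is not the platform's code: the `Spot` record, the `select_spots` function, the per-spot confidence on a 0-100 scale, the "at or above" comparison and the best-first ordering are all assumptions made for illustration, and the spots themselves are made up.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """Hypothetical record for one retrieved word ('spot') on a page image."""
    word: str
    page: int
    confidence: float  # assumed to use the same 0-100 scale as the confidence box


def select_spots(spots, max_results, confidence_level=50):
    """Keep spots whose confidence is at or above the chosen level,
    list the most confident first, and stop at max_results."""
    kept = [s for s in spots if s.confidence >= confidence_level]
    kept.sort(key=lambda s: s.confidence, reverse=True)
    return kept[:max_results]


# Made-up spots for a query on 'panopticon' and 'house'.
EXAMPLE_SPOTS = [
    Spot("panopticon", page=12, confidence=92.0),
    Spot("panopticon", page=40, confidence=35.5),
    Spot("house", page=12, confidence=61.0),
]

print(len(select_spots(EXAMPLE_SPOTS, max_results=10)))                       # 2 (default level 50)
print(len(select_spots(EXAMPLE_SPOTS, max_results=10, confidence_level=90)))  # 1 (stricter)
print(len(select_spots(EXAMPLE_SPOTS, max_results=10, confidence_level=30)))  # 3 (looser)
```

Raising the level trims the list to the most reliable spots, while lowering it lets less certain spots through; the maximum results value then simply caps whatever survives the threshold.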
Viewing search results
Search results are displayed at three hierarchical levels:
1. collection, 2. box and 3. page image.
It is best to open each level in a new tab in order to retain
all of your search results.
Viewing search results - at collection level
After making your search, the system will display the results
at collection level. You are presented with a banner stating
the number of boxes which contain the relevant word or phrase.
Viewing search results - at box level
Click this banner and you will see each box listed individually,
along with the number of its pages that match your query.
Click on the thumbnail image of each box to view the search
results for that box. It is best to open each box in a new
tab in order to retain all of your search results.
Viewing search results - at page level
Click on the thumbnail image of an individual box in order to
see a display of relevant manuscript images.
Each entry specifies the page number, the name of the penner
who wrote the page, a thumbnail of the manuscript, the number
of matching words on that page and a confidence bar. By
hovering your mouse over the confidence bar, the precise
confidence value will be shown as a percentage.
You can also hover your mouse over the thumbnail image
of a page to see its exact box and folio number.
The page number is the number of the image, whereas the folio
number is the Library catalogue number. There are often several
images, and thus several page numbers, for each folio.
Clicking on the thumbnail image of a manuscript will open
the page in question. Again, it is best to open each page in
a new tab in order to retain all of your search results.
The folio number and penner (i.e., the person who wrote the page)
of each page are displayed at the top of the screen.
The results of the search queries (called 'spots') will be
highlighted in the manuscript image. The colour of the box
surrounding the relevant word indicates the confidence level,
with green being the highest.
Starting a new search
You can start a new search at any time by typing a query
into the search box.
If you are already viewing a particular box/page, the platform
will only search that particular box/page.
To search the entirety of the Bentham papers, click
'HOME' at the top of the manuscript image or click on the
'Bentham Papers Indexing and Search' link at the top left
of the webpage.
To search within a particular box, go to the searching homepage,
click the banner and then click the thumbnail of the box you
wish to search in.
Advanced searching
A wide range of query formatting options is available to help
you obtain more specific results and a greater level of accuracy.
Several hints and tips for searching the collection are provided
below.
- All search queries ought to be written in plain text,
avoiding the use of accents (e.g. by writing 'protege'
rather than 'protégé'), and, if possible,
transliterating any special characters (e.g. by writing
'astraea' rather than 'astræa').
- Punctuation marks should be omitted from your search queries.
- It is possible to search for abbreviations (e.g. use
'ch' or 'art' in order to search for these specific abbreviations).
- To search for a hyphenated word, type the two parts of the
word in square brackets, without a hyphen (e.g. search for
'[sea faring]' rather than 'sea-faring'). See 'Sequence queries'
below for more detail.
- Individual words can be combined into 'compound queries'
in two ways: Boolean queries and
Sequence queries. In addition, the spelling of
each word can be relaxed using wildcard and
approximate spelling.
- Boolean (AND, OR, NOT) queries.
The AND, OR, NOT operators are expressed using the following symbols:
- AND: '&&' (e.g. 'Jeremy && Bentham' will return
results for 'Jeremy' AND 'Bentham'). The AND operator
can be omitted, so the above query can be equivalently stated
as 'Jeremy Bentham'.
- OR: '||' (e.g. 'law || justice' will return results
for 'law' OR 'justice').
- NOT: '-', placed before each word which ought to be
negated (e.g. 'Bentham - Jeremy - Samuel' will return
results for 'Bentham' but NOT for 'Jeremy Bentham' or
'Samuel Bentham').
- PARENTHESES '( )' can be used for grouping. For example,
'hard && (labour || work)' will display pages with at
least one instance of 'hard' and at least one instance of
either 'labour' or 'work', or both.
- The number of search results corresponds to the
total number of retrieved words that match the
search query.
- Sequence queries are AND queries in which the words
should loosely appear one after the other. They are expressed
as sequences of words in square brackets.
- Sequence queries are not interpreted as exact segments
of text, but allow a few extra (small) words to appear among
the stated words: For example, '[Security of State]' displays
results that contain phrases such as 'Security of a State',
'Security of the State' etc.
- The number of search results corresponds to the total
number of complete sequences that have been retrieved.
- Proximity queries
are AND queries with a number, written as '&N&' between
the words, which specifies how far apart the AND components
are allowed to be.
- The number is a percentage of the whole image size.
For example, 'panopticon &5& house &10& penitentiary'
retrieves images where 'house' is at most 5% apart from
'panopticon' and 'penitentiary' is at most 10% apart from
'panopticon' and 'house'.
- Wildcard word spelling.
The symbol '*' can be used as a wildcard representing any
character string. For example, use 'offen*' to search for
'offend', 'offense', 'offenses', 'offender', 'offending', etc.
A minimum number of actual (non-wildcard) characters is
required in a partially spelled word with a wildcard. This
minimum depends on the search level: 4 actual characters are
required at HOME, 2 at BOX level and 1 at image level.
- Approximate word spelling.
The special symbol '~' can be appended to any word to find
words which differ from the given one in at most one character.
Larger dissimilarities can be specified by appending a number after
the '~' symbol. For example, use 'committed~' to find 'commited'
and 'comitted', or 'neighbor~2' to find 'neighbor', 'neighbors',
'neighbour', 'neighbours', etc.
The maximum dissimilarity allowed depends on the search
level, as well as on the difference between the length of
the query word (without the '~') and the specified dissimilarity:
this difference must be greater than 3 at HOME, greater than 1
at BOX level and greater than 0 at image level. For example,
'neighbor~2' gives 8 - 2 = 6, which is allowed at every level.
- Both wildcards and approximate spelling may
entail large computing demands on the server and
should be used with care!
- Mixing query types.
Boolean, sequence and proximity queries can be mixed
arbitrarily. In addition, wildcard and approximate word
spelling can be used with all the other types of queries.
For instance, '[pain pleasure] && [pleasure pain]',
'[New (York || Orleans)]', etc.
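The plain-text tips at the start of the Advanced searching list (no accents, transliterated special characters, no punctuation, hyphenated words as bracketed sequences) can be applied mechanically to a word or phrase before any query operators are added. The Python sketch below is illustrative only: `normalize_term` is a hypothetical helper, and its small transliteration table is an assumption, not a list published for this platform.

```python
import re
import unicodedata

# Characters that Unicode decomposition does not reduce to plain letters.
# This table is an illustrative assumption, not the platform's own list.
TRANSLITERATIONS = {"æ": "ae", "Æ": "Ae", "œ": "oe", "Œ": "Oe", "ß": "ss"}


def normalize_term(text: str) -> str:
    """Rewrite free text as a plain-text query term, following the tips above."""
    # Transliterate special characters, e.g. 'astræa' -> 'astraea'.
    for special, plain in TRANSLITERATIONS.items():
        text = text.replace(special, plain)
    # Strip accents: decompose each character, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Hyphenated words become bracketed sequences: 'sea-faring' -> '[sea faring]'.
    text = re.sub(r"\w+(?:-\w+)+",
                  lambda m: "[" + m.group(0).replace("-", " ") + "]", text)
    # Remove remaining punctuation, but keep the brackets added above.
    text = re.sub(r"[^\w\s\[\]]", "", text)
    return " ".join(text.split())


print(normalize_term("protégé"))     # protege
print(normalize_term("astræa"))      # astraea
print(normalize_term("sea-faring"))  # [sea faring]
```

The helper is meant for the words of a query, not for a finished query: it would also strip operators such as '&&', '||' or '*', so those are best added afterwards.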
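The Boolean operators act at page level: a page matches 'hard && (labour || work)' if it contains 'hard' and at least one of the other two words. A minimal Python sketch of that evaluation follows, assuming the query has already been parsed into a small tree and that each page is represented by a lowercased set of its recognised words. The node classes and `matches` are hypothetical; the real engine works on probabilistic word indexes rather than plain word sets, and since operator precedence is not described in the guidelines, the sketch takes an explicit tree instead of parsing query strings.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Word:
    text: str


@dataclass(frozen=True)
class And:  # 'a && b', or simply 'a b'
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:  # 'a || b'
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Not:  # 'a - b': pages matching a that do not contain b
    left: "Query"
    right: "Query"


Query = Union[Word, And, Or, Not]


def matches(query: Query, page_words: set) -> bool:
    """Decide whether a page (a set of lowercased words) satisfies the query."""
    if isinstance(query, Word):
        return query.text.lower() in page_words
    if isinstance(query, And):
        return matches(query.left, page_words) and matches(query.right, page_words)
    if isinstance(query, Or):
        return matches(query.left, page_words) or matches(query.right, page_words)
    if isinstance(query, Not):
        return matches(query.left, page_words) and not matches(query.right, page_words)
    raise TypeError(f"unknown query node: {query!r}")


# 'hard && (labour || work)'
hard_work = And(Word("hard"), Or(Word("labour"), Word("work")))
# 'Bentham - Jeremy - Samuel', read left to right
bentham_only = Not(Not(Word("Bentham"), Word("Jeremy")), Word("Samuel"))

print(matches(hard_work, {"hard", "labour"}))          # True
print(matches(bentham_only, {"bentham", "samuel"}))    # False
```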
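Sequence queries are described only informally ("a few extra (small) words" may appear among the stated words). The Python sketch below makes that concrete under explicit assumptions: it works on the transcribed words of a single line, and the hypothetical parameters `max_gap` (at most two extra words between consecutive query words) and `max_extra_len` (extra words of at most three letters) are illustrative values, not the platform's settings.

```python
def sequence_matches(terms, line_words, max_gap=2, max_extra_len=3):
    """Return the positions in line_words where the query words start a loose sequence.

    Between consecutive query words, up to max_gap extra words are skipped,
    provided each is no longer than max_extra_len characters (assumed limits).
    """
    terms = [t.lower() for t in terms]
    words = [w.lower() for w in line_words]
    if not terms:
        return []
    starts = []
    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        pos, complete = start, True
        for term in terms[1:]:
            pos += 1
            gap = 0
            # Skip a limited number of short filler words such as 'a' or 'the'.
            while (pos < len(words) and words[pos] != term
                   and gap < max_gap and len(words[pos]) <= max_extra_len):
                pos += 1
                gap += 1
            if pos >= len(words) or words[pos] != term:
                complete = False
                break
        if complete:
            starts.append(start)
    return starts


print(sequence_matches(["Security", "of", "State"], "Security of the State".split()))        # [0]
print(sequence_matches(["Security", "of", "State"], "Security of the whole State".split()))  # []
```

Because the words must still appear in the stated order, '[pain pleasure]' and '[pleasure pain]' are different queries under this sketch.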
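The proximity rule can be sketched in Python as follows. The guidelines state only that the number is a percentage of the whole image size. This sketch assumes that distance is measured between the centres of the word boxes and expressed as a percentage of the image diagonal. The `positions` input (a mapping from each word to the pixel centres of its spots) and the function names are hypothetical.

```python
import math
from itertools import product


def distance_pct(a, b, width, height):
    """Distance between two box centres, as a percentage of the image diagonal (assumption)."""
    return 100 * math.hypot(a[0] - b[0], a[1] - b[1]) / math.hypot(width, height)


def proximity_match(positions, terms, limits, width, height):
    """Check a chain such as 'panopticon &5& house &10& penitentiary' on one image.

    positions maps each word to the (x, y) centres of its spots; limits[i - 1]
    is the percentage allowed between terms[i] and every earlier term.
    """
    candidates = [positions.get(term, []) for term in terms]
    # Try every combination of one spot per term; a missing term yields no combinations.
    for combo in product(*candidates):
        if all(distance_pct(combo[i], combo[j], width, height) <= limits[i - 1]
               for i in range(1, len(combo)) for j in range(i)):
            return True
    return False


spots = {"panopticon": [(100, 100)], "house": [(150, 100)], "penitentiary": [(200, 150)]}
print(proximity_match(spots, ["panopticon", "house", "penitentiary"], [5, 10], 1000, 1000))  # True
```

Checking every earlier term for each later one reproduces the example in the guidelines, where 'penitentiary' must be within 10% of both 'panopticon' and 'house'.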
Example search queries
Compound queries: OR
- convict || prisoner
- offence || violation || crime
- jail || jails || prison || prisons
Compound queries: AND
- Jeremy Bentham
- Samuel && Bentham
- panopticon && penitentiary && house
Compound queries: NOT
- Bentham - Jeremy - Samuel
- panopticon - penitentiary
- penitentiary - panopticon
Compound queries: mixed AND/OR/NOT
- (value || interest) && (work || labour)
- house (panopticon - penitentiary)
Compound queries: proximity
- near &15& neighbour
- Bentham &10& Samuel
- panopticon &5& house &10& penitentiary
- (wrong &15& right) (negative &25& (act || acts))
Compound queries: phrases
- [pain and pleasure]
- [pain pleasure]
- [ Jeremy Bentham]
- [ Security of State ]
- [Panopticon or Inspection House ]
Compound queries: phrases + boolean
- [New York] || [New Orleans]
- [New (York || Orleans)]
- [New York] && [New Jersey]
- bentham &10& samuel - [Samuel Bentham]
- [pain (and||or) pleasure] || [pleasure (and||or) pain]
- [pain (and||or) pleasure] && [pleasure (and||or) pain]
Wildcard and Approximate queries
- offence* &10& punish*
- *maria - Maria
- inter*ing && pre*able
- [near* neighbor~2]
- Elizabeth~ - Elizabeth
- [commit~ (crime* || offence~)]
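The wildcard examples above rely on the '*' rule and on the per-level minimum number of actual characters described under Advanced searching. A minimal Python sketch follows, assuming that '*' may also match an empty string and that matching ignores case (neither point is stated in the guidelines). `wildcard_allowed`, `expand_wildcard` and the sample vocabulary are hypothetical.

```python
import re

# Minimum number of actual (non-wildcard) characters per search level, from the guidelines.
MIN_ACTUAL_CHARS = {"home": 4, "box": 2, "image": 1}


def wildcard_allowed(pattern: str, level: str) -> bool:
    """True if the pattern has enough actual characters for the given search level."""
    return len(pattern.replace("*", "")) >= MIN_ACTUAL_CHARS[level]


def expand_wildcard(pattern: str, vocabulary) -> list:
    """Return the vocabulary words matched by a pattern such as 'offen*' or 'inter*ing'."""
    # Escape the literal parts and let each '*' stand for any (possibly empty) string.
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")),
                       re.IGNORECASE)
    return sorted(word for word in vocabulary if regex.fullmatch(word))


vocabulary = ["offend", "offense", "offender", "often", "interesting", "interest"]
print(expand_wildcard("offen*", vocabulary))  # ['offend', 'offender', 'offense']
print(wildcard_allowed("of*", "home"), wildcard_allowed("of*", "box"))  # False True
```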
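The '~' operator reads naturally as a bound on edit distance. The Python sketch below assumes Levenshtein distance, in which each insertion, deletion or substitution counts as one character of difference. The guidelines do not name the measure explicitly. The sketch also encodes the stated per-level rule on word length minus dissimilarity. `parse_approx`, `approx_allowed`, `levenshtein` and `expand_approx` are hypothetical helpers.

```python
# len(word) - dissimilarity must be greater than this value, per search level (from the guidelines).
MIN_LENGTH_MARGIN = {"home": 3, "box": 1, "image": 0}


def parse_approx(term: str):
    """'neighbor~2' -> ('neighbor', 2); a bare '~' means a dissimilarity of 1."""
    word, _, digits = term.partition("~")
    return word, (int(digits) if digits else 1)


def approx_allowed(word: str, dissimilarity: int, level: str) -> bool:
    return len(word) - dissimilarity > MIN_LENGTH_MARGIN[level]


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (or match)
        prev = cur
    return prev[-1]


def expand_approx(term: str, vocabulary) -> list:
    word, k = parse_approx(term)
    return sorted(w for w in vocabulary if levenshtein(word.lower(), w.lower()) <= k)


vocabulary = ["neighbor", "neighbors", "neighbour", "neighbours", "neighborhood"]
print(expand_approx("neighbor~2", vocabulary))  # ['neighbor', 'neighbors', 'neighbour', 'neighbours']
print(approx_allowed("neighbor", 2, "home"))    # True, since 8 - 2 = 6 > 3
```

Under this measure 'neighborhood' is four insertions away from 'neighbor' and is therefore excluded, which matches the intent of a small, bounded spelling relaxation.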