Bristol

Focus on OCR

The way a completed translation has been produced has changed markedly over the decades since my first days as a translator for Imperial Tobacco in Bedminster, Bristol.

In those days I’d write out the translation in longhand from printed source material and take my manuscript to the typing pool where it would be transformed into typescript.

The next big change came with my learning how to touch-type. By this time I was a freelance with no more access to a typing pool.

In my early freelance days, it was rare to get editable copy that one could overkey with one’s usual word processor, spreadsheet or presentation package. The standard way of working was still from hard copy propped up in a copyholder alongside one’s keyboard.

Then there came a large surge in the use of formats such as PDF – Portable Document Format. This format enables documents, including text formatting and images, to be presented in a manner independent of application software, hardware and operating systems.

If the PDF was text-based, one could simply export the text as plain ASCII text or copy and paste it into a word processor.

However, if I had an image-based PDF to work with, my usual answer was to print it out as hard copy to be propped up in a copyholder alongside my keyboard. This was very expensive in terms of paper and other consumables for the printer, even with a machine as parsimonious as my trusty mono laser printer, whose cartridge was good for printing 3,000 or so pages of copy.

In addition to the expense of printing, there was a far greater drawback to bear in mind: one could easily miss a sentence or paragraph from the original text when keying in the translation from a hard copy original, with the consequent implications for the quality of the finished work and the client’s satisfaction with it.

Then I discovered OCR – Optical Character Recognition – the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text.

Here’s a short video explaining the basics of OCR.

My preferred OCR package is gImageReader which, as with the software I recommend for use by translators, is open source and available for both Linux and Windows.

gImageReader in action on Italian language PDF

gImageReader provides a simple graphical front-end to the tesseract OCR engine. The features of gImageReader include:

  • Importing PDF documents and images from disk, scanning devices, clipboard and screenshots;
  • Processing multiple images and documents in one go;
  • Manual or automatic recognition area definition;
  • Recognising to plain text or to hOCR documents;
  • Displaying the recognised text directly next to the image;
  • Post-processing of the recognised text, including spellchecking;
  • Generating PDF documents from hOCR documents.

I generally just stick to scanning the input file to plain text, which can then be fed into a regular office suite for translation. If your office suite can handle HTML, the hOCR output is an alternative, since hOCR is an HTML-based format.

The tesseract OCR engine mentioned above can also be enhanced with language packs for post-recognition spellchecking, as mentioned in the features above. At present, tesseract can recognise over 100 different languages.

In addition to GUI-based OCR, there are also Linux packages available which can perform OCR via the command line interface.

My tool of choice here is OCRmyPDF.

ocrmypdf being used in KDE’s Konsole terminal to add OCR layer to Spanish language PDF

OCRmyPDF is a package written in Python 3 that adds OCR layers to PDFs and, like gImageReader, also uses the tesseract OCR engine.

Using OCRmyPDF on the command line is simplicity itself (as shown in the screenshot above):

ocrmypdf -l [language option] inputfile.pdf outputfile.pdf

More complicated command options are possible, but after using that simple string above, you’ll be able to extract the text from your formerly image-based PDF ready for translation.
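For anyone with a whole folder of image-based PDFs to get through, the simple command above can be wrapped in a few lines of Python. What follows is only a minimal sketch under some assumptions of my own: the function names and the "_ocr" suffix for output files are hypothetical choices made for illustration, and ocrmypdf plus the relevant tesseract language data must already be installed before anything is actually run.

```python
import subprocess
from pathlib import Path


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng"):
    """Build the argument list for: ocrmypdf -l LANGUAGE input output."""
    return ["ocrmypdf", "-l", language, str(input_pdf), str(output_pdf)]


def ocr_folder(folder, language="eng", dry_run=True):
    """Prepare (and optionally run) one ocrmypdf call per PDF in a folder.

    The "_ocr" suffix for output files is a naming convention assumed here
    for illustration; it is not something ocrmypdf itself requires.
    """
    commands = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if pdf.stem.endswith("_ocr"):
            continue  # skip files produced by an earlier run
        output = pdf.with_name(pdf.stem + "_ocr.pdf")
        command = build_ocrmypdf_command(pdf, output, language)
        commands.append(command)
        if not dry_run:
            # Needs ocrmypdf and the tesseract language data to be installed.
            subprocess.run(command, check=True)
    return commands
```

Calling ocr_folder("scans", language="spa") only returns the commands it would run, which makes it easy to check them first; passing dry_run=False executes them one by one. The language option takes tesseract's three-letter codes, e.g. spa for Spanish or ita for Italian.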

By way of conclusion, depending on the software itself, OCR packages can also extract text from images such as .jpg files.

A penny for your thoughts?

If there’s one thing that can be said about language, it’s that it’s dynamic. Blink for a second and you might miss the coining of a neologism or an old turn of phrase becoming obsolete.

The latter in particular can have amusing consequences, especially if re-used by someone possibly too young to appreciate the original connotations of the word or phrase.

One such most likely occurred today in a Bristol Post piece about free travel in the Bristol area on Unibus services.

The item’s second paragraph reads as follows:

Passengers will able to hop on the Unibus U2 service, from Monday February 18 until Friday, February 22 without spending a penny.

To someone of my age (rapidly approaching the point where I qualify for a pass for free bus travel. Ed.), the phrase has connotations other than obtaining buckshee travel.

As Collins Dictionary helpfully points out:

If someone says that they are going to spend a penny, they mean that they are going to go to the toilet. [British, old-fashioned, politeness]

Old coin-operated public toilet lock

The origins of the phrase stretch back to the Victorian era and refer to the use of coin-operated locks on public toilets in the UK. Such locks were first used in a public toilet outside London’s Royal Exchange in the 1850s.

The phrase “to spend a penny” has now largely died out and been forgotten, except by those with greying hair, due to changes to public toilets themselves (many of which have been closed by austerity-hit local authorities. Ed.) and changes in the charges to use a toilet. Last time I looked while on my travels, the toilets at Manchester Victoria railway station cost an exorbitant 20p, i.e. 4 shillings or 48 times the original cost of one penny.

The ultimate cheese toastie?

Earlier this week, Bristol City Council’s licensing committee voted to ban the sale of toasted cheese sandwiches in a north Bristol park due to concerns about anti-social behaviour (posts passim).

Whilst doing background research for that post, your correspondent discovered what must count as the world’s ultimate cheese toastie, particularly if the main metrological criterion for the snack’s assessment is its cholesterol content.

Enjoy! 😀

Desperately seeking Vivian

One of the more interesting aspects of running a website is dealing with stuff that the ordinary visitor doesn’t see, both the bad (spam comments posted by bots) and the good.

As regards the latter, read on.

For instance, over Christmas I was contacted by a gentleman who’d attended Avonvale Road School (posts passim) in the 1960s as a primary pupil and wrote to me to see if I could update him on its fate.

Unfortunately, I had to tell him that the buildings he knew had been demolished to make way for the modern school that now occupies the site.

Earlier this week I was contacted via this site by Louise Allum, sister of the late Viv, who was on our BA Modern Languages course in Wolverhampton.

Louise read my write-up of the last reunion* (posts passim).

Louise was wondering if any of Viv’s fellow students from the course had any photos from their student days featuring Viv, which they would be willing to share in some form, as she has no pictures of her sister from that era.

If any of my former BAML colleagues happen to read this and can help out, please get in touch and I’ll put you in contact with Louise.

* = In the course of trying to help out Louise, I got hold of a fellow alumnus and received the news that the next reunion is in the early planning stages.

Cheese toastie shocker

Yesterday’s online version of the Bristol Post (now renamed Bristol Live. Ed.) carried a shocking item about a hitherto unknown catalyst for violence: the toasted cheese sandwich.

According to the Post, this humble snack may not be served at a proposed catering concession in Monk’s Park in Bristol’s Southmead district “amid fears a proposed hot food van could attract booze-fuelled anti-social behaviour and motorbike gangs”.

The Post continues:

Councillors have agreed to grant a provisional licence for cold food, such as ice cream, and tea and coffee in Monk’s Park, Biddestone Road.

But the vendor would be barred from selling hot snacks following dozens of objections from residents, a ward councillor and the headteacher of a nearby secondary school.

A provoker of violence, accompanied by not quite so provocative tomato soup. Image courtesy of Wikimedia Commons.

However, the fear of violent behaviour was not the only concern for banning hot food: councillors on the city council’s public safety and protection committee also feared children from the next-door school would be tempted to skip lessons due to the lure of grilled fermented curd.

Following the committee’s decision the concession will now be put out to tender.

However, the story does not end there. When your correspondent posted about the article on Twitter, one person to respond was local artist Dru Marland, whose response about fermented curd addiction was hilarious.

Dru's tweet reads: “they start 'em on Dairylea slices, and before you know it they're mugging pensioners for their next fix of Stinking Bishop”

For a more complete understanding of the violence-inducing properties of cheese, I should have asked the committee about their opinions of more exotic varieties of fermented curd, such as Roquefort or Graviera, but pressure of time dictated otherwise. 🙂

Update: Not forty-eight hours after Bristol was opened to national and international ridicule over this affair, Bristol Live reports that residents of Bristol’s Cotham district have branded a hot food catering van an “appalling idea”. You couldn’t make this stuff up!

Post exclusive: fire brigade incident at non-existent tower block

One thing is certain about life in Bristol: it’s quite unlike living anywhere else and can sometimes be well beyond the borders of the surreal.

This feeling is enhanced by reading the Bristol Post, the city’s newspaper of (warped) record.

Just skimming casually through the Post website, readers may easily miss some real exclusives, such as this fire brigade incident reported yesterday by Heather Pickstock, who is alleged to be the paper’s North Somerset reporter.

As shown in the screenshot above, Ms Pickstock informs readers as follows in this fine piece of creative writing:

screenshot of part of article

Crews from Southmead, Temple, Kingswood, Hicks Gate, Bedminster and Pill were called at 9.46pm yesterday to reports of smoke billowing from the sixth floor of a high rise block a Littlecroft House, Pip Street, Eastville.

There’s just one thing wrong with the above sentence: it’s completely incorrect; there’s no Pip Street in Eastville and no high rise block called Littlecroft House either.

A research technique known to ordinary mortals, but not to Ms Pickstock, and affectionately known as “5 minutes’ Googling” reveals there’s a council tower block called Little Cross House in Phipps Street, Southville, a good four miles across the city from Eastville.

The Bristol area can breathe a sigh of relief that Ms Pickstock does not work as a call handler on the 999 emergency switchboard. 😉

Fell is foul

Many of the phrases in common use in English have one of two sources: either the Bible (both the authorised King James version and earlier translations, such as those of Wycliffe and Tyndale. Ed.) or the pen of William Shakespeare.

Indeed, some lovers of the English language actually refer to it euphemistically as “the language of Shakespeare” when someone ignorant commits an indignity with it.

Today’s online edition of the Bristol Post/Live, the city’s newspaper of (warped) record, has no difficulty in mangling some of the Bard of Avon’s actual words.

The misquoting of the Bard occurs in a promotional piece advertising a supermarket chain’s substantial breakfast. The piece itself was a cut and paste job lifted from the Post’s Trinity Mirror stablemate, the Manchester Evening News, which itself lifted the item from the Metro, a publication so downmarket its owners the Daily Mail have to give it away.

misquoted Shakespeare quote is one foul swoop

However, neither the MEN nor the Metro saw fit to misquote Shakespeare; that was a solo effort by the Temple Way Ministry of Truth.

The offending sentence is in the final passage shown in the above screenshot, i.e.:

The breakfast contains your entire daily allowance in one foul swoop, but it’s described as the perfect meal for those with a big appetite.

The actual words penned by Shakespeare are not “one foul swoop” but “one fell swoop” and occur in Macbeth, Act 4, scene 3, when Macduff hears that his family have been killed. Macduff remarks:

All my pretty ones?
Did you say all? O hell-kite! All?
What, all my pretty chickens, and their dam,
At one fell swoop?

“One fowl swoop” also occurs frequently as a variation on the misquotation.

Whether Shakespeare actually invented the phrase himself or was merely the first to write it down is a matter of debate. Either way, Macbeth was written around 1605 or 1606, so even the Bard’s use of the phrase dates back over four centuries.

The adjective “fell” is archaic, meaning evil or cruel, so it’s unsurprising that it’s misquoted. Moreover, it now tends to occur only in literary works such as J.R.R. Tolkien’s epic “The Lord of the Rings” (e.g. fell beasts).

Driverless vehicle turns to theft

This blog has previously documented the carnage on the highways caused by driverless vehicles (posts passim).

The Bristol Post, the city’s newspaper of warped record, has now discovered that driverless vehicles are not only responsible for so-called “accidents”, but have now turned to theft – or attempted theft – as well.

Headline reads Police stop 4X4 on motorway with fake license plates after it tried to steal a caravan

If there’s one crumb of comfort to be gained from the above report, it is that our brave boys and girls in blue would have had no trouble spotting the offending vehicle with those American “license plates”. 😉

Exclusive: Bristol Post changes name to Manchester Evening News

It’s official: the Bristol Post (or is it BristolLive? Ed.) is changing its name to the Manchester Evening News.

And the revelation comes in a piece from no less a personage than Mike Norton, the title’s editor in chief himself, and is hidden away in the details about the implications of the General Data Protection Regulation (GDPR).

The relevant section is outlined in red in the image below.

relevant sentence reads: However, the GDPR is not just related to emails. It affects every industry, business, including publishing and therefore ours here at manchestereveningnews.co.uk

Whether production of the Post will be moved up north from the Temple Way Ministry of Truth is not mentioned.

Is Mike Norton guilty of copying and pasting without checking the actual wording?

In Private Eye’s immortal words: we should be told! 🙂

(Crab) apple blossom time

On my way to the shops this fine May morning, my attention was caught by the beauty of the crab apple (Malus sylvestris) blossom on the tree in the small park that runs up the side of Bannerman Road in Easton, as shown below.

Crab apple blossom in Bannerman Road

According to the Woodland Trust, the crab apple is a native UK species which thrives in heavy soil in hedgerows, woods and areas of scrub. It’s one of the ancestors of the cultivated apple and individual trees can live up to 100 years and can grow to about 10 metres in height.

The common name “crab apple” derives from the tree’s often gnarled and crabbed appearance, especially when growing in exposed places.

In the autumn our local tree produces a fine crop of crab apples, as this picture from autumn 2017 shows.

Bannerman Road’s crab apple tree bearing fruit in autumn 2017

Each autumn I tell myself I shall have to come and gather the fruit to make crab apple jelly. After all, it will be food for free (mostly!).

As an aide-memoire and incentive to myself, below is the recipe for (crab) apple jelly from my trusty 1950s vintage recipe book (hence the imperial measurements. Ed.).

Ingredients

  • 4 lbs crab or cooking apples
  • 2 pints water
  • 1 stick cinnamon, or
  • A few cloves, or
  • Strips of lemon rind
  • 1 lb of sugar per pint of juice obtained

Method
Wash the apples and wipe. Cut into quarters, but do not remove the skin or core. Put the fruit into a pan with the water and the cinnamon, cloves or lemon peel tied in a piece of muslin. Stew until the fruit is soft. Test for pectin. Remove the muslin bag. Turn the contents of the pan into a jelly bag and leave overnight to strain. Measure the juice and heat in a pan. Add 1 lb of warmed sugar to each pint of juice, stirring until all the sugar has dissolved. Bring to the boil and boil rapidly until the jelly sets when tested on a cold saucer or plate. Remove the scum. Pot and seal whilst still hot.

Before we leave apple blossom, your correspondent can’t help remembering an old song called “(I’ll Be With You In) Apple Blossom Time”, which he remembers being sung by The Andrews Sisters and which reached no. 5 in the USA in 1941.

However, the song is nearly 20 years older than the success enjoyed with it by Laverne, Maxine and Patty, having been written by Albert Von Tilzer and lyricist Neville Fleeson and copyrighted in 1920.
