The way a completed translation is produced has changed markedly over the decades since my first days as a translator for Imperial Tobacco in Bedminster, Bristol.
In those days I’d write out the translation in longhand from printed source material and take my manuscript to the typing pool where it would be transformed into typescript.
The next big change came with my learning how to touch-type. By this time I was a freelance with no more access to a typing pool.
In my early freelance days, it was rare to get editable copy that one could overkey with one’s usual word processor, spreadsheet or presentation package. The standard way of working was still from hard copy propped up in a copyholder alongside one’s keyboard.
Then there came a large surge in the use of formats such as PDF – Portable Document Format. This format enables documents, including text formatting and images, to be presented in a manner independent of application software, hardware and operating systems.
If the PDF was text-based, one could simply export the text as plain ASCII text or copy and paste it into a word processor.
However, if I had an image-based PDF to work with, my usual answer was to print it out as hard copy to be propped up in a copyholder alongside my keyboard. This was very expensive in terms of paper and other consumables for the printer, even with a machine as parsimonious as my trusty mono laser printer, whose cartridge was good for printing 3,000 or so pages of copy.
In addition to the expense of printing, there was a far greater drawback to bear in mind: one could easily miss a sentence or paragraph from the original text when keying in the translation from a hard copy original, with the consequent implications for the quality of the finished work and the client’s satisfaction with it.
Then I discovered OCR – Optical Character Recognition – the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text.
Here’s a short video explaining the basics of OCR.
gImageReader provides a simple graphical front-end to the tesseract OCR engine. The features of gImageReader include:
- Importing PDF documents and images from disk, scanning devices, clipboard and screenshots;
- Processing multiple images and documents in one go;
- Manual or automatic recognition area definition;
- Recognising to plain text or to hOCR documents;
- Displaying the recognised text directly next to the image;
- Post-processing of the recognised text, including spellchecking;
- Generating PDF documents from hOCR documents.
I generally just stick to scanning the input file to plain text, which can then be fed into a regular office suite for translation. If your office suite can handle HTML, it is worth knowing that gImageReader’s hOCR output is an HTML-based format.
The tesseract OCR engine mentioned above can also be enhanced with language packs for post-recognition spellchecking, as mentioned in the features above. At present, tesseract can recognise over 100 different languages.
In addition to GUI-based OCR, there are also Linux packages available which can perform OCR via the command line interface.
My tool of choice here is OCRmyPDF.
OCRmyPDF is a package written in Python 3 that adds OCR layers to PDFs and, like gImageReader, also uses the tesseract OCR engine.
Using OCRmyPDF on the command line is simplicity itself (as shown in the screenshot above):
ocrmypdf -l [language option] inputfile.pdf outputfile.pdf
More complicated command options are possible, but after using that simple string above, you’ll be able to extract the text from your formerly image-based PDF ready for translation.
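For anyone wanting to script that command rather than type it each time, here is a minimal Python sketch. The function names and file names are hypothetical illustrations; the only OCRmyPDF options assumed are `-l` from the command above and `--sidecar`, which asks OCRmyPDF to write the recognised text to a separate plain text file as well.

```python
import shutil
import subprocess


def build_ocrmypdf_command(language, input_pdf, output_pdf, sidecar=None):
    """Build the argument list for an ocrmypdf run."""
    command = ["ocrmypdf", "-l", language]
    if sidecar is not None:
        # --sidecar also writes the recognised text to a plain text file.
        command += ["--sidecar", sidecar]
    command += [input_pdf, output_pdf]
    return command


def run_ocr(language, input_pdf, output_pdf, sidecar=None):
    """Run ocrmypdf if it is installed; raise if it is missing or fails."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on the PATH")
    command = build_ocrmypdf_command(language, input_pdf, output_pdf, sidecar)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file names; "deu" is tesseract's code for German.
    print(" ".join(build_ocrmypdf_command("deu", "scan.pdf", "scan_ocr.pdf", "scan.txt")))
```

Keeping the command builder separate from the function that runs it means the arguments can be checked without OCRmyPDF being installed.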
By way of conclusion, depending on the software, OCR packages can also extract text from images such as .jpg files.
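As a rough illustration of the image case, the sketch below builds a command line for running the tesseract engine directly on a .jpg file. The helper name and file name are hypothetical; it assumes tesseract’s usual calling pattern of image, output base name and `-l` language option, with tesseract adding the `.txt` extension to the output base itself.

```python
from pathlib import Path


def build_tesseract_command(image_path, language):
    """Build the argument list for running tesseract on a single image."""
    image = Path(image_path)
    # tesseract appends ".txt" to the output base, so strip the image suffix.
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", language]


if __name__ == "__main__":
    # Hypothetical file name; "fra" is tesseract's code for French.
    print(" ".join(build_tesseract_command("scan.jpg", "fra")))
```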
Google’s developers evidently have a sense of humour, as the search below shows.
Not all humour from techies is quite so obvious to ordinary mortals; it is normally deeply buried in comments in code, mark-up and the like.
Tip of the hat: Kevin Mills
Some weeks ago, I blogged about the keyboard shortcut for guillemets – French quotation marks – on a Linux keyboard (posts passim).
My attention in this post is on the German umlaut, also known as a diaeresis (or in French as a tréma. Ed.): the two dots placed over a vowel to modify its pronunciation.
Once again, one could always use the character map to insert a specific vowel with an umlaut.
However, the keyboard shortcut is much quicker.
To produce the letter a with an umlaut – “ä” – follow these steps.
Depress the AltGr key and the left-hand square bracket “[”, followed by “a”.
The AltGr and left-hand bracket symbol plus the vowel of your choice will give you that character plus an umlaut.
For the upper case version, I find the easiest way to avoid knotting your fingers is to turn on CapsLock before pressing AltGr and the left-hand square bracket “[”, followed by the vowel.
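The two-step sequence has a loose parallel in the way Unicode can build the same characters from a base vowel plus a combining mark. The Python sketch below is only an analogy and makes no claim about how the keyboard layout itself works; the function name is a hypothetical illustration.

```python
import unicodedata

# U+0308 COMBINING DIAERESIS: the two dots as a separate combining mark.
COMBINING_DIAERESIS = "\u0308"


def add_umlaut(vowel):
    """Combine a vowel with the diaeresis and normalise to a single character."""
    return unicodedata.normalize("NFC", vowel + COMBINING_DIAERESIS)


if __name__ == "__main__":
    for vowel in "aouAOU":
        print(vowel, "->", add_umlaut(vowel))
```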
Yesterday The Document Foundation (TDF) announced the release of LibreOffice 6.2, a significant major release of the free and open source office suite which features a radical new approach to the user interface – based on the MUFFIN concept – and provides user experience options to meet all users’ preferences.
The NotebookBar is available in Tabbed, Grouped and Contextual versions. Each one has a different approach to the menu layout and complements the traditional Toolbars and Sidebar. The Tabbed variant aims to provide a familiar interface for users coming from suites such as MS Office and is intended to be used primarily without the sidebar, while the Grouped one allows access to “first-level” functions with one click and “second-level” functions with a maximum of two clicks.
The design community has also made substantial changes and improvements to icon themes, in particular Elementary and Karasa Jaga.
LibreOffice 6.2 new and improved features
- The help system offers faster filtering of index keywords, highlighting search terms as they are typed and displaying results based on the selected module.
- Context menus have been tidied up, to be more consistent across the different components in the suite.
- Change tracking performance has been dramatically improved, especially in large documents.
- In Writer, it is now possible to copy spreadsheet data into tables instead of just inserting them as objects.
- In Calc it is now possible to do multivariate regression analysis using the regression tool. In addition, many more statistical measures are now available in the analysis output and the new REGEX function has been added, to match text against a regular expression and optionally replace it.
- In Impress and Draw the motion path of animations can now be modified by dragging its control points. In addition, a couple of text-related drawing styles have been added, as well as a Format Table submenu in Draw.
- LibreOffice Online, the cloud-based version of the suite, includes many improvements too. On mobile devices, the user interface has been simplified, with better responsiveness and updates to the on-screen keyboard.
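To give a flavour of what the new REGEX function in Calc does (match text against a regular expression and optionally replace it), here is a rough Python analogue. It is a sketch of the idea only, not Calc’s implementation: the function name and parameters are hypothetical, and Python’s regular expression dialect differs in places from the one Calc uses.

```python
import re


def regex(text, pattern, replacement=None, replace_all=False):
    """Return the first match, or the text with one or all matches replaced."""
    if replacement is None:
        match = re.search(pattern, text)
        return match.group(0) if match else None
    # re.sub treats count=0 as "replace every match".
    count = 0 if replace_all else 1
    return re.sub(pattern, replacement, text, count=count)


if __name__ == "__main__":
    print(regex("123456ABCDEF", r"\d", "Z"))
    print(regex("123456ABCDEF", "[126]", "", replace_all=True))
```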
As with every major and minor release of LibreOffice, interoperability with proprietary file formats has also been improved for better compatibility with Office documents, including old versions which have been dropped by Microsoft. The focus has been on charts, animations and document security features. To assist with interoperability, LibreOffice 6.2 is built with document conversion libraries from the Document Liberation Project.
LibreOffice 6.2’s new features have been developed by a large community of contributors: 74% of commits are from developers employed by companies on TDF’s Advisory Board, such as Collabora, Red Hat and CIB, and by other contributors such as the City of Munich. Individual volunteers account for 26% of commits.
In addition, there is a global community of individual volunteers taking care of quality assurance, software localization, user interface design and user experience, editing the help pages and documentation.
LibreOffice 6.1.5 for commercial deployments
The Document Foundation has also released LibreOffice 6.1.5, a more mature version which includes some months of back-ported fixes and is better suited for commercial deployments, where features are less important as individual productivity is the main objective.
Companies wishing to deploy LibreOffice are advised to seek assistance for such matters as software support, migrations and training from qualified professionals.
Download LibreOffice 6.2 or LibreOffice 6.1.5
LibreOffice Online is fundamentally a server service and should be installed and configured by adding cloud storage and an SSL certificate. It might be considered an enabling technology for the cloud services offered by ISPs or the private cloud of enterprises and large organisations.
LibreOffice users, free software advocates and community members are encouraged to support The Document Foundation with a donation.
As part of its campaign to increase the use of free software in the public sector (posts passim), the Free Software Foundation Europe (FSFE) has also produced a short video explaining the benefits for the public purse, citizens and the common weal.
My first experiences of computing took place before the widespread use of graphical user interfaces (GUIs).
Consequently, I use a lot of keyboard* shortcuts.
These can also be used to create individual characters and, if known, represent an alternative to using a visual character map, such as KCharSelect, the character map on the KDE desktop environment on my Linux machines.
So what’s the keyboard shortcut alternative for French quotation marks?
On Linux, most special characters can be inserted into a text editor or office package using the AltGr key plus one or two other keystrokes. If you have the patience to learn them, they can save a lot of time.
For the left guillemet, AltGr+z produces «.
For the right guillemet, AltGr+x produces ».
As you can see, it’s a lot quicker than using a GUI-based alternative.
* = I’ve always used a standard EN-GB keyboard layout.
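For reference, the two characters produced by the shortcuts above can be identified by their Unicode code points and names, which is handy when searching a character map. The short Python sketch below looks them up; the dictionary and function names are hypothetical.

```python
import unicodedata

# Shortcut-to-character pairs as described in this post (EN-GB layout on Linux).
GUILLEMETS = {"AltGr+z": "\u00ab", "AltGr+x": "\u00bb"}


def describe(char):
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


if __name__ == "__main__":
    for shortcut, char in GUILLEMETS.items():
        print(shortcut, char, describe(char))
```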
Let’s start with a trio of questions.
1. Why should governments develop free software*?
2. Where is free software already generating benefits in the public sector?
3. What are free software business models?
Answers to the above questions and practical guidelines are given in the new expert policy brochure published today by the Free Software Foundation Europe.
Entitled “Public Money Public Code – Modernising Public Infrastructure with Free Software”, the brochure aims to answer decision-takers’ questions about the benefits of using and developing free software for public sector organisations.
To help understand the important role that public sector procurement plays in this, the brochure presents an overview of EU free software projects and policies, uncovering legislation on software procurement.
The FSFE will use this brochure in the forthcoming European Parliament elections to inform potential MEPs how to speed up the distribution and development of free software in the public sector by putting appropriate legislation in place.
Download the brochure (PDF).
The brochure evaluates the modernisation of public infrastructure by using free software from the perspectives of academia, law, business and government. Expert articles, reports and interviews help readers to understand the opportunities for free software in the public sector. Practical guidance is provided for decision-makers to move forward and start modernising public infrastructure with free software.
FSFE President Matthias Kirschner states: “Free software licences have proven to generate tremendous benefits for the public sector. This is not a trend that will pass, but rather a long-term development that is based on very positive experiences and strategic considerations resulting from serious vendor lock-in cases in the past. In a few years, free software licences could become the default setting for publicly-financed IT projects. The Free Software Foundation Europe watches these developments very carefully and we want to contribute our knowledge to support the public sector in this transition.”
Initial steps for making free software licences the default in publicly-financed IT projects are outlined in the brochure. Other topics include competition and potential vendor lock-in, security, democracy, “smart cities” and further important contemporary issues. The language and examples used have been specifically chosen for readers interested in politics and public administrations.
The brochure features leading experts from various ICT areas. Amongst others, these include Francesca Bria, Chief of Technology and Digital Innovation Officer (CTIO) for the Barcelona City Council; Prof. Dr. Simon Schlauri, author of a detailed legal analysis on the benefits of free software for the Swiss canton of Bern; Cedric Thomas, CEO of OW2; Matthias Stürmer, head of the Research Center for Digital Sustainability at the University of Bern; and Basanta Thapa from the Competence Center for Public IT (ÖFIT) within the Fraunhofer Institute for Open Communication Systems.
* = In this context the definition of free software is free as in freedom, not beer.
Author Michael Ansaldo speaks warmly of the office suite your ‘umble scribe has been using since its inception in 2010, after the mass departure of OpenOffice.org developers from Sun Microsystems following its takeover by Oracle.
Translated into English, Ansaldo’s final paragraph reads as follows:
In summary, amongst the notable features of LibreOffice 6, we note its excellent compatibility with the [Microsoft] Office formats, as well as an interface that will not disorientate the aficionados of Microsoft’s office suite. Nevertheless, some features are lacking, such as integrated cloud storage or even joint real-time editing. Anyway, LibreOffice 6 is still the best choice for open source fans and all those wanting compatibility with Office without buying Microsoft Office. Its availability for multiple platforms and its frequent updates also make it a clear choice for individuals and businesses.
The first bug hunting session for the forthcoming LibreOffice 6.1 release will be held on Friday, 27th April, The Document Foundation blog has announced.
LibreOffice 6.1, the next point release of the free and open source office suite which emphasises the use of open standards, such as the Open Document Format (ODF), is due to be made available in August this year.
To help ready the software for its release date, the LibreOffice Quality Assurance community is organising an initial bug hunting session this Friday to find, report and triage bugs. Details of the event can be found on the dedicated wiki page.
This first Bug Hunting Session will involve the first Alpha version of LibreOffice 6.1, which will be available on the pre-releases server on the day of the event. Builds will be available for Linux (DEB and RPM package formats), macOS and Windows. Users will be able to run the Alpha release in parallel with their production version – thus enabling testing without affecting users’ existing stable installations.
Mentors will be available on April 27th 2018 from 8.00 a.m. UTC to 8.00 p.m. UTC for questions or help in the IRC channel: #libreoffice-qa (connect via webchat) and its Telegram bridge. During the day there will be two dedicated sessions focussed on two of the tenders implemented in LibreOffice 6.1: the first between 10.00 a.m. UTC and 12.00 noon UTC to test improvements in image handling; and the second between 2.00 p.m. UTC and 4.00 p.m. UTC to test the HSQLDB import filter for Firebird.
According to the release plan, the LibreOffice 6.1 office suite will enter the beta stages of development at the end of May, with a second beta planned for mid-June. After that, there should be about three RCs released between the first week of July and the first week of August, with the final release being available in mid-August.
For its own cloud computing environment, the German Federal administration has opted for Nextcloud, heise reports. The software will in future be run by the Federal administration’s central IT service provider, the Federal Information Technology Centre (ITZBund). Unlike services such as Google Drive or Dropbox, Nextcloud is an open source project which users can install in their own computer centres.
The project is targeted at some 300,000 users in various authorities and ministries. They will be able to share and synchronise files centrally using the service. Stuttgart-based Nextcloud GmbH is supporting the ITZBund based on an enterprise subscription for operation and support. However, no individual per-user or per-system licence fees will be incurred.
ITZBund tested Nextcloud with some 5,000 users in a pilot project before making its decision in favour of open source.