
Humanist Discussion Group


Humanist Archives: Dec. 15, 2019, 3:23 p.m. Humanist 33.494 - indexing non-Latin scripts

                  Humanist Discussion Group, Vol. 33, No. 494.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                Submit to: humanist@dhhumanist.org

        Date: 2019-12-15 06:09:19+00:00
        From: William Pascoe 
        Subject: Re: [Humanist] 33.490: indexing non-Latin scripts?


English is my first language and I have only a little of any other, but I am
sympathetic to this problem, and not only because I'm human. In the humanities
we often need to deal with all kinds of character encodings. As a software
developer, it is the bane of my existence to be constantly fixing, debugging,
transforming and re-encoding all those little squares and question marks that
appear when different people use different character encodings, when all I want
is to allow Mandarin, Russian, or anything at all to be input to and output
from a database over the web.
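
As a small illustration of where the squares and question marks come from,
here is a minimal Python sketch. The sample name and the particular codecs
(Latin-1 and ASCII) are assumptions chosen for the demonstration, not a
description of any real system:

```python
# Illustrative only: the sample name and the codecs are assumptions.
name = "Лев Толстой"

# Text stored correctly as UTF-8 bytes.
utf8_bytes = name.encode("utf-8")

# Reading those bytes with the wrong codec never raises an error in Latin-1;
# it silently produces garbage characters instead.
garbled = utf8_bytes.decode("latin-1")

# Forcing the text into ASCII replaces every unsupported character with "?".
question_marks = name.encode("ascii", errors="replace").decode("ascii")

# Decoding with the codec the bytes were written in round-trips cleanly.
restored = utf8_bytes.decode("utf-8")

print(repr(garbled))
print(question_marks)
print(restored == name)
```

The point is only that the data is usually fine; it is the mismatch between
the encoding used to write and the one used to read that does the damage.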

The underlying problem at Elsevier Scopus is probably not a technical one but
the usual institutional bureaucratic brick wall. Someone finds they can't paste
the Russian name in, so they ask the manager, who says it's policy because they
half remember some explanation from the 1990s about database indexing, and
that half-baked reply comes back to the person asking.

Nobody thinks to ask the Database Administrator, who has been campaigning for
years to have this changed and keeps getting ignored day after day, year after
year.

If they asked the DBA they might say, "Yes, we can index Cyrillic, Arabic or
anything you like. The only trouble is these databases were set up in 1990 with
ASCII as the default character set, so you can only put Latin characters in at
the moment. But that's not a big problem. We just have to convert it. We really
should upgrade to UTF-8, which everyone has been using for decades now and which
can handle just about everything. It would take me a couple of hours, but you
might want to allow a week for the proper checking and reindexing and for it to
flow through the failovers. The only complication is whether you want to include
diacritics in sorting or not, because it might be just a bit slower if we do."
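
To make the two points in that imagined reply concrete, here is a rough Python
sketch. It is not how any particular database does it (a real system would use
its own character set and collation settings); the helper names `fits_ascii`
and `strip_diacritics` and the sample names are hypothetical, invented for the
illustration:

```python
import unicodedata

def fits_ascii(text):
    """Return True if the text can be stored in an ASCII-only column."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

def strip_diacritics(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical sample data.
names = ["Zola", "Čapek", "Eco", "Émile", "Чехов"]

# Only the unaccented Latin names survive an ASCII-only store.
ascii_ok = [n for n in names if fits_ascii(n)]

# Every name encodes to UTF-8 and decodes back unchanged.
utf8_ok = all(n.encode("utf-8").decode("utf-8") == n for n in names)

# Sorting by raw code point pushes accented letters after "Z".
codepoint_order = sorted(names)

# Sorting on a diacritic-free, case-folded key interleaves them as a reader
# would expect; the original name is kept as a tie-breaker.
folded_order = sorted(names, key=lambda n: (strip_diacritics(n).casefold(), n))

print(ascii_ok)
print(codepoint_order)
print(folded_order)
```

The extra key computation in the last step is the "just a bit slower" part.
Note that stripping combining marks is a blunt instrument: in some scripts a
marked letter is a distinct letter, so a proper language-aware collation is the
better choice in production.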

If they want a policy of also providing Latinised versions, or phonetic
pronunciation guides for English as the global Lingua Franca, they can go right
ahead, but there's no reason they can't use and index people's actual names in
their own language.

Kind regards,

Dr Bill Pascoe
System Architect
Time Layered Cultural Map Of Australia
C21CH Digital Humanities Lab

T: 0435 374 677
E: bill.pascoe@newcastle.edu.au

The University of Newcastle (UON)
University Drive
Callaghan NSW 2308

The University of Newcastle is in the lands of Awabakal, Worimi, Wonaruah,
Biripi and Darkinjung people.

From: Humanist 
Sent: Friday, 13 December 2019 7:25 PM
To: publish-liv@humanist.kdl.kcl.ac.uk 
Subject: [Humanist] 33.490: indexing non-Latin scripts?

Humanist Discussion Group, Vol. 33, No. 490.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
Submit to: humanist@dhhumanist.org

Date: 2019-12-12 09:03:30+00:00
From: Miran Hladnik 
Subject: Indexing problems with non-Latin scripts

The following will hardly spark sympathy among English-speaking
members of Humanist. But maybe it should, considering that the word
humanist also indicates a person who respects human dignity. It is about
respecting other scripts and languages.

Some months ago a Russian author in the journal I edit noticed that
his paper hadn't been indexed by Elsevier Scopus. Being aware that
articles and references in the Cyrillic script cause indexing problems
with Scopus, the journal sticks to the instructions from the Scopus
officials and transliterates every single Cyrillic entry into the
Latin script. In spite of that the references were not indexed. I've
intervened with Scopus. After a while I received the astonishing answer
from the content account manager: The paper cannot be processed because
the references are not in English! The new demand and the argument by
Scopus sound like mockery: it would be unacceptable for a research
paper to list the titles in a non-existent English translation instead
of in original languages. Our journal publishes predominantly
non-English papers, nevertheless it has been successfully processed by
the same institution so far. The problem seems to be acute only
regarding the use of the Cyrillic alphabet, which evidently disturbs
some Scopus employees and raises the suspicion that someone is out to
expel Russian from the scientific community to maintain the
dominance of English.

I would appreciate hearing about your indexing experience with other
languages and with non-Latin scripts, e.g. Hebrew or Greek. Apart from this,
it seems necessary to say that, at a time when every mobile device is
capable of recognizing and translating a text in any script
and language, the terror of English exercised by Elsevier Scopus is
discriminatory and indecent. -- miran hladnik


Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
