
Lingloss

Welcome to The Lingloss Project Page! In 1967, I designed what was meant to be an international auxiliary language called Lingloss. Like many other such projects, it was never really ready enough to inflict upon the public. In 2012 Lingloss still remains a work in progress; however, I believe I have recently made some progress on one aspect of the overall problem. The reasons for this belief are more fully detailed at http://www.richardsandesforsyth.net/docs/bunnies.pdf . So I am using this webpage to share some software which, when more fully developed, may help designers of the coming international auxiliary language. (Yes, there will have to be one eventually: the human race can always be relied upon to do the right thing, as Churchill said of the Americans, once they have exhausted the alternatives.) The software is concerned with the problem of establishing a suitable core vocabulary, an obstacle that prior efforts have never convincingly overcome.

What you will find when you download and unzip [glossoft.zip] is a pair of programs written in Python3 (along with various ancillary files) which address the following aspects of the vocabulary-building problem:

1. How to choose a core collection of lexical items, i.e. what Hogben (1963) calls a "list of essential semantic units" (LESU), which is concise enough to be learnt in a matter of weeks and at the same time extensive enough to support the great majority of essential communicative functions;

2. How to choose a suitable international word for each of the items in the LESU.

Towards a Core Vocabulary

The program corevox1.py takes in several lists of essential semantic units (formatted one item per line) and produces a consensus list consisting of all the items that occur in at least minfreq of the input lists, where minfreq is an integer from 1 (in which case the output is all the items that occur in any of the input lists) to N, the number of input lists (in which case the output is only those items common to all the input lists). A minimal sketch of this idea is given below.

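To make the idea concrete, here is a rough Python3 sketch of such a consensus filter. It illustrates the principle only; it is not the actual code of corevox1.py, and the function name and the toy word lists are invented for this example.

    from collections import Counter

    def consensus(wordlists, minfreq=2):
        """Return, sorted, every item present in at least minfreq input lists."""
        counts = Counter()
        for wl in wordlists:
            counts.update(set(wl))          # each list votes at most once per item
        return sorted(w for w, n in counts.items() if n >= minfreq)

    # Toy demonstration (invented mini-lists, not the supplied LESU files):
    lists = [['hand', 'horse', 'house'],
             ['hand', 'house', 'who'],
             ['hand', 'horse', 'why']]
    print(consensus(lists, minfreq=2))      # ['hand', 'horse', 'house']

Note that each list is converted to a set before counting, so an item repeated within one list still counts as a single 'vote'.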
Where do the input lists come from? Well, to test the program, four files containing previous attempts to come up with a LESU are provided in the lexicons subfolder (baslist, hoglist, longlist and maclist). These are, respectively: the Basic English word list (Ogden, 1937); the LESU of Essential World English (Hogben, 1963); the defining vocabulary of the Longman Dictionary of Contemporary English (Longman, 2003); and the defining vocabulary of the Macmillan English Dictionary for Advanced Learners (Macmillan, 2002). Ogden and Hogben were trying to establish minimal subsets of words needed for the majority of communicative purposes in simplified versions of English. Compilers of the Longman and Macmillan dictionaries were trying to establish basic word lists in terms of which all the other entries in their dictionaries could be defined. Thus all four lists represent principled attempts to create concise but effective vocabularies. They didn't all settle on the same words, but any term that appears in more than one of these sets is likely to have a strong claim for inclusion in anyone's core vocabulary. Note that, although most of the entries in these lists are relatively common, they are not mere frequency lists: they result from attempts to cover the most commonly used concepts without redundancy, so some high-frequency terms are excluded as redundant. I should perhaps apologize for the anglocentric bias here, although in mitigation it should be noted that there is nothing in this software that limits it to the English language. I am most at home with English examples, but I would hope that others could apply the same methods to other languages: the comparisons would be instructive.

Towards an International Vocabulary

The second program, avwords3.py, is more innovative, as far as the field of interlinguistics is concerned. It finds the 'verbal average' of a number of different words. As far as I know, nobody has ever defined what a verbal average might be; so, to be a little more specific, the heart of this program is a function that takes in a number of strings (usually words, though they could be short phrases) and produces a string which is, in a certain sense, the most typical representative of those input strings. As currently implemented, it works in two stages. Firstly, using a string-similarity scoring function, the string in the group which is most similar to all the others of that group is chosen. Secondly, certain manipulations, such as dropping a character or swapping two adjacent characters, are tried to see if they increase the similarity score of that string in relation to the rest; if so, the modified string is accepted. (A simplified sketch of this two-stage procedure is given after the example output below.) For example, given the inputs ['cheval', 'caballo', 'cavallo', 'cavalo', 'cal', 'equus', 'cavall'], which are the French, Spanish, Italian, Portuguese, Romanian, Latin and Catalan words for 'horse', the program computes that 'cal' is the most central or typical item. In this case, no deletions or letter-exchanges make it more typical, so it is retained. The program works by reading in several (utf8) files in the format exemplified below.

young	giovane
you	voi
yes	sì
yellow	giallo
year	anno
would	sarebbe
work	lavorare
word	parola
wool	lana
woods	bosco
wood	legno
woman	donna
with	con
wire	filo
wing	ala
wine	vino
window	finestra

This is an extract from a simple English-Italian lexicon: each line consists of a source-language term followed by a target-language equivalent, with a tab character separating them. Each of these input lexicons uses the same source language (English in the examples provided) with a different target language (various Romance languages in the examples provided). These sample bilingual lexicons can be found in the lexicons folder after you have unzipped the software. Incidentally, the part that hasn't been automated is going from the LESU produced as output by corevox1.py to the several lexicons needed as input by avwords3.py. There are lots of public-domain bilingual lexicons, so it would be possible to write software that took a LESU and an existing lexicon (English-to-target-language in the present case) and produced suitable input for avwords3.py, but to do it properly would, I suspect, require human scrutiny anyway, so that task is left as "an exercise for the reader" (though a rough sketch of such a filter is given further below). The output of avwords3.py is a lexicon in the same format as the inputs, in which each source-language item is associated with the 'verbal average' of the terms in the various target languages, intended as a first approximation to an English-Lingloss dictionary. Example output produced from the seven small example inputs in the lexicons folder follows below.

Mon Dec 24 16:28:24 2012

window	fenestra
wine	vin
wing	ala
wire	fil
with	con
woman	mulier
wood	lea
woods	bos
wool	lana
word	parala
work	trabaar
would	voudrais
year	ano
yellow	gallo
yes	si
you	voi
young	jove
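For the curious, the two-stage averaging procedure described above can be sketched in a few lines of Python3. This is an illustration only, not the code of avwords3.py: in particular, the real string-similarity scoring function is not reproduced here, so the ratio measure from Python's difflib library stands in for it, and its choices may differ from those of the real program.

    from difflib import SequenceMatcher

    def sim(a, b):
        """Stand-in string-similarity score between 0 and 1."""
        return SequenceMatcher(None, a, b).ratio()

    def total_sim(candidate, words):
        """Total similarity of a candidate string to the whole group."""
        return sum(sim(candidate, w) for w in words)

    def verbal_average(words):
        # Stage 1: pick the member most similar to all the others.
        best = max(words, key=lambda w: total_sim(w, words))
        score = total_sim(best, words)
        # Stage 2: hill-climb; try single-character deletions and
        # adjacent-character swaps, keeping any variant that scores higher.
        improved = True
        while improved:
            improved = False
            deletions = [best[:i] + best[i+1:] for i in range(len(best))]
            swaps = [best[:i] + best[i+1] + best[i] + best[i+2:]
                     for i in range(len(best) - 1)]
            for variant in deletions + swaps:
                s = total_sim(variant, words)
                if s > score:
                    best, score, improved = variant, s, True
        return best

    # The horse example from the text; with the program's own scorer this
    # yields 'cal', though this stand-in scorer may settle on another form.
    print(verbal_average(['cheval', 'caballo', 'cavallo', 'cavalo',
                          'cal', 'equus', 'cavall']))

The point of the sketch is the shape of the procedure, not the particular scores: swap in a different similarity function and the same two stages apply unchanged.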

On the basis of the example data provided here, Lingloss, if it ever gets into circulation, would look very much like a Romance language, a kind of simplified, modernized Latin. However, that decision is by no means set in stone. The main point of computerizing parts of the process is to permit exploration of alternative design decisions.

The English word 'would' isn't expressed by a single word in these languages, which illustrates the need for human pre-processing or post-processing. In fact, avwords3.py also produces a listing file in which the quality of the 'verbal averages' is shown. This is meant to provide serious users with information to enable them to decide which of the proposed term equivalents need further attention.

These programs are prototypes, intended to illustrate a particular methodology, which I believe is novel. Much work remains to be done. For example, comparison of alternative string-similarity scoring functions would be a good idea; as would a test of whether each target word should be rendered into a common phonetic representation or just taken as spelled; and so on. The main point is to stimulate such work.

Running the programs

To execute the programs you will have to obtain Python (version 3, not 2) if you don't already have it. It can be found at www.python.org. I have tested these programs under Windows 7, but I believe they should run without alteration under Linux as well. Then you will have to unzip the file glossoft.zip, preferably at your top-level directory. This will create subfolders as follows.

lexicons   sample LESUs and small-scale bilingual lexicons
libs       common routines and variables for the programs in p3
op         default directory to receive output
p3         Python3 programs
parapath   directory to hold parameter files

Each program requires certain input parameters, which are put into a text file that can be edited with Notepad, Notepad++ or any other text editor. Example parameter files for using the example data provided will be found in the parapath folder once the zipped file has been unpacked. Each line of a parameter file starts with a parameter name, then one or more spaces, then the value for that parameter. Unknown parameters are ignored. Parameters not given a value in the parameter file receive a default value. A table of parameters used by the programs follows.

casefold (type: 0 .. 1; default: 1)
  whether to fold upper case to lower case on input; 1 implies yes, 0 implies no.

jobname (type: alphanumeric string; default: same name as program)
  name to link output files.

minfreq (type: integer; default: 2)
  minimum number of input LESU files in which a term must appear to be kept for output.

outgloss (type: Windows or Linux filespec; default: avwords_glos)
  output file for consensus lexicon.

vocfile (type: Windows or Linux filespec; default: corevox_vocs)
  output file for consensus LESU.

voclists (type: Windows or Linux filespec; default: lesu.dat / lexicons.txt)
  input text file containing a list of input file-specs, 1 per line.

withkey (type: 0 .. 1; default: 0)
  whether to include the source-language term along with the target-language equivalents in avwords (1), or not (0).

The content of coretest.txt, a simple initial parameter file for corevox1.py, is copied below.

voclists c:\glossoft\parapath\lesu.txt
vocfile c:\glossoft\op\corelist.txt
minfreq 2

The content of wordavs.txt, a starter parameter file for avwords3.py, is copied below.

voclists c:\glossoft\parapath\glossies.txt
outgloss c:\glossoft\op\glossout.txt
withkey 0

Pretty simple, eh?
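Pretty simple indeed: reading such a file takes only a few lines of Python3. The sketch below is illustrative rather than the distributed code; its defaults dictionary simply mirrors the table above, with all values kept as strings.

    # Defaults mirror the parameter table above (illustrative, not exhaustive).
    DEFAULTS = {
        'casefold': '1',
        'jobname':  '',            # the programs default this to their own name
        'minfreq':  '2',
        'outgloss': 'avwords_glos',
        'vocfile':  'corevox_vocs',
        'voclists': 'lesu.dat',
        'withkey':  '0',
    }

    def read_params(path, defaults=DEFAULTS):
        """Parameter name, one or more spaces, value; unknown names ignored."""
        params = dict(defaults)                 # absent names keep defaults
        with open(path, encoding='utf8') as f:
            for line in f:
                fields = line.strip().split(None, 1)
                if len(fields) == 2 and fields[0] in defaults:
                    params[fields[0]] = fields[1]
        return params

    params = read_params('coretest.txt')        # e.g. the first example above
    print(params['voclists'], params['minfreq'])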

References

Hogben, L. (1943). Interglossa. Harmondsworth: Penguin Books.
Hogben, L. (1963). Essential World English. London: Michael Joseph Ltd.
IALA (1951). Interlingua-English Dictionary. New York: Storm Publishers.
Lanchester, J. (2010). Whoops!: Why Everyone Owes Everyone and No One Can Pay. London: Allen Lane.
Longman (2003). Longman Dictionary of Contemporary English. Harlow: Pearson Education Ltd.
Macmillan (2002). Macmillan English Dictionary for Advanced Learners. Oxford: Macmillan Education.
Ogden, C.K. (1937). The ABC of Basic English. London: Kegan Paul, Trench, Trubner & Co. Ltd.

Appendix: Constructed Auxiliary Languages

Year  Language                     Designer
1661  Universal Character          George Dalgarno
1668  Real Character               Bishop Wilkins
1699  Characteristica Universalis  Gottfried Leibniz
1765  Nouvelle Langue              Faiguet de Villeneuve
1866  Solresol                     Francois Sudre
1868  Universalglot                Jean Pirro
1880  Volapuk                      Martin Schleyer
1886  Pasilingua                   Paul Steiner
1887  Bopal                        Saint de Max
1887  Esperanto                    Lazarus Zamenhof
1888  Lingua                       George Henderson
1888  Spelin                       Georg Bauer
1890  Mundolingue                  Julius Lott
1892  Latinesce                    George Henderson
1893  Balta                        Emile Dormoy
1893  Dil                          Julius Fieweger
1893  Orba                         Jose Guardiola
1896  Veltparl                     Wilhelm von Arnim
1899  Langue Bleu                  Leon Bollack
1902  Idiom Neutral                Waldemar Rosenberger
1903  Latino sine Flexione         Giuseppe Peano
1906  Ro                           Edward Foster
1907  Ido                          Louis de Beaufront
1913  Esperantido                  Rene de Saussure
1922  Occidental                   Edgar de Wahl
1928  Novial                       Otto Jespersen
1943  Interglossa                  Lancelot Hogben
1944  Mondial                      Helge Heimer
1951  Interlingua                  Alexander Gode
1957  Frater                       Pham Xuan Thai
1961  Loglan                       James Brown
1967  Lingloss                     Richard Forsyth
1983  Uropi                        Joel Landais
1996  Unish                        Young Hee Jung
1998  Lingua Franca Nova           George Boeree
2002  Mondlango                    He Yafu
2011  Angos                        Benjamin Wood

In Praise of Fluffy Bunnies

Copyright 2012, Richard Forsyth.

Background

Reading John Lanchester's Whoops!, an entertaining account of how highly paid hotshot traders in a number of prestigious financial institutions brought the world to the brink of economic collapse, I was struck by the following sentence: "In an ideal world, one populated by vegetarians, Esperanto speakers and fluffy bunny wabbits, derivatives would be used for one thing only: reducing levels of risk." (Lanchester, 2010: 37). What struck me about this throwaway remark, apart from the obvious implication that derivatives were actually used to magnify risk rather than reduce it (doubtless by carnivores ignorant of Esperanto), was its presumption that right-thinking readers would take it for granted that Esperanto symbolizes well-meaning futility, thus highlighting the author's status as a tough-minded realist. This is just one illustration that disdain for Esperanto in particular, and auxiliary languages in general, pervades intellectual circles in Britain today, as in many other countries. And if you dare to raise the subject of constructed international languages with a professional translator or interpreter, be prepared not just for disdain but outright hostility. Of course professional interpreters are among the most linguistically gifted people on the planet, and can't see why the rest of us shouldn't become fluent in half a dozen natural languages in our spare time. (Not to mention the fact that a widespread adoption of Esperanto, or one of its competitors, would have a seriously negative impact on their opportunities for gainful employment.) Thus Esperanto has become a symbol of lost causes, to be dismissed out of hand by practical folk. Yet those risk junkies busily trading complex derivatives who brought us to the brink of ruin also thought of themselves as supremely practical, hard-headed folk. It turned out that they were in the grip of a collective delusion whose effects have impoverished us all. Perhaps they have something to learn from vegetarians and Esperanto speakers.

In the world of supposedly practical folk today, during an intercontinental recession, the European Union spends vast sums of money each year on translating thousands of tonnes of documents into 23 different official languages. The demand for simultaneous interpreters in Brussels, Luxembourg, Strasbourg and at the UN consistently outstrips supply. Meanwhile in the UK, cohort after cohort of schoolchildren emerge from secondary education unable to understand any language other than their own, often after years of instruction in French, German or Spanish. "Never mind," retort the anglophone triumphalists, "English is the international language these days." If you really believe that English is an adequate lingua franca for Europe, let alone the world, try working in a multinational research project. I spent 2 years as the only native English speaker in an EU project, with English as its official working language, and have been scarred by the experience. At first glance, this would seem to represent a triumph for the language of Shakespeare and Churchill: our native tongue has conquered the world! Sitting in a meeting, listening to colleagues conversing in Euro-globish heavily laden with mispronounced English jargon, trying to understand and make oneself understood, one starts to realize that this is not the triumph of English after all. It seems more like a devious kind of linguistic ju-jitsu, in which the world takes its revenge for being forced to accommodate monoglot English speakers by twisting their language into a barbarous dialect which they find awkward and unfamiliar. Admittedly, English began as a creole, the offspring of a shotgun marriage between Anglo-Saxon and Norman French, but it has come a long way since then, and I personally am very fond of it. The anglicized pidgin that passes for English as an international language isn't the language I love, and it isn't a very effective medium of international communication either. As it happens, the most eloquent exponent of English as a means of communication that I have ever heard was a Hungarian. But most of us have neither the talent nor the dedication to reach such a height in our mother tongue, still less in a foreign language. We do, however, have sufficient ability to achieve communicative competence in Esperanto within three months; and when we employ it we'll be communicating with others in the same position as ourselves, i.e. second-language users. There won't be the fertile soil for misunderstanding that exists when a native speaker instinctively exploits the quirks of the language or a non-native speaker makes a small slip of syntax with serious consequences.

Why then does Esperanto remain a fringe cult? Why doesn't the EU insist that all children in Europe spend even a single term learning Esperanto? Part of the answer must be that, once you accept the idea of a constructed language, there is always the seductive possibility of doing better. At certain points during a course on Esperanto you will come across a construction (such as using the so-called accusative after a preposition to indicate motion) that makes you ask: why did Zamenhof do it that way? Surely that wasn't a good idea. If I want to learn Chinese, I may be daunted by the tonal system, or the thousands of unfamiliar characters, but I have to accept them: that's the way it is. But with an artificial language I'm tempted to think "that should be changed" whenever I come across a difficult or unappealing aspect. Esperanto was in several respects superior to Volapuk, and the Idists think that Ido is better in many respects than Esperanto. Not everyone agrees. Jespersen (no mere dabbler, he) believed that Novial was better than either. So it goes on. Hundreds, perhaps thousands, of artificial languages have been proposed in the past couple of centuries.

Most never get used in action. In fact, the second most widely used artificial language, after Esperanto, is probably Klingon, which was deliberately designed to sound harsh and be hard to learn! Only Esperanto, for all its perceived imperfections, has ever sustained a community of users numbering more than a few thousand for more than a few decades. Other international language projects, apparently more elegant in concept (e.g. Interglossa, Lingua Franca Nova), have remained on the drawing board. A list of those that have attracted at least some serious attention is given in the Appendix to this essay. Thus, early in the 21st century, we arrive at a situation where Esperanto stands as a proof of concept, but has failed to take off. In spelling it approaches the ideal of one character for one phoneme more closely than almost any natural language; consequently it is easy to pronounce from the page. Its grammar is far more regular than that of most natural languages; consequently it can be mastered in a month. Its vocabulary contains a large number of roots found in the major European languages; consequently it doesn't impose a forbidding memory load on adult learners, provided that their first language is Indo-European. Above all, it has demonstrated repeatedly that international meetings can proceed smoothly without banks of interpreters sitting in cubicles and wires leading into everyone's ears. Nevertheless it is generally viewed as merely a hobby for cranks. Linguists sneer at it. EU policy-makers would rather pour rivers of taxpayers' money into translation agencies and an endless stream of machine-translation projects that never quite achieve their desired objectives than attempt to introduce Esperanto into the workings of the EU.

Personally, I believe this situation is highly unsatisfactory. I am motivated to attempt to do something about it for two primary reasons:

1. In today's globalized civilization, the need for a common international medium of communication is more urgent than ever before;

2. The strain placed on English in its role as de facto international language is turning it into a monstrosity.

Therefore I intend part of my website to play host to yet another effort to devise a constructed auxiliary language for international communication. I plan to kick off the process and with luck enlist some support. Why should such a quixotic enterprise succeed, when hundreds before it have failed? Well, it might not; but there is one advantage that neither Zamenhof nor any of the early pioneers enjoyed, and which none of the more recent interlinguists seem to have exploited: the computer.

Take my Word for it!

An international language needs (1) a simple orthography, (2) a regular grammar, and (3) an easily learned vocabulary. Typical interlanguage projects tend to emphasize the first two points but leave the third in the background. Yet choice of lexical units is the most important of the three. It is normal for proponents of an auxiliary language to claim that its vocabulary is 'international' in some sense, but the foundation for this claim is almost invariably subjective. Zamenhof's approach to Esperanto vocabulary-building can be described as 'eclectic'. It has been said that Esperanto sounds like a Czech speaking Italian. He selected a motley collection of roots from the Germanic, Romance and Slavic languages of Europe. The effect is not unpleasing, but it is hardly systematic. What he didn't do was employ a clearly stated method to create a concise but effective core vocabulary, as Ogden (1937) and Hogben (1943) pointed out long ago. Most subsequent projects are open to the same criticism. When it comes to creating a vocabulary, constructed languages take one of two main approaches:

Eclectic, where the designers pick from a variety of linguistic sources, sometimes with a small admixture of completely made-up items. Examples include: Esperanto, Novial, Loglan, Unish.

Coherent, where the vocabulary is drawn predominantly from a single source. Examples include: Latino sine Flexione (from Latin), Interglossa (from Greek), Interlingua (from the Romance languages), Lingua Franca Nova (from the Romance languages, apparently using Catalan as a kind of tie-breaker).

With the notable exception of Hogben's Interglossa (1943), none of these projects paid much attention to word economy, i.e. to establishing a minimal necessary core vocabulary. Indeed, the Interlingua-English Dictionary (IALA, 1951) boasts of having 27,000 entries; while the Unish website (www.unish.org) has a section soliciting suggested new words from interested readers. In other cases the designers appear to have relied on their intuitions to decide how many and which words were necessary.

A Manifesto for Vegetarians, Esperantists & Other Cute Animals

My contention is twofold: firstly, that the world does need an international language; secondly, that it is possible to create a language that is superior for this purpose, in terms of learnability and usability, to either English or Esperanto.

1. Orthography: it is very easy to improve on English in this respect, and not difficult to improve on Esperanto, where the accented consonants are an irritant. Several projects have already shown this, e.g. Lingua Franca Nova.

2. Grammar: English grammar is a minefield for the unwary, and Esperanto also contains some unnecessary pitfalls. Again, ways of improving on this have already been demonstrated by Lingua Franca Nova, among other projects.

3. Lexis: Esperanto vocabulary is too large and disorderly, English much more so. It is the third item that is really crucial, and that is where all previous projects have fallen down. I believe the time is ripe for a more systematic approach, with the aid of computer processing.
