
Ways of storing dynamic data

Building multilingual applications is a many-sided problem that has no single effective solution and calls for an individual approach in each specific case. In this article I set out my views on possible ways of working with dynamic multilingual content.

A few words about static content

Besides dynamic content there is, of course, static content too. What belongs to it? First of all, the messages of our scripts: error messages (for example, when the user has filled in a form incorrectly) and any hints we give the user. There is also a certain amount of fixed text that is output one way or another and never changes.

One of the approaches, which I use myself, is the combination of PHP and GetText. You can read about it in more detail at http://php.russofile.ru or in issue 12 of PHPInside (http://phpinside.ru). There are other ways of handling static multilingual content, but they are not the subject of this article.

A few words about character encodings

There are many encodings. For Russian alone the three most widespread are DOS (866), windows-1251 and KOI8-R. The situation is probably the same for other languages, so some standardization is needed. In this article we will store all language resources in Unicode, specifically in UTF-8. We will also look at various conversions between encodings.

Keep in mind that when we talk about a variety of languages, we do not count English or any other language whose characters all fall within the basic Latin range (referred to here as latin1; strictly speaking, the ASCII range). This Latin part is present in every one of these encodings, always at the same positions, so there is no point in treating it as a separate language. For example, the Russian encoding windows-1251 also contains the Latin characters, so no conversion is needed for them.

That is the end of the prologue; let us move on to the main part.

Components of dynamic content

What counts as dynamic content? It is the material on your site that is partly or fully replaced on a fairly regular basis: news, articles, polls and so on. The structure of this data is clearly not uniform, either in volume or in how often it changes. We should also not forget that some dynamic elements of a site may be common to all of its language versions, while others are unique to each version.

As you can see, the problem has many facets, and so there are several ways to solve it.

For better understanding, let us introduce some terms (I made them up myself, so do not expect precise and polished definitions).

Unique content is content that is unique to each language version of a site.

Common content is content that is relevant to all language versions at once.

Thoughts on content types

Before thinking about how to store content, you need to decide which of your content is unique and which is common.

What can be classed as unique content?

As a rule, the language versions of a site rarely coincide completely. The section tree often differs, sometimes significantly. The section names will therefore be unique, and we will store them separately for each language version of the site. Separate storage does not mean using different databases: each element simply gets its own record in the database, so there are as many records as there are elements. Only a two-letter index (for example, 'ru', 'en', 'pl' and so on) tells us which language version of the site a record belongs to.

Since the section tree is unique for each language version, the content pages will be unique as well, so storing this data will not cause problems either.

What can be classed as common content? For example, illustrations that carry no text of their own, such as photographs: when they are shown on the different language versions of a site, they still need a description in each language. Polls, the goods in your shop and so on also belong here. All of this data can appear on every language version at the same time. In other words, we call content "common" if it is shown on all language versions of the site practically unchanged.

This is where the various difficulties and the variety of approaches discussed below begin.

How to store unique content

Using files

What makes this content unique is that it does not have to be present in every language version of the site.

If this is an informational site where each page is fairly large, it is logical to store the data in ordinary files: one file per page. This avoids confusion between language versions (with this approach each page has a unique identifier) and makes the files easy to edit.
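The "one file per page" idea can be sketched as a small path helper. This is only an illustration: the function name pageFilePath, the directory layout (base directory, then language code, then page identifier) and the .html extension are assumptions, not something the article prescribes.

```php
<?php
// Hypothetical helper: maps a language code and a page identifier to a file path.
// The layout <base>/<lang>/<page>.html is an assumption made for this sketch.
function pageFilePath($baseDir, $lang, $pageId)
{
    // Accept only two-letter lowercase language codes such as 'ru' or 'en'
    if (!preg_match('/^[a-z]{2}\z/', $lang)) {
        return false;
    }
    // Restrict page identifiers to safe characters to avoid path traversal
    if (!preg_match('/^[A-Za-z0-9_-]+\z/', $pageId)) {
        return false;
    }
    return rtrim($baseDir, '/') . '/' . $lang . '/' . $pageId . '.html';
}

echo pageFilePath('/var/www/content', 'ru', 'about'), "\n";
```

Keeping the language code in the path means the same page identifier can exist in several language versions without the files colliding.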


It also simplifies working with encodings: we save the text as it is, and there is no need to think about its format.

In my opinion this way of storing information suits large volumes of information best, although a news feed, for example, can perfectly well be stored in files too.


Using a database

With databases, as you might expect, everything is more complicated. Here the problem of encodings and of combining data in different languages confronts us in full.

Fortunately, most hosting platforms today provide databases that handle Unicode properly, UTF-8 in particular, which makes our task somewhat easier.

The first thing we decide is that data in different languages will be stored in UTF-8. First, this saves us from creating a set of tables to store data of the same kind. Second, all language data is unified to a certain degree, which makes it easier to work with. In particular, searching our resources becomes much easier to organize.

Since we are now talking about unique content, storing it in a database differs little from storing it in files. We create a table with all the necessary fields, convert the content to UTF-8 if it is not already in that encoding, and put it into the database by the usual means. Each content element gets one record in the database.

Content can be converted from any encoding to UTF-8 as follows:

$string = iconv(BROWSER_CHARSET, 'UTF-8', $string);



Here the constant BROWSER_CHARSET holds the name of the encoding in which your data is displayed, for example Windows-1251. UTF-8 is the encoding we want the result in, and the variable $string holds the content being converted from one encoding to the other.

If you swap the first and second arguments, you get the reverse conversion.
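A round trip makes the direction of the arguments easier to see. The sketch below assumes the iconv extension is loaded and that the sample text is UTF-8; the sample word is invented for illustration.

```php
<?php
// The source text is assumed to be UTF-8 (the encoding of this file)
$utf8 = 'Привет';

// UTF-8 -> Windows-1251: each Cyrillic letter goes from 2 bytes to 1 byte
$cp1251 = iconv('UTF-8', 'Windows-1251', $utf8);

// Swapping the first two arguments converts the data back
$restored = iconv('Windows-1251', 'UTF-8', $cp1251);

// Print the byte lengths before and after, then check the round trip
echo strlen($utf8), ' ', strlen($cp1251), "\n";
var_dump($restored === $utf8);
```

The byte lengths differ while the round trip gives back the original string, which is a quick way to confirm that a conversion actually took place.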


iconv is a PHP extension; check whether it is installed on your system. If you handle static content with GetText, iconv is already installed. If not, refer to the article mentioned in the first part of this material, which describes how to install iconv. Most hosting platforms I have come across have it installed, so you should not run into problems.
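One way to check for the extension from a script is sketched below. The helper name iconvAvailable is made up for this example; extension_loaded() and function_exists() are standard PHP functions.

```php
<?php
// Returns true when the iconv extension is available in this PHP build
function iconvAvailable()
{
    return extension_loaded('iconv') && function_exists('iconv');
}

if (!iconvAvailable()) {
    // Without iconv the conversion helpers in this article cannot work
    echo "iconv is not installed\n";
} else {
    echo "iconv is available\n";
}
```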


All this is fine, but writing such a construct every time in a script is tedious and inefficient, so let us write two functions that perform the procedure.

define('BASE_CHARSET', 'UTF-8');

$GLOBALS['site_languages'] = array(
    'en' => array('English', 'UTF-8', 'en_US'),
    'ru' => array('Russian', 'UTF-8', 'ru_RU')
);

// Convert a string from the display encoding of $lang to the database encoding
function StringToBase($lang, $string) {
    if (BASE_CHARSET != $GLOBALS['site_languages'][$lang][1]) {
        $string = iconv($GLOBALS['site_languages'][$lang][1], BASE_CHARSET, $string);
    }
    return $string;
}

// Convert a string from the database encoding to the display encoding of $lang
function StringFromBase($lang, $string) {
    if (BASE_CHARSET != $GLOBALS['site_languages'][$lang][1]) {
        $string = iconv(BASE_CHARSET, $GLOBALS['site_languages'][$lang][1], $string);
    }
    return $string;
}



Let us look at what our functions do. As their names suggest, the first prepares strings for the database, and the second prepares strings for output in the browser.

Besides the functions I have introduced one more constant and an array. What do we need them for?

The constant BASE_CHARSET sets the encoding in which we store information in the database (UTF-8 in our example).

The array $GLOBALS['site_languages'] holds the data for each language. The key is the two-letter code of the language (used to identify it); we will use it everywhere the language of the data has to be indicated. The values are: [0] the full name of the language, [1] the encoding in which the information is output to the browser, and [2] the language identifier for the GetText utility.
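Reading the language array might look like the sketch below. The helper languageCharset is hypothetical, and the Windows-1251 entry for Russian is an assumption made only to show a language whose output encoding differs from UTF-8.

```php
<?php
// Same structure as in the article; the Windows-1251 entry is an assumption
// made here only to show an output encoding that differs from UTF-8
$GLOBALS['site_languages'] = array(
    'en' => array('English', 'UTF-8', 'en_US'),
    'ru' => array('Russian', 'Windows-1251', 'ru_RU'),
);

// Hypothetical helper: returns the output encoding for a language key,
// or null when the language is not configured
function languageCharset($lang)
{
    if (!isset($GLOBALS['site_languages'][$lang])) {
        return null;
    }
    return $GLOBALS['site_languages'][$lang][1];
}

// Unpack the three values of one language entry by position
list($name, $charset, $locale) = $GLOBALS['site_languages']['ru'];
echo "$name / $charset / $locale\n";
```

Checking with isset() first avoids a notice when a script asks for a language that has not been added to the array yet.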


We will need this data constantly, so it is set up to be accessible in every function and every class of our application.

Now let us see what exactly our two functions do.

The first parameter passed to each function is the KEY (see the description of the array $GLOBALS['site_languages']): the two-letter code of the language in which the text in $string is written. The second parameter is the text string in the language given by $lang.

First, the functions check whether the database encoding matches the encoding in which we output information to the screen. If it does, nothing needs to be done: the data is already in the encoding we need. If it does not, we convert the data. When a function has finished, it returns a result that is ready for use. An example of how the functions are used:

$string = StringToBase('ru', $string);

$string = StringFromBase('ru', $string);



In both cases the variable $string holds the prepared data you need. All that remains is to put it into the database or send it to the browser, respectively.

NOTE:

Do not forget to create a language index field in the database and to store in it the two-letter code of the language in which your text is written.

Common content

Multilingual data of small volume

To begin with, I suggest paying attention to the volume of text we need to store. Take an image as an example. As a rule, the description of an illustration is a short text: we will put it in the alt attribute of the HTML <IMG> tag or simply use it as a caption. Usually 100 to 255 characters are enough for this. If our site speaks 5 languages, the total amount of text comes to about 1.5 kilobytes at most. We therefore decide to store all of it in the database in a single field of type TEXT.

Of course, we have to separate the data for different languages somehow. To do that, we will define a format for storing the data in this field.


I suggest the following format:

ru:=Russian text~~en:=English text

We have chosen combinations of characters that are unlikely to occur in ordinary text. Each language string is preceded by the KEY (the two-letter language code) and the characters :=, and the entries are separated by a double tilde (~~). Once the string has been formatted this way, we send it to the database.
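Because the format relies on the markers being rare in ordinary text, it may be worth checking incoming text before packing it. The guard below is a hypothetical addition, not part of the article's functions, and assumes the markers are written without spaces as '~~' and ':='.

```php
<?php
// Hypothetical guard: reports whether a text contains either marker
// of the packed format ('~~' between languages, ':=' after the language key)
function containsPackMarkers($text)
{
    return strpos($text, '~~') !== false || strpos($text, ':=') !== false;
}

var_dump(containsPackMarkers('A photo of the harbour'));
var_dump(containsPackMarkers('x := y ~~ z'));
```

What to do when the check fires (reject the text, or replace the marker) is a design decision left to the application.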


What advantages does this way of storing data give us? First, the data in every language is always at hand. Second, we keep the ability to search, and the search can be performed by the database itself in all available languages at once, because the language data is not distorted and remains in a readable form.

Now let us write three more functions that make it easy to build a "packed" string in this format and to get data back out of it. These functions rely on the first two given above, because the final result must be completely ready for use.

The first function. Its task is to accept an array of a certain format and build a string of the specified format from it.

This example introduces one more constant, DEFAULT_LANG. It defines the default language, that is, the language the site "speaks" when the user has not chosen one.

function LangDataPack($data) {
    // Start with an empty string; the result is accumulated here
    $result = '';
    // Get the list of language keys from $GLOBALS['site_languages']
    $languages = array_keys($GLOBALS['site_languages']);
    // Loop over every language supported by the site
    foreach ($languages as $lang) {
        // A non-empty $result means this is not the first language,
        // so the entries have to be separated with '~~'
        if (!empty($result)) {
            $result .= '~~';
        }
        if (empty($data[$lang]) && DEFAULT_LANG == $lang) {
            // The default language must always have text: return FALSE (error)
            return FALSE;
        } elseif (empty($data[$lang])) {
            // No text, and this is not the default language: store an empty string
            $data[$lang] = '';
        } else {
            // Convert the string to the database encoding if necessary
            $data[$lang] = StringToBase($lang, $data[$lang]);
        }
        // Append the data in the current language to the result
        $result .= $lang . ':=' . $data[$lang];
    }
    return $result;
}



The function takes an array of the following form as its parameter:

array('lang1' => 'string', 'lang2' => 'string')

Here lang1 and lang2 are the two-letter codes of the languages in which each string is written (for example, array('en' => 'english', 'ru' => 'Russian')). The function returns a UTF-8 string of the form en:=english~~ru:=Russian.

The function returns either a string in the required format or an error. The error occurs when the array you passed contains no data in the default language. For the site to work properly, at least one language variant must be filled in; otherwise the user may end up seeing nothing at all.

So, the function that builds the "packed" string has been written and has done its work: you have the data you need and can simply put it into the database. With this approach there is no need to mark the record with a two-letter language code, because the record contains all the language data at once.

You may suddenly need to add one more language to the project. The examples that follow show that this way of storing data causes no difficulties when a new language is added.

Let us move on to the next function. Its task is to restore the array from a "packed" string produced by the first function.

function LangDataUnPack($data) {
    // Start with an empty array; the result is accumulated here
    $result = array();
    // Split the packed string into per-language entries on the '~~' marker
    $data = explode('~~', $data);
    // Loop over the entries obtained above
    for ($i = 0; $i < count($data); $i++) {
        // Separate the two-letter language code from the language data
        $temp = explode(':=', $data[$i], 2);
        // Skip entries that do not have the "code:=text" form
        if (count($temp) < 2) {
            continue;
        }
        // Convert the text to the display encoding if necessary
        $result[$temp[0]] = StringFromBase($temp[0], $temp[1]);
    }
    return $result;
}



The function takes the "packed" string as its parameter and returns the restored array in the format described above.
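A round trip through packing and unpacking can be sketched as follows. The functions packLangData and unpackLangData are simplified stand-ins written for this example, not the article's own listings: they assume every language is stored and displayed in UTF-8, so no encoding conversion is needed, and they skip the default-language check.

```php
<?php
// Simplified stand-ins for LangDataPack/LangDataUnPack, assuming every
// language is stored and displayed in UTF-8 (so no iconv calls are needed)
function packLangData(array $data)
{
    $parts = array();
    foreach ($data as $lang => $text) {
        $parts[] = $lang . ':=' . $text;
    }
    return implode('~~', $parts);
}

function unpackLangData($packed)
{
    $result = array();
    foreach (explode('~~', $packed) as $entry) {
        // A limit of 2 keeps any later ':=' inside the text intact
        $pair = explode(':=', $entry, 2);
        if (count($pair) === 2) {
            $result[$pair[0]] = $pair[1];
        }
    }
    return $result;
}

$original = array('en' => 'Harbour at dawn', 'ru' => 'Гавань на рассвете');
$packed = packLangData($original);
echo $packed, "\n";
var_dump(unpackLangData($packed) === $original);
```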


And finally, the third function, which lets you get the language data you need quickly and efficiently and provides a small convenience on top. If you request a string in a language for which nothing was entered, it returns the string in the default language (this is where the default language comes in useful). That is, if you need the text in English, the default language is Russian and there is no English string, you get the string in Russian. Adding a new language to the project therefore does not break the site: until you enter the strings in the new language, they are simply shown in the default language.

function LangDataString($lang, $data) {
    // Start with empty strings for the result and the default-language text
    $result = '';
    $default = '';
    // Split the packed string into per-language entries on the '~~' marker
    $data = explode('~~', $data);
    // Loop over the entries obtained above
    for ($i = 0; $i < count($data); $i++) {
        // Separate the two-letter language code from the language data
        $temp = explode(':=', $data[$i], 2);
        // Skip entries that do not have the "code:=text" form
        if (count($temp) < 2) {
            continue;
        }
        // If this entry is in the default language, remember it,
        // changing the encoding if necessary
        if (DEFAULT_LANG == $temp[0]) {
            $default = StringFromBase($temp[0], $temp[1]);
        }
        // If this entry is in the requested language, keep it,
        // changing the encoding if necessary
        if ($lang == $temp[0]) {
            $result = StringFromBase($temp[0], $temp[1]);
        }
    }
    // If the requested language has no data, fall back to the default language
    if (empty($result)) {
        $result = $default;
    }
    return $result;
}



The function takes the same "packed" string from the database as its parameter. The result is the text in the language you need, in the encoding you need.
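The fallback behaviour can be tried out with a compact variant. The function pickLangString below is a hypothetical sketch, not the article's LangDataString: it assumes all data is UTF-8 and takes the default language as a parameter instead of reading the DEFAULT_LANG constant.

```php
<?php
// Hypothetical compact variant of the fallback logic, assuming all data is
// UTF-8; the default language is passed in instead of read from DEFAULT_LANG
function pickLangString($lang, $packed, $defaultLang)
{
    $texts = array();
    foreach (explode('~~', $packed) as $entry) {
        $pair = explode(':=', $entry, 2);
        if (count($pair) === 2) {
            $texts[$pair[0]] = $pair[1];
        }
    }
    // Use the requested language when it has text
    if (isset($texts[$lang]) && $texts[$lang] !== '') {
        return $texts[$lang];
    }
    // Otherwise fall back to the default language, or an empty string
    return isset($texts[$defaultLang]) ? $texts[$defaultLang] : '';
}

$packed = 'ru:=Русский текст~~en:=';
echo pickLangString('en', $packed, 'ru'), "\n"; // English is empty, so the Russian text is used
echo pickLangString('zh', $packed, 'ru'), "\n"; // Chinese was never entered, same fallback
```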


NOTE:

It is very important to remember that if you output information to the browser in an encoding other than UTF-8, the last function will fail whenever it falls back to a default language that is not English. This happens because the encoding used for, say, a Chinese or Spanish page may simply have no Russian characters in it. You have to provide for this case somehow and decide what works best for you. You can always make English the default language; then no encoding problems arise (see the section on encodings above).
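One possible way to detect this situation in advance is to test whether a text survives conversion to the output encoding. The helper fitsCharset is a hypothetical sketch that assumes the iconv extension; it converts the text there and back and treats any difference as a sign that the encoding cannot represent it.

```php
<?php
// Hypothetical check: can a UTF-8 text be shown in the given output encoding?
// The text is converted there and back; any loss means it cannot.
function fitsCharset($utf8Text, $charset)
{
    // @ silences the notice iconv raises for characters it cannot convert
    $converted = @iconv('UTF-8', $charset, $utf8Text);
    if ($converted === false) {
        return false;
    }
    return @iconv($charset, 'UTF-8', $converted) === $utf8Text;
}

var_dump(fitsCharset('Hello', 'ISO-8859-1'));
var_dump(fitsCharset('Привет', 'ISO-8859-1'));
var_dump(fitsCharset('Привет', 'Windows-1251'));
```

When the check fails for the default-language text, the application could, for example, show an English string or nothing at all; which is better depends on the site.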


As noted above, the ability to search is preserved. You can search such data with the LIKE operator. Naturally, the search text has to be wrapped in % characters, for example '%poisk%'.
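Building the LIKE pattern can be sketched as below. The helper likePattern is hypothetical; it assumes the database uses the backslash as the LIKE escape character (the MySQL default) and escapes the characters that LIKE treats specially, so that a user's search term is matched literally.

```php
<?php
// Hypothetical helper: wraps a search term in % signs for LIKE and escapes
// the characters that LIKE itself treats specially (%, _ and the backslash)
function likePattern($term)
{
    return '%' . addcslashes($term, '\\%_') . '%';
}

echo likePattern('poisk'), "\n";
echo likePattern('50%'), "\n";
```

The resulting pattern still has to be passed to the query safely, for instance as a bound parameter of a prepared statement, rather than concatenated into the SQL text.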


Multilingual data of large volume

In the previous part we looked at ways of storing small volumes of multilingual data. Let us see what options we have for storing relatively large data.

First of all, let us take a close look at the data. As an example, take news, and suppose it appears on all language versions of the site at the same time. What can a news item consist of?

•         id - the unique identifier;
•         date_start - the publication date;
•         date_stop - the date after which the news item is removed from the main page and moved to the archive;
•         visible - a flag that controls whether the news item is shown on the site;
•         subject - the subject (headline) of the news item;
•         short_body - the short version of the news item;
•         full_body - the full version of the news item.


This structure is enough for our purposes. It is quite clear that the fields id, date_start, date_stop and visible will be identical in every language version, so there is no need to duplicate them. There are several ways to solve the problem.

The first option is to create, alongside the common fields, separate fields in the table for each language version. This approach has a drawback: when a new language is added to the project, the table has to be extended with new fields. This cannot always be done without serious consequences, and the same problems arise when a language is removed from the project. Besides, it is simply inconvenient. And what should be done with news items that were added long before the new language and therefore have no texts in it? Searching can also be difficult: if you try to search all language versions of the site at once, the search puts a significant load on the server and is likely to be rather slow, because instead of one field it has to process X fields multiplied by the number of languages. Of course, any problem can be solved, but I suggest taking a different route.

In the second option I suggest splitting the news into two tables. The first table holds the shared data, and the second holds the language data.

Let us see what the tables look like:

Table 1 (table1)

news_id
news_start
news_stop
news_visible

Table 2 (table2)

news_id - the unique number of the news item from the first table
news_lang - the two-letter identifier of the language of the data
news_subject - the subject (headline) of the news item
news_short - the short text of the news item in the language given by news_lang
news_full - the full text of the news item in the language given by news_lang



Thus, for each record in the first table we can have several records in the second. All of them share the same news_id but differ in the language identifier news_lang.

As a result, adding any language to the project holds no terrors for us, because the tables do not depend on the number of languages in any way.

Let us see how to work with this pair of tables.

Let us try to select the news for the main page of the Russian version of the site.

SELECT * FROM table1, table2
WHERE news_visible = 'Y'
  AND news_start <= NOW()
  AND news_stop > NOW()
  AND news_lang = 'ru'
  AND table1.news_id = table2.news_id
ORDER BY news_start DESC



The main linking elements of this query are the language condition on news_lang and the condition that matches news_id in the two tables.

NOW() here stands for the current date and time. If you prefer to pass today's date from PHP, you can get it, for example, with date('Y-m-d').

Thus, if you have just added a Chinese version to the project but have not yet translated the news for it, nothing will be output, because the database contains no records with the language code 'zh' (the code for Chinese).
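The effect of the join can be imitated in plain PHP without a database. Everything in the sketch below is illustrative: the arrays stand in for table1 and table2, the sample rows are invented, the date conditions are left out for brevity, and newsForLanguage is a hypothetical function name.

```php
<?php
// In-memory stand-ins for table1 (shared data) and table2 (language data);
// the sample rows are invented for illustration
$table1 = array(
    array('news_id' => 10, 'news_visible' => 'Y'),
    array('news_id' => 11, 'news_visible' => 'N'),
);
$table2 = array(
    array('news_id' => 10, 'news_lang' => 'ru', 'news_subject' => 'Заголовок'),
    array('news_id' => 10, 'news_lang' => 'en', 'news_subject' => 'Headline'),
    array('news_id' => 11, 'news_lang' => 'ru', 'news_subject' => 'Скрытая новость'),
);

// Mimics the join: visible shared rows matched with texts in one language
function newsForLanguage(array $shared, array $texts, $lang)
{
    $result = array();
    foreach ($shared as $news) {
        if ($news['news_visible'] !== 'Y') {
            continue;
        }
        foreach ($texts as $text) {
            if ($text['news_id'] === $news['news_id'] && $text['news_lang'] === $lang) {
                $result[] = $news + $text; // merge shared and language fields
            }
        }
    }
    return $result;
}

echo count(newsForLanguage($table1, $table2, 'ru')), "\n";
echo count(newsForLanguage($table1, $table2, 'zh')), "\n";
```

A language with no rows in the second table simply yields an empty result, which matches the behaviour described for a newly added language.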


Although we have been talking about news, the same holds for any dynamic content on your site, be it news, polls or anything else. The main thing you need to do is split the content structure into the parts that are common (identical for all language versions) and the language data proper.

Searching such a structure is very simple to organize, because each record contains the unique number of the shared data (news_id): having searched the news text, you always know which news item the text belongs to. Moreover, the search runs over a single field for all language versions of the site.

To make this clearer, let us fetch the Russian version of news item number 10 for editing.

SELECT * FROM table1, table2
WHERE table1.news_id = 10
  AND table2.news_id = 10
  AND news_lang = 'ru'



And now let us fetch all language versions of news item number 10 for editing.

SELECT * FROM table1, table2
WHERE table1.news_id = 10
  AND table2.news_id = 10



I think the examples are clear enough. From here on, I suggest you experiment.

Summing up

In this article I have touched on the problems of storing multilingual dynamic content. Together we have learned to separate "unique" dynamic content from "common" content, learned about several ways of storing multilingual data, and looked at the problems of working with different encodings.

I hope this material will be useful to you and will help you in developing your applications.

I welcome any questions and suggestions about this article. If you disagree with something or, on the contrary, think the article should be expanded on some point, write to me; I am glad to receive every letter.




