Now that Unicode has been explained, the next step is to look at how Unicode affects the DataFlex environment. DataFlex 2021 is fully Unicode: all strings are Unicode, encoded as UTF-8. Across the DataFlex 2021 environment as a whole, however, different parts may use different encodings.

This is how it was prior to DataFlex 2021. DataFlex itself, the embedded database and the interface to other databases were OEM. The connectivity kits allowed any encoding in the database. The Windows API and UI were ANSI, which meant conversions between OEM and ANSI were needed. The WebApp Server was already using UTF-8 in prior versions.

Much has changed with DataFlex 2021. The items indicated with dark green are Unicode. Most items are now UTF-8. The embedded database, however, is still OEM. The DataFlex core API has a layer that automatically converts between UTF-8 on one side and the OEM embedded database and older versions of the connectivity kits on the other. The connectivity kits that come with DataFlex 2021 convert between the UTF-8 DataFlex string and whatever string type is used in the database, such as VarChar or NVarChar. String types can be set from the table editor in the Studio. The Windows APIs are UTF-16. The ANSI versions can still be used, but it is best to start using the UTF-16 versions, as the standard packages do. Other external APIs can use any encoding.

The DataFlex language string is no longer OEM; it has become UTF-8. If the use of an OEM string is desired, which is unlikely, then the standard DataFlex string can no longer be used for that. There are a couple of special things about the DataFlex UTF-8 string type that are important to know.

First, if hashed strings are used, for example for passwords, beware that if there are characters outside the ASCII set, a hashed OEM string from previous DataFlex versions can be different from a hashed UTF-8 string. Second, in the past, DataFlex strings were sometimes used to store binary data. This technique is not recommended. Instead, use UChar arrays.
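
The hashing caveat can be illustrated outside DataFlex. The following is a minimal Python sketch, not DataFlex code; it assumes code page 850 as a stand-in for the OEM code page and SHA-256 as the hash, neither of which is specified in this lesson. It shows that an ASCII-only password hashes identically under both encodings, while a password with a non-ASCII character does not.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hash raw bytes; the hash sees bytes, never characters."""
    return hashlib.sha256(data).hexdigest()


# Assumption: cp850 stands in for whatever OEM code page the old application used.
OEM_CODEPAGE = "cp850"

ascii_password = "secret123"
accented_password = "pässword"

# ASCII characters have the same byte values in the OEM code page and in UTF-8.
ascii_same = (
    sha256_hex(ascii_password.encode(OEM_CODEPAGE))
    == sha256_hex(ascii_password.encode("utf-8"))
)

# "ä" is one byte in cp850 but two bytes in UTF-8, so the hashes differ.
accented_same = (
    sha256_hex(accented_password.encode(OEM_CODEPAGE))
    == sha256_hex(accented_password.encode("utf-8"))
)

print(ascii_same)     # True
print(accented_same)  # False
```

In practice this means stored hashes created from OEM strings may need the input converted back to OEM before hashing, or the hashes regenerated, for users whose passwords contain non-ASCII characters.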

The string functions and the debugger now try to interpret strings as UTF-8 data. Binary data, in a lot of cases, is not valid UTF-8. String functions no longer translate their parameters directly to memory offsets. They now interpret them as character offsets and analyze the string to convert those offsets to memory positions. This goes wrong when the data is not a valid UTF-8 string. An added advantage of using UChar arrays is that Max_Argument_Size does not apply to them.
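
The difference between character offsets and memory offsets can be sketched in Python (again an illustration, not DataFlex code). The helper `char_offset_to_byte_offset` is hypothetical and uses 0-based indexes. It also shows why arbitrary binary data breaks this kind of analysis: the bytes cannot be decoded as UTF-8 at all.

```python
def char_offset_to_byte_offset(data: bytes, char_index: int) -> int:
    """Return the 0-based byte position where the character at char_index starts.

    Raises UnicodeDecodeError if data is not valid UTF-8.
    """
    text = data.decode("utf-8")
    return len(text[:char_index].encode("utf-8"))


utf8_bytes = "naïve".encode("utf-8")
print(len(utf8_bytes))                            # 6 bytes for 5 characters
print(char_offset_to_byte_offset(utf8_bytes, 3))  # 4: "v" starts after the 2-byte "ï"

# Arbitrary binary data is often not valid UTF-8.
binary = bytes([0xFF, 0xFE, 0x00, 0x80])
try:
    char_offset_to_byte_offset(binary, 1)
    binary_is_valid = True
except UnicodeDecodeError:
    binary_is_valid = False

print(binary_is_valid)  # False
```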

As mentioned in lesson 4, several essential code changes must be made because the DataFlex string has become UTF-8. Examples are converting TYPE structures to Structs and removing or changing ToOem and ToAnsi calls. If not already done, please review lesson 4. In the source code, Unicode characters can be used in string literals and comments, but not in the names of objects, functions, and variables. As shown in lesson 2, the Studio editor saves sources as UTF-8.

Demonstration

What results when trying to store Unicode characters in the embedded database

The Order Entry application, which runs on the Embedded database, will be used for this demonstration.
The application is run, and an inventory record is opened.
Then some Unicode characters are entered and saved.
Upon re-finding it, the special characters are lost. They are replaced by little squares.
The original value is put back. A database that supports Unicode must be used if the special characters, or any character from a language that is not in the OEM codepage, are to be used.
Using the SQL Conversion Wizard in the Studio, the database will be converted to MS SQL. The first step is to create an empty database in MS SQL and then to set up the connection using the SQL Connection Manager. This step has already been completed for the demonstration.
Next, select the SQL Conversion Wizard from the menu.
Click ‘Next’ and ‘Next.’ The ID of the connection that was already set up is shown. Select ‘Next.’ On the following screen, select all tables.
Hit ‘Next’ or ‘Ok’ on the following option screens and finally ‘Finish’ to complete the process of converting all of the tables to the SQL database.
The Order Entry database in MS SQL is refreshed by pressing F5. It now shows that all the tables have been added.
The application is recompiled and run. Now when Unicode characters are entered and saved, the correct characters are shown.
If Unicode characters are to be supported in an application, the embedded database can no longer be used. All of the other databases supported by DataFlex properly store Unicode.
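
The loss seen in the demonstration can be reproduced in a few lines of Python. This is a sketch under assumptions, not a model of the embedded database itself: code page 850 stands in for the OEM code page, and Python substitutes "?" for characters it cannot map, whereas the demonstration showed little squares. The helper name `round_trip_through_codepage` is hypothetical.

```python
def round_trip_through_codepage(text: str, codepage: str = "cp850") -> str:
    """Simulate saving text in a single-byte code page and reading it back.

    Characters that the code page cannot represent are replaced on save,
    so they cannot be recovered on read.
    """
    stored = text.encode(codepage, errors="replace")
    return stored.decode(codepage)


print(round_trip_through_codepage("Café"))    # Café  (é exists in cp850)
print(round_trip_through_codepage("Café €"))  # Café ?  (€ does not)
print(round_trip_through_codepage("咖啡"))     # ??
```

The point is that the damage happens when the value is saved: once the replacement character has been written, re-finding the record cannot bring the original characters back.
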
As shown in the figure, the Unicode Windows API is UTF-16. This means that conversions must be made between the DataFlex UTF-8 string and UTF-16.
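
To make the UTF-8 to UTF-16 conversion concrete, here is a small Python sketch (not DataFlex code; the helper names are hypothetical). It uses little-endian UTF-16, which is what Windows uses, and shows that the same text has a different byte length in each encoding and that the conversion is lossless in both directions.

```python
def utf8_to_utf16le(utf8_data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16 little-endian bytes."""
    return utf8_data.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(utf16_data: bytes) -> bytes:
    """Convert UTF-16 little-endian bytes back to UTF-8 bytes."""
    return utf16_data.decode("utf-16-le").encode("utf-8")


text_utf8 = "Aé😀".encode("utf-8")
text_utf16 = utf8_to_utf16le(text_utf8)

print(len(text_utf8))   # 7: 1 + 2 + 4 bytes
print(len(text_utf16))  # 8: 2 + 2 + 4 bytes (the emoji is a surrogate pair)
print(utf16le_to_utf8(text_utf16) == text_utf8)  # True
```

Because the byte lengths differ, buffer sizes for Windows API calls have to be computed for the UTF-16 form rather than taken from the length of the UTF-8 string.
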
External APIs currently used in applications might be UTF-8. If that is the case, conversions do not need to be made because they can be called directly using the DataFlex string.
If an application uses a third-party component that is not Unicode, but is OEM or ANSI, then it can still be used, but special characters are not supported. The Utf8ToOem or Utf8ToAnsi functions will need to be used.
To fully support Unicode, a Unicode version of the DLL needs to be obtained by either downloading it, or, if the source is available, creating a Unicode version.
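
Why the choice between an OEM and an ANSI conversion matters can be shown with a short Python sketch. It does not call Utf8ToOem or Utf8ToAnsi and makes no claim about how those functions treat unmappable characters; it assumes code page 850 for OEM and code page 1252 for ANSI, and the helper `to_legacy` is hypothetical.

```python
# Assumptions: cp850 stands in for OEM, cp1252 for ANSI.
OEM_CODEPAGE = "cp850"
ANSI_CODEPAGE = "cp1252"


def to_legacy(text: str, codepage: str) -> bytes:
    """Encode text for a non-Unicode component, replacing unmappable characters."""
    return text.encode(codepage, errors="replace")


# The same character has different byte values in the two code pages,
# so the conversion must match what the component expects.
print(to_legacy("é", OEM_CODEPAGE))   # b'\x82'
print(to_legacy("é", ANSI_CODEPAGE))  # b'\xe9'

# Characters outside the code page cannot be passed to the component at all.
print(to_legacy("中", ANSI_CODEPAGE))  # b'?'
```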

Migrating to DataFlex 2021 Part 2

Lesson 7: Unicode in the DataFlex environment
