Monday, December 5, 2016

Maintaining Sinhala (Unicode) characters with SQL Server Database

This is one of the questions I received last week on Unicode characters. I was asked the way of maintaining Sinhala characters in one of the tables because they have been trying with nvarchar data type but it has not worked as expected.

Let's try to understand the way of maintaining different characters such as Sinhala and Tamil. If we need to maintain such characters, usually with char or varchar, we should instruct to SQL Server to use specific code page and corresponding characters. The instructions can be passed using Collation. Not only it detects the code page and set relevant characters, it decides the sorting rules, case and ascent sensitivity of our data.

The COLLATE clause can be specified at Server Level, Database Level, Column Level or in an Expression (example, with ORDER BY). SQL Server supports Windows Collations, Binary Collations, and SQL Server Collations.

Note that not all collations related your language are available with SQL Server Collation set. If the required one is not available, then you can use Windows Collation set. However, some of Windows Collations cannot be set at Server Level or Database Level.

Collations for languages like Sinhala and Tamil are not exist with SQL Server Collation Set. Therefore, we have to use Windows Collation and no way of setting it at Database Level.

To see all Windows Collation, here is the page: https://msdn.microsoft.com/en-us/library/ms188046.aspx

As per the above page, we have to use Indic_General_100_ for Sinhala characters. We can find out all collations related to it using below query;

SELECT * FROM sys.fn_helpcollations();


Let's apply this to a table and see. The following code creates a table with a nvarchar data type for holding Sinhala characters. Note the the collation set for the column. When insert records, we need to make sure that N is used for specifying the the value is Unicode.

DROP TABLE dbo.Test;

CREATE TABLE dbo.Test
(
 Id int identity (1,1) primary key
 , Message nvarchar(4000)  collate Indic_General_100_BIN not null
);

INSERT INTO dbo.Test (Message) VALUES (N'Hello World!');
INSERT INTO dbo.Test (Message) VALUES (N'සුභ උදෑසනක්!');

Here is the result if you query the table now.


No comments: