Using DataTables instead of DataReaders

Wait, what? NDO uses DataTables instead of DataReaders??? Isn't that terribly slow, and doesn't it scale pretty badly? In this post we show why NDO doesn't need to support DataReaders.

If you use NDO without further precautions, NDO will buffer the result of your query completely in a DataTable. Therefore, if you generate a report from a very large result set, you will need an extremely large amount of memory. In ADO.NET, this problem is solved with a DataReader, from which one can read one row at a time. That row is then processed, and the processing results are written to the output stream.
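
For comparison, here is a minimal sketch of that classic ADO.NET pattern (the connection string, table, and column names are placeholders, not part of the NDO sample):

// Classic ADO.NET streaming with a DataReader: process one row at a time.
// Requires System.Data.SqlClient; connection string and SQL text are placeholders.
static void StreamReport()
{
    using (var conn = new SqlConnection( "Data Source=.;Initial Catalog=NDODemo;Integrated Security=true" ))
    using (var cmd = new SqlCommand( "SELECT Name, Salary FROM Employee", conn ))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // Process a single row and write the result to the output stream.
                Console.WriteLine( $"{reader["Name"]} {reader["Salary"]}" );
            }
        }
    }
}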

In this way, the entire operation needs only the memory necessary to process one data row. It should not be forgotten that the DataReader also uses a buffer in which a certain number of rows are cached. But is it really necessary to perform the read operations in such a fine-grained way? Of course not. A better solution is to read a certain number of records per loop iteration until all records are processed.

SQL supports this kind of query with the OFFSET / FETCH clauses, which allow you to skip a certain number of records and fetch a batch of records after the specified offset. NDO supports this feature with the Skip and Take parameters of a query.
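
As a minimal sketch (pm and the page boundaries are placeholders; the complete paging loop follows later in this post), a paged query looks like this:

// Paging with NDO: skip the first 20 records, fetch the next 10.
// On SQL Server this translates into an OFFSET / FETCH clause, roughly:
//   ... ORDER BY ID OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY
NDOQuery<Employee> q = new NDOQuery<Employee>( pm );
q.Skip = 20;
q.Take = 10;
List<Employee> page = q.Execute();

Note that OFFSET / FETCH requires a deterministic ordering, which is why the complete solution below adds an ordering on the oid.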

But this leads to a new issue that has to be solved: by breaking a query down into subqueries, the result set can change while we iterate over the subqueries. This problem can be solved using TransactionMode.Pessimistic and IsolationLevel.Snapshot. In order to use the Snapshot isolation level, it has to be enabled for the given database with the following code:

ALTER DATABASE IsolationLevels
SET READ_COMMITTED_SNAPSHOT ON
GO
ALTER DATABASE IsolationLevels
SET ALLOW_SNAPSHOT_ISOLATION ON
GO

In order to understand how the IsolationLevel works, please take a look at this code.

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using NDO;

class Program
{
    static object lockObject = new object();
    static DateTime dtStart;

    static void Main( string[] args )
    {
        PersistenceManager pmMain = new PersistenceManager();

        // Prepare the data
        pmMain.BuildDatabase();
        var newEmp = new Employee() { Name = "Peter", Salary = 1000 };
        pmMain.MakePersistent( newEmp );
        newEmp = new Employee() { Name = "Paul", Salary = 2000 };
        pmMain.MakePersistent( newEmp );
        newEmp = new Employee() { Name = "Mary", Salary = 3000 };
        pmMain.MakePersistent( newEmp );
        pmMain.Save();

        // Run concurrent operations while reading
        // a snapshot

        pmMain.TransactionMode = TransactionMode.Pessimistic;
        pmMain.IsolationLevel = System.Data.IsolationLevel.Snapshot;
        dtStart = DateTime.Now;
        Output( pmMain.Objects<Employee>().ResultTable, "Task 1a" );

        Task task = Task.Run( () =>
        {
            PersistenceManager pm = new PersistenceManager();
            pm.TransactionMode = TransactionMode.Optimistic;
            var emp = new Employee() { Name = "Mirko", Salary = 10000 };
            pm.MakePersistent( emp );
            pm.Save();
            Output( pm.Objects<Employee>().ResultTable, "Task 2a" );
            emp = pm.Objects<Employee>().Where( e => e.Name == "Mirko" ).Single();
            emp.Salary = 11000;
            pm.Save();
            Output( pm.Objects<Employee>().ResultTable, "Task 2b" );
        } );

        Thread.Sleep( 1000 );
        // Read from the same snapshot
        Output( pmMain.Objects<Employee>().ResultTable, "Task 1b" );
        Task.WaitAll( task );
        pmMain.Abort();  // End Transaction
        pmMain.IsolationLevel = System.Data.IsolationLevel.ReadCommitted;
        pmMain.TransactionMode = TransactionMode.Optimistic;

        Output( pmMain.Objects<Employee>().ResultTable, "After all" );
        Console.WriteLine( "Ready" );
    }

    static void Output( List<Employee> emps, string taskName )
    {
        lock (lockObject)
        {
            Console.WriteLine( DateTime.Now - dtStart );
            Console.WriteLine( taskName );
            foreach (var item in emps)
            {
                Console.WriteLine( $"{item.Name} {item.Salary}" );
            }
        }
    }
}

Here we work with a persistent class that has two fields, Name and Salary. In the first few lines of Main() we create three records with some data. After that we set the TransactionMode and IsolationLevel of the PersistenceManager to Pessimistic / Snapshot. This means: everything from now on, until the next Save() or Abort() call, will run in one transaction. And the isolation level Snapshot makes sure that each query during this time is executed against the same snapshot.
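
For reference, the Employee class could look like this (a minimal sketch; the [NDOPersistent] attribute and the field types are assumptions based on the usual NDO mapping):

// Sketch of the persistent class; the attribute and field types are assumptions.
[NDOPersistent]
public class Employee
{
    string name;
    int salary;

    public string Name { get { return name; } set { name = value; } }
    public int Salary { get { return salary; } set { salary = value; } }
}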

After displaying all current records, we start a second task which runs some default optimistic / ReadCommitted transactions. We add a record and then alter that same record. During these operations we display the results. You can see that the new and the altered record are displayed correctly. While this happens, the main task sleeps for a second. After that, all data is dumped again (Task 1b). As you can see in the output, this dump contains exactly the same records as the first dump, "Task 1a". After ending the transaction with Abort(), we dump the records once more. Now all inserted and changed records are contained in the result set. Here are the results of the test program:

00:00:00.0279831
Task 1a
Peter 1000
Paul 2000
Mary 3000
00:00:00.0379771
Task 2a
Peter 1000
Paul 2000
Mary 3000
Mirko 10000
00:00:00.0549667
Task 2b
Peter 1000
Paul 2000
Mary 3000
Mirko 11000
00:00:01.0328228
Task 1b
Peter 1000
Paul 2000
Mary 3000
00:00:01.0388190
After all
Peter 1000
Paul 2000
Mary 3000
Mirko 11000
Ready

Note the time stamps: Task 1b runs after the Task 2x outputs, but it shows exactly the same state of the database as before.

Now you can put the techniques shown together into a solution. Choose the number of records you want to fetch with one query. Set the TransactionMode and IsolationLevel properties as shown above. Then iterate over the records needed for your report using the Take and Skip properties of NDOQuery. Process the objects you retrieved and write your output to the output stream. After processing all records (exit condition: Count == 0), end the transaction with Save() or Abort(). Avoid making changes to objects in a transaction on a snapshot. The solution looks like this:

static void MakeReport()
{
    PersistenceManager pm = new PersistenceManager();
    pm.TransactionMode = TransactionMode.Pessimistic;
    pm.IsolationLevel = System.Data.IsolationLevel.Snapshot;
    int pageSize = 2;
    List<Employee> list;
    NDOQuery<Employee> q = new NDOQuery<Employee>( pm );
    q.Orderings.Add( new Query.AscendingOrder( "oid" ) );
    int skip = 0;
    do
    {
        q.Skip = skip;
        q.Take = pageSize;
        Console.WriteLine( $"Offset: {skip}" );
        list = q.Execute();
        foreach (var item in list)
        {
            Console.WriteLine( $"  {item.Name} {item.Salary}" );
        }
        skip += pageSize;
        pm.UnloadCache(); // Objects are cached with weak references, but we don't need them any further.
    } while (list.Count > 0);

    pm.Abort();  // End Transaction
    pm.IsolationLevel = System.Data.IsolationLevel.ReadCommitted;
    pm.TransactionMode = TransactionMode.Optimistic;
}

Since we have a clustered index on the ID column (which corresponds to the query expression "oid"), the report will be sufficiently efficient. If you want to sort by other criteria, create a corresponding index; otherwise the paging will be inefficient for large tables. The sample program produces the following output:

Offset: 0
  Peter 1000
  Paul 2000
Offset: 2
  Mary 3000
  Mirko 11000
Offset: 4

Well, do we have to mention that you should use a higher pageSize in real-world applications? And don't complain about the inequality of the salaries shown…  ;-)