Monday, August 8, 2016

Errors Upgrading to SQL 2016 and using R

After upgrading our SQL 2014 instances to 2016, I was naturally curious to try out some R scripting, which is now natively supported. So I went over to Microsoft's tutorials to get set up and run my first 'Hello World' R script from Management Studio.

Right away I ran into a few errors, and troubleshooting them was difficult. Like many other DBAs, I have no real R experience, and this is a new frontier for SQL Server. Researching the error messages was also muddied by the release candidates: because they had been out for a while, blogs, Stack Exchange, and the MS forums were littered with reports of the same error messages that weren't relevant to the RTM version. Some of the suggested fixes didn't make sense, and others didn't work at all for me. In the end, it was a combination of a few of them that made up the correct solution.

Here are the steps I ended up finally using to get my upgraded installation working:

1. Turn on R via sp_configure:

EXEC sp_configure 'external scripts enabled', 1;
GO
RECONFIGURE;
GO

To verify this turned on correctly, check for a run_value of 1 in the output of:

EXEC sp_configure 'external scripts enabled';

2. Restart the SQL Server and Launchpad services.
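From an elevated command prompt, that's something along these lines (the service names assume a default instance; a named instance's Launchpad service would be MSSQLLaunchpad$InstanceName):

net stop MSSQLLaunchpad
net stop MSSQLSERVER /y
net start MSSQLSERVER
net start MSSQLLaunchpad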

At this point I received the following error when trying to run an R script:

Msg 39012, Level 16, State 1, Line 1
Unable to communicate with the runtime for 'R' script. 
Please check the requirements of 'R' runtime.
STDERR message(s) from external script: 
Fatal error: cannot create 'R_TempDir'

So after some research I found this was due to spaces in the working directory that is specified for R by default. Apparently R doesn't get along well with spaces in the path to its ExtensibilityData folder.

3. Edit the rlauncher.config file (located in C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn) to ensure the WORKING_DIRECTORY= destination has no spaces in it. In my case, I switched it to the root of my data drive:
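Mine ended up along these lines (the exact path is illustrative - anything without spaces works; E: is my data drive):

WORKING_DIRECTORY=E:\ExtensibilityData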


4. After restarting both services again, a new error appeared:

Msg 39021, Level 16, State 1, Line 1
Unable to launch runtime for 'R' script. 
Please check the configuration of the 'R' runtime. 
Msg 39019, Level 16, State 1, Line 1 
An external script error occurred: Unable to launch the runtime. 
ErrorCode 0x80070490: 1168(Element not found.).

In addition to pointing WORKING_DIRECTORY at the new folder, all of the original folder's subdirectories must be copied to the new location as well (even though they're probably empty at this point). After copying them, my new ExtensibilityData folder mirrored the original's set of empty subdirectories.
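Rather than recreating them by hand, xcopy can replicate just the directory tree in one shot (both paths are illustrative - the source is the default ExtensibilityData location for my instance, and the destination is the space-free folder from step 3):

xcopy "C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\ExtensibilityData" "E:\ExtensibilityData" /T /E

The /T switch copies only the folder structure, and /E includes the empty subdirectories.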

5. With a final restart of the services, the Hello World R script finally executed:
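For reference, the test was the basic script from Microsoft's tutorial (or something very close to it):

EXEC sp_execute_external_script
     @language = N'R'
    ,@script = N'OutputDataSet <- InputDataSet;'
    ,@input_data_1 = N'SELECT 1 AS [hello];'
WITH RESULT SETS (([hello] INT NOT NULL));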



I hope this was helpful to anyone else working with R in SQL Server for the first time after an upgrade to 2016. Although Microsoft has included a FAQ for the installation and upgrade of R in MSSQL, it is woefully inadequate considering that, by default, R will be broken for most upgraded installations.

Friday, June 3, 2016

Forcing a view to return a nullable column for a static value

I came across a weird scenario wherein I had two tables, one essentially a staging table and the other a slightly transformed version of it. They are more or less the same, and both have a large number of applications and ORMs tied to them directly. As a first step toward removing the ETL process that unnecessarily moves the data from one table to the other, I planned to turn one into a view of the other. 

Due to restrictions, I essentially have to do this in a way that won't break the ORM database mappings - so the view has to perfectly mimic the table: same name, same data types, same nullable status for each column. Names and data types are easy to adjust, casting or converting where needed. Changing whether a column is nullable is a little more complicated, though.

I first started by tackling base columns that are nullable but had been made non-null during the ETL process, wrapping them in an ISNULL:

--Create a test table
CREATE TABLE [dbo].[Sales] (
    [ID] INT NOT NULL
   ,[Date] DATE NULL
   ,[Amount] INT NOT NULL
   ,[Code] INT NOT NULL
   ,[SalesPersonName] NVARCHAR(50) NULL
);
GO

--Use a view to switch from NULL to NOT NULL
CREATE VIEW [dbo].[vw_SalesView] AS
   SELECT ISNULL([SalesPersonName], N'') AS [SalesPersonName_NotNull]
         ,[Date]
   FROM [dbo].[Sales];
GO

--Check the view's output column metadata
sp_help '[dbo].[vw_SalesView]';

When we check the output of the view, we see that the [SalesPersonName_NotNull] column is what we expect, NOT NULL:
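Abridged to the relevant columns, the sp_help grid reads roughly like this (reconstructed here, so treat it as illustrative):

Column_name               Type       Nullable
SalesPersonName_NotNull   nvarchar   no
Date                      date       yes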



This is a quick, obvious solution that has been written about a lot, and the inverse works much the same way. When we want to turn a non-null column into a nullable one, a CASE expression is an easy way to achieve this:

--Drop old view
DROP VIEW [dbo].[vw_SalesView];
GO
--Add an [Amount] column, made nullable via CASE
CREATE VIEW [dbo].[vw_SalesView] AS
   SELECT ISNULL([SalesPersonName], N'') AS [SalesPersonName_NotNull]
         ,[Date]
         ,CASE 
             WHEN [Amount] IS NOT NULL
                THEN [Amount]
             ELSE NULL
          END AS [Amount]
   FROM [dbo].[Sales];
GO

--Check the view's output columns
sp_help '[dbo].[vw_SalesView]';

Which gives us a nullable [Amount] column, even though the underlying [Sales] table defines the column as NOT NULL:



At this point things are going great, and most use cases for adjusting the nullability of a column are covered. However, I came across a strange situation: the view needed to return a static integer value that didn't come from a table, yet needed to be nullable to match the original table's makeup. The ETL process had just been setting the column equal to 1000 for every single row, so my view should do the same. My first thought was to use a CASE expression with a hard-coded logical comparison to fool SQL Server into making the column nullable, the opposite of what I had just done above:

DROP VIEW [dbo].[vw_SalesView];
GO
--Use 1 = 1 to try to force a nullable column via CASE
CREATE VIEW [dbo].[vw_SalesView] AS
   SELECT ISNULL([SalesPersonName], N'') AS [SalesPersonName_NotNull]
         ,[Date]
         ,CASE 
             WHEN [Amount] IS NOT NULL
                THEN [Amount]
            ELSE NULL
          END AS [Amount]
         ,CASE 
             WHEN 1 = 1 
                THEN 1000
             ELSE NULL
          END AS [SalesGoal]
   FROM [dbo].[Sales];
GO

--Check the view's output columns
sp_help '[dbo].[vw_SalesView]';

But that didn't work: SQL Server knows this condition will always evaluate to true and therefore leaves the column as non-null:



So next I began to think of a non-deterministic condition that would, for all intents and purposes, always return true but could still theoretically be false, thereby tricking SQL Server while always giving me the desired result:

DROP VIEW [dbo].[vw_SalesView];
GO

--Use a potentially false condition that will always return true with GETDATE()
CREATE VIEW [dbo].[vw_SalesView] AS
   SELECT ISNULL([SalesPersonName], N'') AS [SalesPersonName_NotNull]
         ,[Date]
         ,CASE 
            WHEN [Amount] IS NOT NULL
              THEN [Amount]
            ELSE NULL
          END AS [Amount]
         ,CASE 
             WHEN DATEPART(YEAR, GETDATE()) > 2015
              THEN 1000
             ELSE NULL
          END AS [SalesGoal]
   FROM [dbo].[Sales];
GO

--Check the view's output columns
sp_help '[dbo].[vw_SalesView]';

Lo and behold, this worked!
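This time the sp_help grid reported the new column as nullable - abridged, roughly:

Column_name   Type   Nullable
SalesGoal     int    yes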



I now had my static value returning as a nullable column in my view. And while it's pretty safe to assume neither I nor the database will be time traveling back to pre-2016, something about this approach left me a little unsettled. Who knows what obscure bug or sequence of events could end up handing me a bad date value. So what would be more reliable? I thought about using @@SERVERNAME, @@VERSION (how often do you downgrade to SQL 2000?!), or some other @@ value, but having to make assumptions or do string manipulation seemed like an overcomplication - a route that would reduce readability and make it even harder to discern what I was trying to achieve with this crazy CASE expression.

I finally settled on something that fulfilled my criteria and didn't leave a pit in my stomach: an approach using a cheap built-in function, with no extraneous manipulation needed for the comparison, and astronomically low odds of ever accidentally getting a NULL value when I didn't want one - GUIDs.

DROP VIEW [dbo].[vw_SalesView];
GO

--Compare GUIDs to ensure a true outcome, but still register as potentially false
CREATE VIEW [dbo].[vw_SalesView] AS
   SELECT ISNULL([SalesPersonName], N'') AS [SalesPersonName_NotNull]
         ,[Date]
         ,CASE 
            WHEN [Amount] IS NOT NULL
              THEN [Amount]
            ELSE NULL
          END AS [Amount]
         ,CASE 
            WHEN NEWID() <> NEWID()
              THEN 1000
            ELSE NULL
         END AS [SalesGoal]
   FROM [dbo].[Sales];
GO
--Check the view's output columns
sp_help '[dbo].[vw_SalesView]';



Since the odds of two GUIDs generated on the same server ever matching are astronomically low, and it's obvious right off the bat what the statement is doing, I think this is the best approach to take for this very specific, strange problem.
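As a final aside, if you'd rather script this check than read through sp_help's grids each time, sys.columns exposes an is_nullable flag for a view's output columns:

--Verify the nullability of every column the view returns
SELECT [name], [is_nullable]
FROM sys.columns
WHERE [object_id] = OBJECT_ID(N'[dbo].[vw_SalesView]');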