Tuesday, August 12, 2014

Informatica MD5 Hash vs SQL MD5 Hash

This is the difference between SQL MD5 hash values and those generated by Informatica. This comes into play when trying to compare hash columns used in type 2 history generated using different tooling across databases.

SELECT
(SELECT CONVERT(VARCHAR(35),HASHBYTES('MD5',CONVERT(VARCHAR(500),'VALUE STRING 1')+'VALUE STRING N'),2)) InfaHash,
(SELECT HASHBYTES('MD5',CONVERT(VARCHAR(500),'VALUE STRING 1')+'VALUE STRING N') )NormalHash

Friday, August 1, 2014

Debugging Unicode Charcter Mapping Issues

Recently I had an issue where I was getting an error in Informatica that a field was truncating but the source value was the same size as the target definition.

I turned out that the source data had smart quotes and the Unicode code page for the SQL Server relational connection did not support smart quotes. However this was in a comment column that was very wide. It was hard to find that as the source of the issue.

1. Open your log
2. Open the bad file output and read it. In my case the bad file explicitly escaped the character incorrectly representing the data.
3. Extract the bad row from your source file and create a smaller test file.  See tool 1.
4. Load the record into a table variable with loose data typing and also into a clone of your table. Look for truncation. Compare the values with a select...except..select query. Start removing columns from the selects to determine what is not loading into your table. For me this did not work as the issue was not my database definition.
5. Run the second tool against your table variable to get an inventory of your characters. Anything weird like smart quotes or cedilla's?
6. Run the bad row file through your debugger, ensuring you write to a table you will not have a PK violation to. Start replacing values in the file and removing sections of long strings until you find the issue.
 

 
TOOL 1: Powershell String Puller
Get-Content -Encoding UTF8 I:\somefile.dat | Where-Object {$_ -match '<Any Word from File Header>' -or $_ -match '<PK FROM ERROR LOG>'}| Set-Content -Encoding UTF8 OutputErrorRecord.dat


TOOL 2: SQL Character Counter
DECLARE @Numbers TABLE (Number int NOT NULL PRIMARY KEY);
WITH digits (d) AS (
SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION
SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION
SELECT 0)

INSERT @Numbers (Number)
SELECT Number
FROM (SELECT i.d + ii.d * 10 + iii.d * 100 + iv.d * 1000 +
v.d * 10000 + vi.d * 100000 AS Number
FROM digits i
CROSS JOIN digits ii
CROSS JOIN digits iii
CROSS JOIN digits iv
CROSS JOIN digits v
CROSS JOIN digits vi) AS Numbers
WHERE Number > 0

SELECT ch COLLATE Latin1_General_BIN2, COUNT(*)
FROM   <SOME TABLE>
CROSS APPLY (SELECT ch = substring(<SUSPECT COL1>, Number, 1)
FROM @Numbers
WHERE Number BETWEEN 1 AND len(<SUSPECT COL1>) )AS x
GROUP BY ch
ORDER BY 2