How to effectively conceal IDs from a database table in a record set?

sneakyimp

I have an Excel spreadsheet that queries a database with a macro. The database currently returns a collection of records which are parsed by the spreadsheet for data collection purposes. The spreadsheet is subsequently posted back to the server where it is parsed to generate some XML.

The client has informed me that it is not permissible to store certain record IDs in the spreadsheet. They are concerned that people in possession of the spreadsheet may reverse-engineer the structure of the database table which is ostensibly valuable proprietary data.

Long story short: I need to obfuscate or conceal a particular ID delivered to the spreadsheet from a database query such that I can find the original ID when I later submit the spreadsheet back to the server. It occurs to me I can create a simple 2-column table that associates some random alphanumeric string with each ID, but this sounds to me like a simple substitution cipher and therefore not very secure. Every user who downloads the spreadsheet would probably start to recognize familiar alphanumeric sequences.

Can anyone suggest a simple way to better secure these IDs? Please keep in mind performance and security:
* Generating the obfuscated IDs must be fast and easy
* The obfuscated IDs should ideally be different in every db query
* It must be possible to decipher the original ID from the obfuscated one fairly cheaply too

It sounds to me like I should be using some kind of two-way encryption but I wonder if simple encryption of integer values might be easily cracked too.

By the way, a simple substitution cipher is probably just fine with the client. I just wanted to make my dull work more interesting.

johanafm

You could use a 3-column table, storing

id
random-value
hash(concat(id, random-value))

You could also generate new random values for every new set of data retrieved.

If several sets of data are being distributed at the same time, you could add a fourth column holding a "data-set identifier". This value would be the same for all ids retrieved in a single query, such as the time the data is retrieved or a random number. Or perhaps you already have a user session and could use that to identify each data set?
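To make the table layout above concrete, here is a rough sketch in Python rather than SQL. The function names build_gate_rows and resolve are made up for illustration, SHA-256 is just one possible hash, and the ":" separator between id and random value is my own addition so that, say, id 1 with random value 23 cannot collide with id 12 with random value 3. The rows would normally live in a database table, not a Python list.

```python
import hashlib
import secrets


def build_gate_rows(ids, dataset_id):
    """Return one row per id: (id, random_value, token, dataset_id)."""
    rows = []
    for record_id in ids:
        # Fresh random value for every id in every query.
        random_value = secrets.randbits(64)
        # token = hash(concat(id, random-value)); the ":" separator is an assumption.
        material = f"{record_id}:{random_value}".encode()
        token = hashlib.sha256(material).hexdigest()
        rows.append((record_id, random_value, token, dataset_id))
    return rows


def resolve(rows, token, dataset_id):
    """Find the original id for a token that came back with the spreadsheet."""
    for record_id, _random_value, row_token, row_dataset in rows:
        if row_token == token and row_dataset == dataset_id:
            return record_id
    return None  # unknown token, or token from a different data set


# Hypothetical usage: only the token column would go into the spreadsheet.
rows = build_gate_rows([101, 102], dataset_id="query-1")
first_token = rows[0][2]
original_id = resolve(rows, first_token, "query-1")
```

Only the token leaves the server, and since the random value changes per query, the same id should get a different token each time the data is retrieved.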

The hash could be created using any built-in hashing function. If you are using SQL Server, you could use HASHBYTES, which supports MD5 and SHA2_512 among others. The random value would then be generated by

ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) AS [RandomNumber]

Using rand() produces the same value for all rows in a select (SQL Server), whereas newid() creates a unique value for every row. Do note that newid() returns data of uniqueidentifier type and must be cast accordingly. However, I do not know if subsequent calls to newid() for the same row will keep returning the same result. That is, I do not know if you can re-use NEWID() twice on each row

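INSERT INTO gate_table
SELECT
    id
    , ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT))
    -- second, separate call to NEWID()
    , HASHBYTES('MD5', CONCAT(id, ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT))))
    -- [, optionally the "data-set-identifier"]
FROM source_table; -- source_table: placeholder for the table holding the ids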

or if you'd have to store the result of the first call for subsequent use. And I don't know if selecting into a variable for later reference in the same row would work or not. I.e. this

SELECT id, @the_rand = rand-generating-expression(), hash(concat(id, @the_rand))