Ranting about a client ahead.
My current task is to get some data from one of their clients into their database. As far as I am aware, usually they do this themselves, but something is stopping them this time, so they are getting us to do it.
First problem: No one at OurClient remembers where the data is actually supposed to go. Other than the owner/CEO, no one at OurClient has been working there for longer than 18 months, their employee turnover is so high. As a result, I had to work out what went where.
Lots of investigation, and too-ing and fro-ing with OurClient and I eventually worked out what went where.
The source data is 3 (large) csv files as the primary data, and then there are 14 secondary csv files for each primary file of related data.So, that’s a total of 45 files worth of data. The good news is that the primary files all have the same structure, and the related files also all have the same structure, so what I’ve worked out for the first lot can be reused for the rest, with just changes to the appropriate magic numbers.
I run the first of the primary files through the tool, and end up with a 100Mb sql script.
So, the SQL Management Studio on the database server (I must run it on the server, can’t run remotely) can’t cope with 100Mb sql script files, so I need to break them into smaller pieces. In addition, SQL Server can only cope with 1000 inserts in a single action, so they have to be broken up into smaller batches of at most 1000.
Rewrite, re-export, re-copy to the server, re-run. That’s when I discover the data is nowhere near as clean as I was lead to believe, and includes lines that will cause key violation errors. Argh!
So, rewrite again, this time instead of just inserting the data, I have to check each row to see if it already exists, and if it does update, otherwise insert it.
This means the 100Mb primary script is now almost 1Bg of script, split over 88 files.
(Big thank you’s to Dan Findley for pointing out that Powershell scripts would make generating the sql scripts from the csv source much easier, which has saved my sanity.)
I end up having to do a similar thing to the related data files as well, and the 14 files each end up being 9 sql scripts, so 126 files of about 10Mb each.
Running each of the secondary files in batches of 9 (so, each batch is one of the original source files) takes between 30 and 60 minutes. Running the primary files in batches of 10 takes about the same amount of time.
And now I’m dealing with the sanity destroying issues, now that I’ve actually got the sql scripts correct.
Firstly, I cannot get to the database server directly. I have to remote desktop to another one of their machines, and then remote from that machine to the machine I actually need. As a result, every time I need to copy files, I have to copy through the intermediary machine. Thankfully sql scripts compress really, really well, and the zip file tends to only be about 5% of the original size of the files.
Secondly, Remote Desktop automatically logs me out if there is more than 15 minutes of inactivity, discarding whatever I was working on at the time. As a result, I can’t go off and do other things, because I have to make sure there is continual activity on that machine. From the looks of it, I may have found a solution with MouseMove (https://movemouse.codeplex.com/) which automatically moves the mouse for you, and will fool screensavers and the like.
Thirdly, Remote Desktop automatically ends every session 3 hours after it was started, no mater what. So, even with MouseMove, long batch runs are impossible.
The only good thing out of this is that OurClient is paying time and materials, so they are paying for the fact I am sitting here, not really able to do anything else, poking their machine every few minutes (just in case) and ready to restart the remote desktop connection every 3 hours.
And so far, I am about half way through the related files for the first of the primary files. I still have the other 2 primary files and all of the related files for them to go.
This entry was originally posted at http://halloranelder.dreamwidth.org/20473.html. Comments are accepted here or there using OpenID.