SQL Server Forum / Programming / SQL / July 2008
bulk insert with negative sign after number
|
|
Thread rating:  |
MT - 29 Jul 2008 17:20 GMT Hi Everyone,
I would like to use bulk insert with a text file that contains numbers both negative and positive (OK so what the big deal?). It turns out that the system delivering the text files appends the negative sign behind the number so it looks like this ('|' being my column delimiter):
123|456.78-|90.12 3456-|78.90|1234.5678
Does anyone have any idea if it is even possible to use a bulk insert into numeric data types with the above data file.
Any help is very welcome!! Best regards MT
Cowboy (Gregory A. Beamer) - 29 Jul 2008 17:35 GMT My guess is that you will end up having to massage the data coming in and use something other than bulk insert. unless you want to upload and then put into the proper tables. Format files will not solve this problem, although I do recommend using them with these types of inserts.
If there is a way, I know of none.
 Signature Gregory A. Beamer MVP, MCP: +I, SE, SD, DBA
Subscribe to my blog http://gregorybeamer.spaces.live.com/lists/feed.rss
or just read it: http://gregorybeamer.spaces.live.com/
********************************************
| Think outside the box! | ********************************************
> Hi Everyone, > [quoted text clipped - 14 lines] > Best regards > MT MT - 29 Jul 2008 20:37 GMT Hi Cowboy,
one suggestion in our team was to write a .NET (!) executable to parse and clean up the files. Talk about a massage. I might as well use Integration Services and VisualBasic to do the transformations.
Thanks anyways Cowboy.
Giiiddeeeup, MT
> My guess is that you will end up having to massage the data coming in and > use something other than bulk insert. unless you want to upload and then put [quoted text clipped - 23 lines] > > Best regards > > MT Bob - 29 Jul 2008 17:37 GMT No, you'll have to import it into a staging table then transform it.
> Hi Everyone, > [quoted text clipped - 12 lines] > Best regards > MT MT - 29 Jul 2008 20:32 GMT Thanks Bob but in my case that is easier said than done! See my post a little further in the thread.
Regards, MT
> No, you'll have to import it into a staging table then transform it. > [quoted text clipped - 14 lines] > > Best regards > > MT Aaron Bertrand [SQL Server MVP] - 29 Jul 2008 18:11 GMT Bulk insert to a work table that has varchar for that column instead of numeric. Clean the data, then move it to the real table.
Better yet, fix the source system.
On 7/29/08 12:20 PM, in article 25ABDDAD-6887-4AF7-8E54-C4FA3CD0D47C@microsoft.com, "MT"
> Hi Everyone, > [quoted text clipped - 12 lines] > Best regards > MT --CELKO-- - 29 Jul 2008 19:26 GMT Can you change the text file with an editor before you try to load it? We can write this in SQL, but a real text tool would be soooo much faster.
Aaron Bertrand [SQL Server MVP] - 29 Jul 2008 19:51 GMT > Can you change the text file with an editor before you try to load > it? Ah yes, instead of having a process automatically import 1000 files a day, we should open each one up manually in notepad, and go line-by-line and correct the minus signs. This is probably exactly what the OP is trying to avoid... :-)
MT - 29 Jul 2008 20:27 GMT Aaron, your quite correct because as you suspected the problem is much more complicated (I'm just trying to eat an elephant by cutting it into slices
:-))). Just in case you are interested, here is the background (real life stuff that is almost humorous if it weren't sad):
We are dealing with data files being extracted from SAP tables. The idea is to generate some reports from the data for upperupperupper management. The problem is that we have approximately 400 - 500 tables to deal with and many of them have over 150 columns (if any of you have ever dealt with SAP tables, and here I mean the real DB tables, then you know what I am talking about). Oh and before I forget, the data comes from four different SAP system from subsiduaries. We've mirrored the tables in SS2005 but alas the data woes have just begun...
All (!) the tables will be truncated and refilled regularly (bi-weekly is the current suggestion). Some of the data files coming from SAP have over 50 million records and are about 6 GB in size. The programming of the ABAPs to generate the files has been done in a "faraway land" and getting anything changed in the file generation will take many weeks of waiting time and detailed specifications for some poor SAP programmer then to implement. Sooooo.....
We currently have to deal with a lot of issues in the data files and one of those is that negative signs are tacked on behind the number.
I am trying also to avoid using format files for the bulk insert because for so many tables and columns it is quite a bit to handle. But format files will not solve my sign issue anyways (at least I don't think so up to now).
So one really goofy option is bulk inserting into tables with varchar columns and then generating views to do the transformation work from which I can select off of. I can write some T-SQL to generate the views based on the initial and final table structures. But then I am still confronted with a set of massive inserts, one for each table and these all generate log. Or is there a way around that?
(Please also don't even ask why we aren't doing real data modelling of the business process. The answer is we don't have the time because it is extremely complicated. We are doing end-to-end analysis of an intricate supply chain management process. Actually very interesting stuff if it wasn't for the fact that our management doesn't understand beans about data warehousing or databases. And so we get very little time to implement a solution. Like I said,if it wasn't so sad it would be funny.)
Cheers, MT
> > Can you change the text file with an editor before you try to load > > it? [quoted text clipped - 3 lines] > correct the minus signs. This is probably exactly what the OP is trying to > avoid... :-) Aaron Bertrand [SQL Server MVP] - 29 Jul 2008 20:42 GMT > and final table structures. But then I am still confronted with a set of > massive > inserts, one for each table and these all generate log. Or is there a way > around > that? During population, use simple recovery model, and batch the insertsin sets of 10000 or 100000. This should allow ample reuse of the log space instead of doing it all in one transaction...
MT - 29 Jul 2008 20:55 GMT Thanks Aaron! I think that's gonna cost at least a bottle of good Scotch with my DBA in order to get him to let us change the recorvery model in order to load some tables but if that's the only way...
Cheers, MT
> > and final table structures. But then I am still confronted with a set of > > massive [quoted text clipped - 5 lines] > of 10000 or 100000. This should allow ample reuse of the log space instead > of doing it all in one transaction... Hugo Kornelis - 30 Jul 2008 23:15 GMT >Thanks Aaron! I think that's gonna cost at least a bottle of good Scotch >with my DBA in order to get him to let us change the recorvery model >in order to load some tables but if that's the only way... Hi MT,
Instead of asking the DBA to jeopardize the recoverability of the production database, why not ask him to create special staging database where you can play around, loading the data into a table, massaging it until it's done, all at simple recovery - and then, when you're ready, copy the final data over to the production database which can remain in full or bulk-logged recovery just as it is now.
This reduces risk for the production database, plus it saves you a fine bottle of Scotch (that I'll gladly help you downing <g>).
 Signature Hugo Kornelis, SQL Server MVP My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
MT - 31 Jul 2008 20:44 GMT Hi Hugo,
don't worry. I won't be posting for help on how to down the Scotch. ;-)))
Regards, MT
> >Thanks Aaron! I think that's gonna cost at least a bottle of good Scotch > >with my DBA in order to get him to let us change the recorvery model [quoted text clipped - 11 lines] > This reduces risk for the production database, plus it saves you a fine > bottle of Scotch (that I'll gladly help you downing <g>). Roy Harvey (SQL Server MVP) - 29 Jul 2008 21:13 GMT >We currently have to deal with a lot of issues in the data files and one of >those is that negative signs are tacked on behind the number. > >I am trying also to avoid using format files for the bulk insert because >for so many tables and columns it is quite a bit to handle. But format files >will not solve my sign issue anyways (at least I don't think so up to now). Since you are avoiding format files I can only assume there are delimiters for the columns. With so many tables to manage I suggest someone write a utility that pre-processes a file, parses it based on the column delimiter, and when it finds a column value that is nothing but digits followed by a negative sign moves the sign to the front of the string of digits. It should not have to know anything about the file except how to parse it. I specify the utility to accept input and output file names, and the delimiter (even if that does not change, just to make it more flexible).
When I had to get data from SAP I was fortunate that it was on a small scale and I was in a position to insist on leading signs.
Good luck! I suspect importing will be the easy part. Trying to make any sense of SAP data, now THAT will be a challenge!
Roy Harvey Beacon Falls, CT
MT - 29 Jul 2008 22:11 GMT Hi Roy,
it sounds like experience talking there! Yes a parser is definitely an option but then again so is IS. With Integration Services I can also parse the files and spit them out cleanly, even give them directly to a bulk insert task for that matter. But as you may gather I am trying to avoid going down that route. We don't really have an ETL tool aside from IS. Already asked about DataStage, Informatica and then the geniuses running our IT Dept.: "No! We have standardized around Microsoft, and Integration Services is part of our standards". Boyohboy what a pack of rats....
IknowIknowIknow...I sound so picky. The problem is yet so simple: A negative sign after the number (very old school stuff).
You are correct. We were able to get the ABAP "experts" to give us a nice delimiter for the fileds in the data files. The original version of the files came in a tab separated file. That should ring some bells: plenty of alphanumeric fields in the tables and in the text plenty of tabs. Kinda makes it difficult to say if a tab is real text or a field separator. I have the feeling our "experts" imagine we simply want to collect that data on a hard disk and not really plan on using it. Of course it would never cross my mind to think SAP would make it difficult to extract data. Those good ol'boys have based their ERP on "Open Standards" (bunch of crock as usual).
And again you are right in saying that the interpretation of the data will be a "challenge". Luckily my German is quite good so the column names do actually make some sense to me and I have a couple of really good SAP experts who also know the business process very well and have started guiding us through the myriad of columns. Now we are just hoping our users really know what they want...(as they say: "hope" is the last thing to die...)
So at the moment I have three options (in no specific order):
1) a roll-your-own parser to clean up da mess before a simple bulk insert 2) integration services 3) bulk import with pure target varchars and then "insert into select" with the logic in a view or the select itself (oh and don't forget a bottle of scotch for the DBA when I need to change that recorvery model to simple)
Maybe someone will have another idea still.
Best regards out to Connecticut! MT
> >We currently have to deal with a lot of issues in the data files and one of > >those is that negative signs are tacked on behind the number. [quoted text clipped - 21 lines] > Roy Harvey > Beacon Falls, CT Roy Harvey (SQL Server MVP) - 29 Jul 2008 22:46 GMT >it sounds like experience talking there! A bit, yes. 8-)
>So at the moment I have three options (in no specific order): > [quoted text clipped - 4 lines] >scotch >for the DBA when I need to change that recorvery model to simple) You might be able to combine options 1 and 2. I think it should be possible write the parser using Integration Services. No, that is not my area of expertise, but it is supposed to be more like regular programming than DTS was and DTS had VBA support.
Another angle to keep in mind... I have always design my load routines to first bring the data into staging tables, then use SQL to move the data to the "real" tables. When the scale was large, as in your case, this took the form of a staging database. (Note that it can be easier to convince a DBA to use Simple recovery mode on a staging database.) Advantages of this approach include the fact that the staging tables can be made flexible enough to swallow the garbage that seems to always be part of imported data. Allowing NULLs, not enforcing uniqueness, even using a character column for numeric data, whatever it takes to get it into a table where you validate and fix it. Now good ETL tools might be able to handle all that for you, but it is an angle to keep in mind.
Roy Harvey Beacon Falls, CT
MT - 31 Jul 2008 21:03 GMT Hello Roy and Everyone Else that has been so helpful!
Let me tell you where we are. We've decided to follow the option first bulk insert into staging tables with all columns varchar second "insert into select" good datatypes (bottle of Scotch is gone for the recorvery model!)
(against Joe Celko's advice, sorry! but it was always my feeling in working with several RDBMSs that whenever possible, SQL is usually significantly more performant than most ETL tools, and Integration Services is....well....ok....kind of an ETL tool as well but apparently SS2005 can do the job nicely)
Several of you mentioned this approach (and in fact Bob was nice enough to give a little prototype near the bottom).
We've been prototyping the last two days with our real data and the results are quite impressive in terms of speed. We're still working on things but if you are interested I can post next week some of the numbers as well as the specs of the box we are using. (Here is an example: the table I osted earlier with almost 200 columns, was loaded in the two step method described above, with some 50 Million records, in less than half an hour)
My problem now is that some of our important end-users have caught wind of the resuts and already want to run some reports on the dev environment. :-))))
The two step process also has some nice advantages in that we can do some nice manipulation of data between 1) and 2) if need be.
So let me say thanks to Bob, Roy, Aaron, Hugo, Alex, Joe and all the rest for your support in brainstorming. And sorry if I sounded grumpy but we are under a lot of pressure and I've been working long hours. Finally I see light at the end of the tunnel and I hope it isn't an oncoming train...
Warmest regards, MT
> >it sounds like experience talking there! > [quoted text clipped - 29 lines] > Roy Harvey > Beacon Falls, CT --CELKO-- - 29 Jul 2008 21:18 GMT >> Just in case you are interested, here is the background (real life stuff that is almost humorous if it weren't sad): << Update the resume and Google up IT project failure rates. You will not feel alone.
>> The programming of the ABAPs to generate the files has been done in a "faraway land" and getting anything changed in the file generation will take many weeks of waiting time and detailed specifications for some poor SAP programmer then to implement. << Get it in works now; if we had started on Alaskan oil drilling 5 years ago we would not have today's gas prices. You can buy time by sending specs up the hierarchy for clarification
>> We currently have to deal with a lot of issues in the data files and one of those is that negative signs are tacked on behind the number. << That is a COBOL format; have you talked to the COBOL guys in the mainframe shop? Most of the heavy lifting in IT is still done on mainframes and COBOL. And ABAP is supposed to be COBOL-like -- whatever that means.
>> massive inserts, one for each table and these all generate log. Or is there a way around that? << Turn off logging?
>> we don't have the time .. management doesn't understand beans .. if it wasn't so sad it would be funny. <<
No, it would be typical for a lot of failed projects.
Aaron Bertrand [SQL Server MVP] - 29 Jul 2008 21:25 GMT >>> massive inserts, one for each table and these all generate log. Or is >>> there a way around that? << > > Turn off logging? How do you do that in SQL Server, Joe?
Roy Harvey (SQL Server MVP) - 29 Jul 2008 21:58 GMT > Most of the heavy lifting in IT is still done on >mainframes and COBOL. Most of the heavy lifting in IT is now done in SAP and competitors. If you can't see that you really need to get out in the real world every decade or so. And no, they don't use COBOL.
Roy Harvey Beacon Falls, CT
MT - 29 Jul 2008 22:33 GMT Failed projects. No no no. Not in this thread you don't.
In any case, the definition of failure, or success for that matter, is purely subjective. When the CEO of our company says the re-organization was a "success", who's gonna argue? (15 000 headcounts across several countries: out the door).
(I can hear Rick saying: ...the problems of three little people don't amount to a hill of beans in this crazy world. Someday you'll understand that. Now, now... Here's looking at you kid...)
OK but that's almost politics.
No this is much more about academics. I want to import a bloody negative sign after the number. Is there any way of doing this in SS2005 short of having to sell my soul to the devil?
Oh and yes: we have already neatly packaged the new specs and sent them off to SAP-ABAP-Nirvana Land where the unknowing but abiding SAP-Elves will implement our wishes. Hey X-Mas is just around the corner. With some luck they'll have it in the QA system by then...
> >> Just in case you are interested, here is the background (real life stuff that is almost humorous if it weren't sad): << > [quoted text clipped - 22 lines] > > No, it would be typical for a lot of failed projects. --CELKO-- - 29 Jul 2008 20:54 GMT >> Ah yes, instead of having a process automatically import 1000 files a day, we should open each one up manually in notepad, and go line-by-line and correct the minus signs. << Where did 1000 files per day come from?
I was thinking more like some common scripting language, ETL tools, etc. than a Microsoft desktop product. They will pipeline the data and have very little overhead -- certainly a small fraction of doing this in T-SQL.
Aaron Bertrand [SQL Server MVP] - 29 Jul 2008 21:25 GMT > Where did 1000 files per day come from? Well, if it's more than one, which clearly it is...
> I was thinking more like some common scripting language, ETL tools, Ah, why didn't you say so? Your original recommendation certainly sounded like manually/visually correcting the file.
MT - 29 Jul 2008 20:51 GMT It's an honour. I guess you are THE Joe Celko. I have most of your books and constantly recommend them to my colleagues and family (OK, my wife looks at me kinda strange when I try to explain how great it is to model unbalanced hierarchies in Oracle but that's beside the point :-))))
Thanks for taking the time to look into the thread. Summa summarum: ain't no text editor gonna help me out with this one. See my reply to Aaron further down.
Regards, MT
> Can you change the text file with an editor before you try to load > it? We can write this in SQL, but a real text tool would be soooo > much faster. shuurai11@gmail.com - 29 Jul 2008 19:42 GMT > I would like to use bulk insert with a text file that contains numbers both > negative and positive (OK so what the big deal?). It turns out that the > system delivering the text files appends the negative sign behind the number > so it looks like this ('|' being my column delimiter): Assuming to can't fix the source...
Do your bulk insert into a staging table; insert these columns as varchar. Then use a view to fix and insert into your production table. I would recommend keeping the staging table "as is" so that any disreprencies and be tracked down more easily, and I would advise against doing anything to change the text file itself, because it is harder to reconcile if something goes wrong.
MT - 29 Jul 2008 20:40 GMT Hello shuurai11,
not bad because I loose the format files but then I need views and real inserts to do the rest...which is a lot in our case (see my reply to Aaron).
Thanks, MT
> > I would like to use bulk insert with a text file that contains numbers both > > negative and positive (OK so what the big deal?). It turns out that the [quoted text clipped - 9 lines] > against doing anything to change the text file itself, because it is > harder to reconcile if something goes wrong. Jesús López - 29 Jul 2008 20:14 GMT INSERT INTO SELECT OPENROWSET BULK provide you a way to bulk insert data while performing some king of transformation without a staging table.
Regards:
Jesús López www.solidq.com
> Hi Everyone, > [quoted text clipped - 14 lines] > Best regards > MT MT - 29 Jul 2008 20:44 GMT Hola Jesús,
gracias por tu ayuda. The Problema is that OPENROWSET definitely needs format files (as opposed to bulk insert) and we are dealling with very many tables with very many columns. One table has 190 columns for example. You have any idea what that means...? See my post in reply to Aaron for more details if you are interested.
You know what: I hate SAP.
Saludos, MT
> INSERT INTO SELECT OPENROWSET BULK provide you a way to bulk insert data > while performing some king of transformation without a staging table. [quoted text clipped - 22 lines] > > Best regards > > MT MT - 29 Jul 2008 22:55 GMT So just to make it clear, here is a tiny example of what we're confronted with. Here is the create table for one of the almost 500 tables we'll be suing (I've left out some columns so that the victims remain anonymous). See the numeric fields? Now imagine a delimted data file with stuff like: 123.456-
(Anyone ever read Sartre's Nausea? That's exactlly how I feel: "His blue cotton shirt stands out joyfully against a chocolate-coloured wall. That too brings on the Nausea. The Nausea is not inside me: I feel it out there in the wall, in the suspenders, everywhere around me. It makes itself one with the café, I am the one who is within it.")
CREATE TABLE vbap( mandt varchar(3) NULL, vbeln varchar(10) NULL, posnr varchar(6) NULL, matnr varchar(18) NULL, matwa varchar(18) NULL, pmatn varchar(18) NULL, charg varchar(10) NULL, matkl varchar(9) NULL, arktx varchar(40) NULL, pstyv varchar(4) NULL, posar varchar(1) NULL, lfrel varchar(1) NULL, fkrel varchar(1) NULL, uepos varchar(6) NULL, grpos varchar(6) NULL, abgru varchar(2) NULL, prodh varchar(18) NULL, zwert money NULL, zmeng decimal(17, 3) NULL, zieme varchar(3) NULL, umziz decimal(6, 0) NULL, umzin decimal(6, 0) NULL, meins varchar(3) NULL, smeng decimal(17, 3) NULL, ablfz decimal(15, 3) NULL, abdat datetime NULL, absfz decimal(17, 3) NULL, posex varchar(6) NULL, kdmat varchar(35) NULL, kbver decimal(3, 0) NULL, kever decimal(3, 0) NULL, vkgru varchar(3) NULL, vkaus varchar(3) NULL, grkor varchar(3) NULL, fmeng varchar(1) NULL, uebtk varchar(1) NULL, uebto decimal(4, 1) NULL, untto decimal(4, 1) NULL, faksp varchar(2) NULL, atpkz varchar(1) NULL, rkfkf varchar(1) NULL, spart varchar(2) NULL, gsber varchar(4) NULL, netwr money NULL, waerk varchar(5) NULL, antlf decimal(1, 0) NULL, kztlf varchar(1) NULL, chspl varchar(1) NULL, kwmeng decimal(19, 3) NULL, lsmeng decimal(19, 3) NULL, kbmeng decimal(19, 3) NULL, klmeng decimal(19, 3) NULL, vrkme varchar(3) NULL, umvkz decimal(6, 0) NULL, umvkn decimal(6, 0) NULL, brgew decimal(19, 3) NULL, ntgew decimal(19, 3) NULL, gewei varchar(3) NULL, volum decimal(19, 3) NULL, voleh varchar(3) NULL, vbelv varchar(10) NULL, posnv varchar(6) NULL, vgbel varchar(10) NULL, vgpos varchar(6) NULL, voref varchar(1) NULL, upflu varchar(1) NULL, erlre varchar(1) NULL, lprio varchar(2) NULL, werks varchar(4) NULL, lgort varchar(4) NULL, vstel varchar(4) NULL, route varchar(6) NULL, stkey varchar(1) NULL, stdat datetime NULL, stlnr varchar(8) NULL, stpos decimal(4, 0) NULL, awahr varchar(3) NULL, erdat datetime NULL, ernam varchar(12) NULL, erzet datetime NULL, taxm1 varchar(1) NULL, taxm2 varchar(1) NULL, taxm3 varchar(1) NULL, taxm4 varchar(1) NULL, taxm5 varchar(1) NULL, taxm6 varchar(1) NULL, taxm7 varchar(1) NULL, taxm8 varchar(1) NULL, taxm9 varchar(1) NULL, vbeaf decimal(6, 2) NULL, vbeav decimal(6, 2) NULL, vgref varchar(1) NULL, netpr money NULL, kpein decimal(5, 0) NULL, kmein varchar(3) NULL, shkzg varchar(1) NULL, sktof varchar(1) NULL, mtvfp varchar(2) NULL, sumbd varchar(1) NULL, kondm varchar(2) NULL, ktgrm varchar(2) NULL, bonus varchar(2) NULL, provg varchar(2) NULL, eannr varchar(13) NULL, prsok varchar(1) NULL, bwtar varchar(10) NULL, bwtex varchar(1) NULL, xchpf varchar(1) NULL, xchar varchar(1) NULL, lfmng decimal(17, 3) NULL, stafo varchar(6) NULL, wavwr money NULL, kzwi1 money NULL, kzwi2 money NULL, kzwi3 money NULL, kzwi4 money NULL, kzwi5 money NULL, kzwi6 money NULL, stcur decimal(10, 5) NULL, aedat datetime NULL, ean11 varchar(18) NULL, fixmg varchar(1) NULL, prctr varchar(10) NULL, mvgr1 varchar(3) NULL, mvgr2 varchar(3) NULL, mvgr3 varchar(3) NULL, mvgr4 varchar(3) NULL, mvgr5 varchar(3) NULL, kmpmg decimal(18, 3) NULL, sugrd varchar(4) NULL, sobkz varchar(1) NULL, vpzuo varchar(1) NULL, paobjnr varchar(10) NULL, ps_psp_pnr varchar(24) NULL, aufnr varchar(12) NULL, vpmat varchar(18) NULL, vpwrk varchar(4) NULL, prbme varchar(3) NULL, umref float NULL, knttp varchar(1) NULL, kzvbr varchar(1) NULL, sernr varchar(8) NULL, objnr varchar(22) NULL, abgrs varchar(6) NULL, bedae varchar(4) NULL, cmpre money NULL, cmtfg varchar(1) NULL, cmpnt varchar(1) NULL, cmkua decimal(10, 5) NULL, cuobj varchar(18) NULL, cuobj_ch varchar(18) NULL, cepok varchar(1) NULL, koupd varchar(1) NULL, serail varchar(4) NULL, anzsn int NULL, nachl varchar(1) NULL, magrv varchar(4) NULL, mprok varchar(1) NULL, vgtyp varchar(1) NULL, prosa varchar(1) NULL, uepvw varchar(1) NULL, kalnr varchar(12) NULL, klvar varchar(4) NULL, sposn varchar(4) NULL, kowrr varchar(1) NULL, stadat datetime NULL, exart varchar(2) NULL, prefe varchar(1) NULL, knumh varchar(10) NULL, clint varchar(10) NULL, chmvs varchar(3) NULL, stlty varchar(1) NULL, stlkn varchar(8) NULL, stpoz varchar(8) NULL, stman varchar(1) NULL, zschl_k varchar(6) NULL, kalsm_k varchar(6) NULL, kalvar varchar(4) NULL, kosch varchar(18) NULL, upmat varchar(18) NULL, ukonm varchar(2) NULL, mfrgr varchar(8) NULL, plavo varchar(4) NULL, kannr varchar(35) NULL, cmpre_flt float NULL, abfor varchar(2) NULL, abges float NULL, wktnr varchar(10) NULL, wktps varchar(6) NULL, skopf varchar(18) NULL, kzbws varchar(1) NULL, wgru1 varchar(18) NULL, wgru2 varchar(18) NULL, knuma_pi varchar(10) NULL, knuma_ag varchar(10) NULL, kzfme varchar(1) NULL, lstanr varchar(1) NULL, techs varchar(12) NULL, mwsbp money NULL, berid varchar(10) NULL );
> Hi Everyone, > [quoted text clipped - 12 lines] > Best regards > MT Roy Harvey (SQL Server MVP) - 29 Jul 2008 23:24 GMT Lets see now....
The column names are almost as bad as the ones used by JD Edwards. That originated on the AS/400 where they apparently were limited to something like 6 characters per column name. What made JDE names special was that with only 6 to work with they used the first two for the table name, or at least something equivalent to it. So the columns table F0005 all started with DR (DRSY, DRRT) and so forth, while those for table F01901 all started with AB (ABAN8, ABALKY).
The short varchars should be char. varchar(1) takes up 2 to 3 bytes of storage, char(1) takes just 1. Some people use varchar(10) as the break point, I've tended more toward varchar(6), but how full the actual data makes the column should be taken into account.
The MONEY columns might be better as DECIMAL with appropriate precision and scale. MONEY has issues which you should be able to read about with a bit of googling.
Using FLOAT is problematic. There are cases where it is necessary, but I would want a really good reason for using it. The key is that it is imprecise, especially when there is anything to the right of the decimal point (because internally it is not a decimal point but a binary point!)
It is worth noting that DECIMAL takes up either 5, 9, 13 or 17 bytes, depending on the precision.
Precision Storage bytes 1 - 9 5 10-19 9 20-28 13 29-38 17 You might use this to standardize some of those DECIMAL declarations up to the large end of each precision range. Or not, it is not a big deal.
Roy Harvey Beacon Falls, CT
>So just to make it clear, here is a tiny example of what we're confronted with. >Here is the create table for one of the almost 500 tables we'll be suing [quoted text clipped - 231 lines] >> Best regards >> MT Hugo Kornelis - 30 Jul 2008 23:22 GMT >Lets see now.... Hi Roy,
You forgot to mention some minor points.
Such as, hmmm, let's see -- all columns are nullable -- and hey look! no primary key in sight! -- lots of similar column names with a number at the end (taxm1-taxm9, kzwi1-kzwi6, etc), reeking of repeating groups -- ...
Where's Joe Celko when you reallly need him, eh?
 Signature Hugo Kornelis, SQL Server MVP My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
--CELKO-- - 30 Jul 2008 23:39 GMT >> Where's Joe Celko when you really need him, eh? << Sorry, I was busy strangling kittens and got distracted.
MT - 31 Jul 2008 21:08 GMT Yep, I understand. My waste basket is full of strangled kittens. Mixed in there as well are a couple of "managers" and "decision makers" but after a while I can't make out the difference any more between kitttens and the rest...
> >> Where's Joe Celko when you really need him, eh? << > > Sorry, I was busy strangling kittens and got distracted. MT - 31 Jul 2008 21:13 GMT Hi Hugo,
regarding the null and PK that's simply my fault. I just wanted to show one of the various tables we are dealing with. I left those out on purpose.
But regarding the names of the columns, you selected the right verb: reek.
But how can you argue with a multi-billion company like SAP? The produce that kind of stuff and they are still "reeking" in the money.
Cheers, MT
> >Lets see now.... > [quoted text clipped - 8 lines] > > Where's Joe Celko when you reallly need him, eh? Alex Kuznetsov - 30 Jul 2008 15:24 GMT > So just to make it clear, here is a tiny example of what we're confronted with. > Here is the create table for one of the almost 500 tables we'll be suing Suing? Nice choice of words. I feel your pain.
Bob - 30 Jul 2008 11:34 GMT Working, flexible example: /*-- Simple example DECLARE @staging TABLE ( col1 VARCHAR(10), col2 VARCHAR(10), col3 VARCHAR(10) ) DECLARE @target TABLE ( col1 NUMERIC(10,4), col2 NUMERIC(10,4), col3 NUMERIC(10,4) )
INSERT INTO @staging VALUES ( '123', '456.78-', '90.12' ) INSERT INTO @staging VALUES ( '3456-', '78.90', '1234.5678' )
INSERT INTO @target SELECT CASE CHARINDEX( '-', col1 ) WHEN 0 THEN col1 ELSE '-' + LEFT( col1, LEN( col1 )-1 ) END, CASE CHARINDEX( '-', col2 ) WHEN 0 THEN col2 ELSE '-' + LEFT( col2, LEN( col2 )-1 ) END, CASE CHARINDEX( '-', col3 ) WHEN 0 THEN col3 ELSE '-' + LEFT( col3, LEN( col3 )-1 ) END FROM @staging
SELECT * FROM @staging SELECT * FROM @target
SELECT * INTO tempdb.dbo.staging FROM @staging
!!bcp tempdb.dbo.staging out c:\temp\table1.txt -S.\sql2005 -T -c -t"|" --BULK INSERT tempdb.dbo.staging FROM 'c:\temp\table1.txt' WITH ( FIELDTERMINATOR = '|' )
*/
USE _test GO
-- Set up tables with NUMERICs which are already there. IF OBJECT_ID ( 'table1' ) IS NOT NULL DROP TABLE table1 CREATE TABLE table1 ( col1 NUMERIC(10,4), col2 NUMERIC(10,4), col3 NUMERIC(10,4) ) GO
-- OK, we're gonna bulk load for table1. We know the structure for table1, so we can create a copy of it with varchars in tempdb. -- Assuming you have an example text file called c:\temp\table1.txt which contains two lines: /* 123|456.78-|90.12 3456-|78.90|1234.5678 */ IF OBJECT_ID( 'dbo.usp_create_temp_table_copy' ) IS NOT NULL DROP PROC dbo.usp_create_temp_table_copy GO CREATE PROC dbo.usp_create_temp_table_copy
@table_name SYSNAME, @debug BIT = 0
AS
DECLARE @sql VARCHAR(MAX)
SET @sql = 'USE tempdb IF OBJECT_ID ( ''@table_name'' ) IS NOT NULL DROP TABLE @table_name
CREATE TABLE tempdb.dbo.@table_name ( '
SET @sql = REPLACE( @sql, '@table_name', @table_name )
SELECT @sql = @sql + name + ' VARCHAR(50),' FROM sys.columns WHERE object_id = OBJECT_ID( @table_name )
SELECT @sql = @sql + ' )'
IF @debug = 1 SELECT @sql ELSE EXEC(@sql) GO
IF OBJECT_ID( 'dbo.usp_do_bulk_insert' ) IS NOT NULL DROP PROC dbo.usp_do_bulk_insert GO CREATE PROC dbo.usp_do_bulk_insert
@table_name SYSNAME, @filename VARCHAR(500), @debug BIT = 0 AS
DECLARE @sql VARCHAR(MAX)
-- Create the BULK INSERT SET @sql = 'BULK INSERT tempdb.dbo.@table_name FROM ''@filename'' WITH ( FIELDTERMINATOR = ''|'' )' SET @sql = REPLACE( @sql, '@table_name', @table_name ) SET @sql = REPLACE( @sql, '@filename', @filename )
IF @debug = 1 SELECT @sql ELSE EXEC(@sql) GO
IF OBJECT_ID( 'dbo.usp_stage_to_main' ) IS NOT NULL DROP PROC dbo.usp_stage_to_main GO CREATE PROC dbo.usp_stage_to_main
@table_name SYSNAME, @debug BIT = 0
AS
DECLARE @sql VARCHAR(MAX) DECLARE @column_list1 VARCHAR(MAX) DECLARE @column_list2 VARCHAR(MAX) DECLARE @case VARCHAR(MAX)
SET @sql = 'INSERT INTO dbo.@table_name ( @column_list1 ) SELECT @column_list2 FROM tempdb.dbo.@table_name'
SET @case = 'CASE CHARINDEX( ''-'', @name ) WHEN 0 THEN @name ELSE ''-'' + LEFT( @name, LEN( @name )-1 ) END AS @name'
SELECT @column_list1 = ISNULL( @column_list1 + ', ', '' ) + name, @column_list2 = ISNULL( @column_list2 + ', ', '' ) + REPLACE( @case, '@name', name ) FROM sys.columns WHERE object_id = OBJECT_ID( @table_name )
SET @sql = REPLACE( @sql, '@table_name', @table_name ) SET @sql = REPLACE( @sql, '@column_list1', @column_list1 ) SET @sql = REPLACE( @sql, '@column_list2', @column_list2 )
IF @debug = 1 SELECT @sql ELSE EXEC(@sql) GO
EXEC dbo.usp_create_temp_table_copy 'table1' EXEC dbo.usp_do_bulk_insert 'table1', 'c:\temp\table1.txt' EXEC dbo.usp_stage_to_main 'table1'
SELECT * FROM dbo.table1
> Hi Everyone, > [quoted text clipped - 12 lines] > Best regards > MT
|
|
|