
SQL Server Forum / Other Technologies / Service Broker / December 2005


40KB blocks

William Stacey [MVP] - 26 Nov 2005 00:47 GMT
Roger said messages over 40KB are divided into 40KB blocks.  Does that mean
exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it will be sent
in exactly 1 message?  TIA

Signature

William Stacey [MVP]

Kent Tegels - 27 Nov 2005 07:07 GMT
Hello William Stacey [MVP],

> Roger said messages over 40KB are divided into 40KB blocks.  Does that
> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it
> will be sent in exactly 1 message?  TIA

I thought it was 4K (4096) to align the normal packet size for TDS.

Anyway, there's enough metadata that needs to be sent that you shouldn't
expect to consume 100% of the packet size with the first message part.

Thank you,
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/
William Stacey [MVP] - 27 Nov 2005 16:04 GMT
Thanks Kent.  It may be a typo, but page 30 says:
"To avoid monopolizing the network with a very large message, large messages
are split into 40KB fragments and then assembled on the receiving end."

I read that as the Body is split into 40KB chunks.  The TDS would also
fragment per its protocol.  It does not ~sound like the metadata is
counted in that figure.  Not enough detail to really know for sure.  But I
would like to know.

Signature

William Stacey [MVP]

> Hello William Stacey [MVP],
>
[quoted text clipped - 10 lines]
> DevelopMentor
> http://staff.develop.com/ktegels/
Kent Tegels - 27 Nov 2005 16:39 GMT
Hello William Stacey [MVP],

> Thanks Kent.  It may be a typo, but page 30 says:
> "To avoid monopolizing the network with a very large message, large
[quoted text clipped - 5 lines]
> counted in that figure.  Not enouph detail to really know for sure.
> But I would like to know.

I don't have Roger's book with me (I'm in London this week), and the Holy
Book is silent on this topic. I'll ask around. :)

Thank you,
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/
Roger Wolter[MSFT] - 28 Nov 2005 02:31 GMT
40K is an approximate value and is not guaranteed to remain the same between
releases.  Why would you care what the fragment size is?  The point is that
large messages are fragmented for more fair bandwidth utilization.  The
exact size of a fragment shouldn't matter from a programmer's point of view.

Signature

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

> Roger said messages over 40KB are divided into 40KB blocks.  Does that
> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it will
> be sent in exactly 1 message?  TIA
William Stacey [MVP] - 28 Nov 2005 02:48 GMT
Example would be large data file transfer.  If I "chunk" the file, I want to
pick the best chunk size.  41KB would be a bad choice as one message would
be 40KB and another one would be 1KB.  So two messages for each Send.  Is
this thinking in error?   TIA
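
A minimal sketch of the arithmetic behind this question. It assumes a fixed fragment size of exactly 40960 bytes and ignores per-fragment metadata; both are assumptions, since Roger describes the 40K figure as approximate and subject to change between releases. The function names are hypothetical.

```python
import math

# Assumption: a fixed 40 KB fragment size. The real value is described in
# this thread as approximate and not guaranteed between releases.
FRAGMENT_SIZE = 40 * 1024  # 40960 bytes


def fragment_count(body_len: int, fragment_size: int = FRAGMENT_SIZE) -> int:
    """Number of fragments needed for a message body of body_len bytes."""
    # An empty body is still treated as one message on the wire.
    return max(1, math.ceil(body_len / fragment_size))


def last_fragment_len(body_len: int, fragment_size: int = FRAGMENT_SIZE) -> int:
    """Size in bytes of the final (possibly short) fragment."""
    if body_len == 0:
        return 0
    return body_len % fragment_size or fragment_size


if __name__ == "__main__":
    for size in (40 * 1024, 41 * 1024):
        print(size, fragment_count(size), last_fragment_len(size))
```

Under these assumptions a 40960-byte body fits one fragment, while a 41KB body needs two, the second carrying only 1KB.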

Signature

William Stacey [MVP]

> 40K is an approximate value and is not guaranteed to remain the same
> between releases.  Why would you care what the fragment size is?  The
[quoted text clipped - 5 lines]
>> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it will
>> be sent in exactly 1 message?  TIA
Bob Beauchemin - 28 Nov 2005 15:53 GMT
I agree with Roger (a safe position ;-). Why does it matter? It's for
fairness when large messages are sent. Because binary adjacent broker
protocol uses TCP, it may also be subject to the fragmentation done by the
underlying TCP stack. And you don't have the ability to choose your
chunksize, at least not that I'm aware of. What's nice about the broker
protocol is that, unlike MSMQ, there is no arbitrary (small) message size
limit that makes your programs more complex because you're doing the chunking
and unchunking in application code. The protocol takes care of this.

The only possible reason that it might matter (and this is contrived) is if
you're writing a trace module and want to know how large to allocate the
largest buffer, if you're tracing on a per-fragment basis.

The TDS API buffer size has changed over time (it once defaulted to 1k) so
there shouldn't be any need for the broker protocol to align with it. Again,
at least that I'm aware of.

So what am I missing?

Cheers,
Bob Beauchemin
http://www.SQLskills.com/blogs/bobb

> Example would be large data file transfer.  If I "chunk" the file, I want
> to pick the best chunk size.  41KB would be a bad choice as one message
[quoted text clipped - 10 lines]
>>> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it will
>>> be sent in exactly 1 message?  TIA
William Stacey [MVP] - 28 Nov 2005 21:20 GMT
Thanks Bob.  Please see reply to Roger.

Signature

William Stacey [MVP]

Roger Wolter[MSFT] - 28 Nov 2005 16:34 GMT
You don't need to chunk a message unless it is bigger than 2GB.  I don't see
the advantage of doing fragmentation both in your application and in the
Service Broker.  If you're really determined to do this then pick a size you
know will always be less than the fragment size - maybe 30KB.
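
For anyone who does want to chunk in the application anyway, a rough sketch of reading a stream in pieces that stay under the fragment size. The 30KB value is the "maybe 30KB" suggested above; `iter_chunks` is a hypothetical helper, not part of any Service Broker API.

```python
import io
from typing import BinaryIO, Iterator

# Assumption: 30 KB, chosen only because it is below the approximate
# 40 KB fragment size mentioned in this thread.
CHUNK_SIZE = 30 * 1024


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks, keeping at most one chunk in memory at a time."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            return
        yield chunk


if __name__ == "__main__":
    payload = io.BytesIO(b"x" * 100_000)
    print([len(c) for c in iter_chunks(payload)])
```

Each yielded chunk would then be sent as its own message on the same conversation, so ordering is preserved by the dialog rather than by the application.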

Signature

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

> Example would be large data file transfer.  If I "chunk" the file, I want
> to pick the best chunk size.  41KB would be a bad choice as one message
[quoted text clipped - 10 lines]
>>> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it will
>>> be sent in exactly 1 message?  TIA
William Stacey [MVP] - 28 Nov 2005 21:19 GMT
You want to load 2GB arrays in memory on clients and also load 2GB array on
server?  This may work for 1 user, but what if many hundreds of users are
uploading files?  That will kill the server quick - no?  Unless I am missing
something.  A streaming method with chunks would seem to be much more
efficient and keeping the chunks in multiples of the SSB max chunk size
would seem to be the most efficient.

Signature

William Stacey [MVP]

> You don't need to chunk a message unless it is bigger than 2GB.  I don't
> see the advantage of doing fragmentation both in your application and in
[quoted text clipped - 15 lines]
>>>> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it
>>>> will be sent in exactly 1 message?  TIA
Kent Tegels - 28 Nov 2005 23:45 GMT
Hello William Stacey [MVP],

> You want to load 2GB arrays in memory on clients and also load 2GB
> array on server?  This may work for 1 user, but what if many hundreds
> of users uploading files?  That will kill the server quick - no?
> Unless I am missing something.  A streaming method with chunks would
> seem to be much more efficient and keeping the chunks in multiples of
> the SSB max chunk size would seem to be the most efficient.

There's an opinion you didn't ask for...

Ummm... You get the idea that it's a messaging service, right? You know, here's
a bit of data, work it, and let me know when you're done so I can ask the next
process to start. Send messages not object graphs -- that sort of thing.
It's not meant to be secured, transacted FTP. :)

Given that, how many 2GB payloads are you really likely to get? Sure, I can
see there being some collection of data that in total measures two gigs.
But it's probably just that, a collection. So wouldn't it be possible to send,
oh, what about 69000 32k messages instead? With Service Broker, I'd have
to think that's totally possible. And since the Workgroup+ SKUs can multithread
queue readers, it shouldn't take nearly as long to process that heap as
you might guess.

And let's not forget that the data in the database is supposed to be both
complete and durable after every write. Well, if you're streaming in, are
you willing to lock that row or table while you're writing to it? Even at
best speed, we're talking several seconds to commit that kind of data. Not
something SQL Server is designed to do.

If you really need to load files into SQL Server, look at SSIS instead or the
Bulk Provider. Probably much easier to implement.

Thank you,
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/
William Stacey [MVP] - 29 Nov 2005 03:55 GMT
> Ummm... You get the idea that its a messaging service, right?

Yes

>You know, here's a bit of data, work it, and let me know when your done so
>I can ask the next process to start. Send messages not object graphs --  
>that sort of thing. Its not meant to be secured, transacted FTP. :)
> Given that, how many 2GB payloads are you really likely to get?

<G> I don't expect too many 2GB payloads.  However, this is an interesting
technical point.  But let's take it down to a realistic example.  Say MSN
Music (or production level file trans between insurance companies and
service providers).  SB would seem *extremely well suited for this task
IMHO.  As we have a mix of accounting, transactions, and large byte[]
transfer.  FTP would not be a good option (imo) as it falls on the floor
during an error and you need to create the recovery logic yourself and all
accounting would be out-of-band with the data - or you have to write a lot
of custom code or buy something and modify it.  MTOM in Indigo and WSE is
not good either, as you can't stream on the client side and server side
seems strange and all byte[]s are still "normalized" to base64 in the
internal pipeline.  So you have the same problem, loading an entire message
in memory *and waiting for the entire message before writing it out to disk.
This is not a good design by any account as we could be streaming the file
(in chunks) in a background thread and getting forward progress.  Even with
today's RAM, not sure a design that loads 5-10MB arrays in memory as buffer
is good; as a better alternative exists.

SB, on the other hand, has all the right stuff.  It has reliability built
in, recovery, send once, transactions, conversation, security, scales, etc.
How many 5MB messages could you load and have running on the server?  How
about 10MB files?  At what point is the file too big?  You would need to
make some arbitrary cut off.  But if we stream, we don't need to make up
some arbitrary upper boundary (aside from physical FS store limits).  You
definitely want to chunk things like this and SB seems to allow this
perfectly with ordered sets of messages in a conversation.  But you want to
*help it be efficient about the machine resources.  Like I said, you don't
want to send a 41KB message if the max size is 40KB as you will get n*2
messages on the wire (one 40KB and one 1KB).  So you want a multiple of 40KB,
or some block size less than 40KB.  So it does matter IMO.  Buffer size and
fragmentation always matters for scalable server apps.  In fact, if you are
a google or msn music, you probably align buffer sizes from the app buffers
down to the tcp level.

> Sure, I can see there being some collection of data that in total measures
> two gigs. But its probably just that, a collection. So wouldn't it be
> possible to send, oh, what about 69000 32k messages instead? With Service
> Broker, I'd have to think that's totally possible.

I agree and that is what I am saying.  You would chunk it into multiple
messages.  As even with 3-30GB RAM on a server, why load a 2GB message in
one message?  Maybe when we all have 1TB RAM in our servers, it will seem
small.  But even then, you're wasting IO opportunity to get that data written
to disk using a stream as you're not writing until the whole message is
loaded.

> And let's not forget that the data in the database is supposed to be both
> complete and duration after ever write. Well, if you're streaming in, are
> you willing to lock that row or table while you're writing to it? Even at
> best speed, we're talking several seconds to commit that kind of data. Not
> something SQL Server is designed to do.

Right.  So you write in chunks for this reason as well.  I would probably
read/write from files on server and store file system pointers in the db,
but either way.  There is something attractive about having all data in the
same "unit of recovery", but that is another topic and not related to this
post per se.  Thanks Kent.

Signature

William Stacey [MVP]

Kent Tegels - 29 Nov 2005 15:10 GMT
Hello William Stacey [MVP],

> <G> I don't expect too many 2GB payloads.  However, this is an
> interesting technical point.  But lets take it down to a realistic
> example.  Say MSN Music (or production level file trans between
> insurance companies and service providers).  SB would seem *extremely
> well suited for this task IMHO.  As we have a mix of accounting,
> transactions, and large byte[] transfer.

I just don't see why, but maybe I will by the end of this missive.

Actually if you're talking about claims data, don't the massive files usually
represent a number of individual transactions? There's a claim, the claimant
and the list of CPTs -- please process it. Or, this claim was paid or rejected,
and here's why -- deal with it. In that scenario, why wouldn't you be better
off writing a client-side parser that queues each item as a message into
the broker? It may be 2GB of total data, but at some point, you need to decompose
that data somehow to process it.

Why store anything in SQL Server unless you're going to process it using
the features it brings to bear (or for backup and restore, and I don't really
think you want to bloat your tlogs as much as this sort of thing would)?

> FTP would not be a good
> option (imo) as it falls on the floor during an error and you need to
> create the recovery logic yourself and all accounting would be
> out-of-band with the data - or you have to write a lot of custom code
> or buy something and modify it.  

Well, if you've only used IIS's sad FTP server, I guess I can see that. There are
other FTP servers out there that do restarts nicely and don't cost more
than a finger.

Having a database transaction depend on the successful network transfer of
a file seems like folly to me. And even so, why couldn't you have a message
exchange like "please notify me when you get this file, which is at this
address, and confirm with this code/guid/trackingID" and then, once the client
side actually has the file, it sends back a message saying "I've gotten it, here's
your confirmation." Then your transactional scope can be maintained but separated
from the file transfer process.
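
A rough sketch of the message exchange described in the paragraph above, with the file transfer kept out of band. Everything here is hypothetical: the message shapes, the function names, and the `fetch` callable (standing in for an FTP download) are illustration only and not a Service Broker or FTP API.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FetchRequest:
    """'Please get this file and confirm with this tracking ID.'"""
    file_url: str
    tracking_id: str


@dataclass(frozen=True)
class FetchConfirmation:
    """'I've gotten it, here's your confirmation.'"""
    tracking_id: str
    byte_count: int


def make_request(file_url: str) -> FetchRequest:
    # The tracking ID ties the later confirmation back to this request.
    return FetchRequest(file_url=file_url, tracking_id=str(uuid.uuid4()))


def handle_request(request: FetchRequest,
                   fetch: Callable[[str], bytes]) -> FetchConfirmation:
    # The transfer happens outside the messaging channel (assumed: FTP or similar).
    data = fetch(request.file_url)
    return FetchConfirmation(tracking_id=request.tracking_id, byte_count=len(data))


def confirms(request: FetchRequest, confirmation: FetchConfirmation) -> bool:
    return request.tracking_id == confirmation.tracking_id
```

Only the two small messages would travel through the broker; the payload itself never becomes a message body.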

> MTOM in Indigo and WSE is not good
> either, as you can't stream on the client side and server side seems
[quoted text clipped - 6 lines]
> 5-10MB arrays in memory as buffer is good; as a better alternative
> exists.

I know so little about Indigo and WSE that I'll take your word for it.
It shouldn't be hard to write SQLCLR code that FTPs (even with restarts)
a file from a known address to a local FS. That could be done by a service
program that gets activated by a "go get this file" request and sends back
a delivery message. So I still say that the request/transfer/confirm pattern
is better than transactional messaging of megablobs.

> SB, on the other hand, has all the right stuff.  It has reliability
> built in, recovery, send once, transactions, conversation, security,
> scales, etc. How many 5MB messages could you load and have running on
> the server?  

At least thousands if not millions.

> How about 10MB files?  At what point is the file too big?

There's an ecosystem to consider here: As soon as it causes a TLOG file to expand
under heavy workload. As soon as I have to back up at least some
part of the file twice to expensive tapes because I'm backing it up not only
in the table but also in the TLOGs. As soon as it locks a table for the
duration of the write when it didn't need to.

> You would need to make some arbitrary cut off.  But if we stream, we
> don't need to make up some arbitrary upper boundry (aside from
> physical FS store limits).  

Sure, if SQL had a Streamable type -- ideally something that was ISequentialStream
friendly -- no problem. Of course, making that type ACID would be the trick.

I've tried implementing a UDT that inherits from MemoryStream and ran into
a problem: its limited to 8000 bytes of storage. Hardly worth streaming into.
:)

> You definitely want to chunk things like
> this

I agree that I'd want to transfer the files in a streaming manner, preferably
in chunks. I don't agree I'd want to do it with SSB.

> and SB seems to allow this perfectly with ordered sets of
> messages in a conversation.

and it seems imperfect for doing so for a number of other reasons, like the
lack of a type to persist in, the issue with TLOGs, etc. To me, FTP seems like
an ideal way of doing that.

> But you want to *help it be efficient
> about the machine resources.  Like I said, you don't want to send a
> 41MB message if the max size is 40MB as you will get n*2 messages on
> the wire (one 40MB and one 1MB).  

If you have 41k of data to send, you have 41k to send. Some chunk is going
to be odd sized anyway no matter how nicely you try to normalize the stream.

> So you want a multiple of 40, or
> some block size less then 40.  So it does matter IMO.  Buffer size and
> fragmentation always matters for scalable server apps.  In fact, if
> you are a google or msn music, you probably align buffer sizes from
> the app buffers down to the tcp level.

Yes, but these applications aren't SQLClients are they?

Of course, this whole conversation might be moot. If Service Broker drops
BBAP in favor of something WCF based or simply makes it an option, you should
be able to implement your own channelling for it, then you can do what you
like.

> I agree and that is what I am saying.  You would chunk it into
> multiple messages.  As even with 3-30GB RAM on a server, why load a
> 2GB message in one message?  Maybe when we all have 1TB RAM in our
> servers, it will seem small.  But even then, your wasting IO
> opportunity to get that data written to disk using a stream as your
> not writting until the whole message is loaded.

So if I write something that does the chunking for you, can we be done with
this? :)

> Right.  So you write in chunks for this reason as well.  I would
> probably read/write from files on server and store file system
> pointers in the db, but either way.  There is something attractive
> about having all data in the same "unit of recovery", but that is
> another topic and not related to this post per se.  Thanks Kent.

Then I like my "get this, ftp, got it" pattern better. You can do that same
work without passing the file as a message payload.

Thank you,
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/
William Stacey [MVP] - 29 Nov 2005 22:11 GMT
Not sure why you're fighting me on this, but here it goes:
We are talking about two things.  One is the message size and RAM, and
second is the app design and using SB.

- Message size
a) When would it ever make sense to send one big message (say over 5MB)
instead of chunking it?  Almost never IMO.
b) first it takes up huge amount of server RAM - which does not scale.
c) you can't write out any data until you receive the whole message, so you're
wasting disk IO you could be doing.
d) If you fail half way into the large message, you have to start the whole
message over again.
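
To illustrate point (d) only: a sketch of how an application that chunks its own data could resume from the first unsent chunk after a failure instead of starting over. The `send` callable and the use of `ConnectionError` are assumptions for the example; this says nothing about how Service Broker itself retries fragments.

```python
from typing import Callable, Sequence


def send_chunks(chunks: Sequence[bytes],
                send: Callable[[int, bytes], None],
                start_at: int = 0) -> int:
    """Send chunks[start_at:] in order.

    Returns the index of the first chunk that was NOT sent, which equals
    len(chunks) when everything went through. A caller can pass that value
    back in as start_at to resume.
    """
    for index in range(start_at, len(chunks)):
        try:
            send(index, chunks[index])
        except ConnectionError:
            # Stop at the failed chunk; earlier chunks are not resent.
            return index
    return len(chunks)
```

With one large message the equivalent of `start_at` is always zero, which is the cost being described in (d).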

- App design / protocol
a) I can think of many ways to do this without SB, but SB seems perfect.
b) I have a stream solution too using varbinary(max) column here:
http://spaces.msn.com/members/staceyw/Blog/cns!1pnsZpX0fPvDxLKC6rAAhLsQ!404.entry
but that is not what I am talking about.  The file could go to DB or disk -
I don't want to get hung up on that point for this.
c) sure you can get ftp to work, but it is not clean and you have to play in
two ball fields all the time.  You have multiple open ports and you have two
separate security channels that you probably need to sync somehow.  You need
a message channel and have security and encryption on that, and you have an
ftp/data channel that has separate security (and the two don't talk to each
other).  And if you want security and encryption, you have to get SFTP and
require clients have it too.
d) As I said, Indigo and WSE are not a good fit yet for streaming large data
because of the MTOM issues.  Sure we can use sockets and create a custom
solution, but now we are back to doing everything yourself (security,
reliable, encryption, etc.)

So SB allows both your binary data (in this case) and your message channel
to be neatly in the same message stream and programming in .Net on both
sides.  You can send some xml message to start the request, and then flow
messages back and forth as needed.  So all your logic can stay in your SP
and supporting classes.  You also need only one inbound port, and it works
via NATs, etc, as the client starts the dialog and it is TCP based.  You got
integrated security in *one spot and can use custom table security and/or
Windows security.  You can use encryption.  You can compress your messages
with little effort in-stream.  You have ready access to all table data to
log auditing, history, credits/debits, state, email, etc.  It handles the
retries for me.  I mean it is darn near perfect.  I can do all this in one
place and have a discrete client and server model without having to integrate
a bunch of other tech such as ftp or sockets or wse, etc.  Am I wrong?

Signature

William Stacey [MVP]

> Hello William Stacey [MVP],
>
[quoted text clipped - 135 lines]
> DevelopMentor
> http://staff.develop.com/ktegels/
Kent Tegels - 30 Nov 2005 00:46 GMT
Hello William Stacey [MVP],

> Not sure why your fighting me on this, but here it goes:
> We are talking about two things.  One is the message size and RAM, and
> second is the app design and using SB.

I'm sorry if I've offended you. I'm not fighting you, just engaging in what
I think is a good discussion for the community to have. If you want to see
fighting, get Adam and me liquored up and set us off on Knowledge vs. Data. :)

It's not as simple as message size and RAM. If you want to use Service Broker,
you have to live in its environment. That means paying attention to things
like the TLOGs, like the other work the CPU is doing and how to best use
its schedule. For instance, it very well might make more sense to take advantage
of Service Broker's ability to process many messages concurrently by sending
more rather than fewer messages. Yes, you're giving it more work to do, but
it can parallelize work very effectively if it's not otherwise busy. No need
to "single thread it unless you have to."

> - Message size
> a) When would it ever make sense to send one big message (say over
> 5MB)
> instead of chunking it?  Almost never IMO.

To answer your question first: It wouldn't make sense not to chunk it.

How much sense does it make to send a 5mb message in the first place? Almost
never, IMHO.

A lot of this discussion could easily be avoided if SQL Server had a VarBinaryStreaming(MAX)
data type (or if VarBinary(Max) offered a ReadStream/WriteStream method).
Frankly, I want the FileStream datatype back if nothing else. Then my message
payload could simply be a network address, port and item identity,
right?

By the way, my suggestion of using messaging to invoke FTP gets you the majority
of that.

> b) first it takes up huge amount of server RAM - which does not scale.
Depends on the volume of work, but sure, fine.

> c) you can't write out any data until you receive the whole message,
> so your wasting disk IO you could be doing.

Depends on what's getting committed to the TLOGs for it. Depends on what
other work the server is doing. In fact, you might be freeing disk IO for
other threads to use. You surely don't believe that somebody is going to
spend SQL Server type of cash just to move files about, do you?

> d) If you fail half way into the large message, you have to start the
> whole message over again.

Non-plus. The same problem exists regardless of message size.

> - App design / protocol
> a) I can think of many ways to do this without SB, but SB seems
> perfect.

Here's why its not.

a.) Between backups, you've doubled the disk footprint of the bytes: the committed
ones are in the TLOG and in the database.
b.) Item a means you're burning up twice the backup media space too, and taking
longer to do that.
c.) What happens if you also use snapshot isolation and you happen to get
write queries against the rows that contain the file? Guess what, you've
also just created a copy of the file's bytes in tempDB. Twice the footprint
again.
d.) If the SQL Server isn't dedicated to this task, you're asking SQL to
spend time out of its thread pool to give this work attention. That means
fewer cycles and longer run time for other queries.
e.) You've probably got more of a memory impact than expected since we'd
be taking memory out of SQL's startup allocation of some % of the available
machine memory rather than total system memory. And yeah, that means that
plans and query results get kicked out of the cache sooner, further degrading
SQL's overall performance.
f.) If you're transmitting the messages between different instances of SQL
Server, you're now spending additional resources to encrypt the payload (by
default) when you may or may not need to.

I'm pretty sure the list can go on.

> b) I have a stream solution too using varbinary(max) column here:
> http://spaces.msn.com/members/staceyw/Blog/cns!1pnsZpX0fPvDxLKC6rAAhLs
> Q!404.entry
> but that is not what I am talking about.  The file could go to DB or
> disk - I don't want to get hung up on that point for this.

You have to. While the file parts are in the message payload, they are in
the database, pure and simple. And if you want to roll the file back due to a transaction
failure, you're either going to keep it in the database or, at a minimum, keep
a reference to its location.

> c) sure you can get ftp to work, but it is not clean and you have to
> play in two ball fields all the time.  You have multiple open ports and you
> have two separate security channels that you probably need to sync somehow.

Ugh. I wish every developer had to spend three years as an SA. The only thing
there that's even remotely challenging is security and that's largely
thanks to MS's daft implementation with AD for IIS's FTP.

FTP requires one port to be opened, but it can be changed to anything fairly
easily.  Most organizations already have it open if they are trading files
(I challenge you to find a medical claims TPA that doesn't, for example).
BBAP defaults to port 4022, but can be assigned anything else. So, even
by default, it's a minimum of one port and a maximum of two. On the security
side, broker uses certificates for both a&a and message encryption. SFTP
uses a certificate for channel encryption. All you really need is an
FTP user account with a fixed password and you're done. Guess what, you can
use the same certificate for both needs.

> You need a message channel and have security and encryption on that

Uh, yes, That's what ABP and DP do.

> , and you have an
> ftp/data channel that has separate security (and the two don't talk to
> each other).  And if you want security and encryption, you have to get SFTP
> and require clients have it too.

They don't have to talk to each other at all. Overengineering the solution.

As far as setting up an SFTP server goes, see: http://digitalmediaminute.com/article/1487/setting-up-a-sftp-server-on-windows
As far as CLR usable SFTP library goes, see: http://www.nsoftware.com/ipworks/ssh/technologies.aspx?sku=IHN7-A
One login with a fixed password, you're done.

Before you balk about the price tag, consider this: you're probably paid
pretty well. That means you're an expensive resource to tie up writing code.
Why spend one second longer than breakeven writing code that you can buy?

> d) As I said, Indigo and WSE are not a good fit yet for streaming
> large data because of the MTOM issues.  Sure we can use sockets and create a
> custom solution, but now we are back to doing everything yourself (security,
> reliable, encryption, etc.)

That I agree with.

> So SB allows both your binary data (in this case) and your message
> channel to be neatly in the same message stream and programming in
> .Net on both sides.

Agreed.

> You can send some xml message to start the
> request, and then flow messages back and forth as needed.  So all your
> logic can stay in your SP and supporting classes.

But you don't have to write ANY code with my solution if you don't choose
to. Sure, you may choose to have the receving Service do the FTP work, but
with a good library that's pretty much cake. And all of your logic stays
in the Assemblies.

> You also need only
> one inbound port, and it works via NATs, etc, as the client starts the
> dialog and it is TCP based.

So, one port or two. If that's the only consideration, you win. It's not.

> You got integrated security in *one spot
> and can use custom table security and/or Windows security.  

While it can, it's better off using certificates instead since the
remote service may not always be hosted in a trusted domain. That's the design
of Service Broker Security, see paragraph four of "Service Broker Security
Overview" in BOL.

> You can use encryption.  

You can here too.

> You can compress your messages with little effort
> in-stream.

Eh? You could compress the file before sending, or could compress the chunks,
but how do you compress the stream? Or is that not what you mean?
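
One possible reading of "compress in-stream", sketched with Python's standard `zlib` module: a single compressor object is fed chunk by chunk, so compression state carries across chunks. This is an assumption about what was meant, not a Service Broker feature, and the helper names are hypothetical.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Compress a sequence of chunks as one continuous zlib stream."""
    compressor = zlib.compressobj()
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer and emit nothing for a chunk
            yield out
    tail = compressor.flush()
    if tail:
        yield tail


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Inverse of compress_stream; pieces must arrive in the original order."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail
```

The compressed pieces are not independently decompressible, so this approach relies on in-order delivery, which is what a conversation provides. Compressing each chunk separately would avoid that dependency at the cost of a worse ratio.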

>  You have ready access to all table data to log auding,
> history, credits/debits, state, email, etc.  It handles the retries
> for me.  I mean it is darn near perfect.  I can do all this in one
> place and have a discreet client and server model without have to
> integrate a bunch of other tech such as ftp or sockets or wse, etc.
> Am I wrong?

The biggest thing I think you're missing is that while it would be perfect
if done on a dedicated server and network, that's a pipe dream. You're
also talking about reinventing a perfectly good wheel, work I just don't
see the value in.

Why am I so persistent in this? Because I've tried something like this before.
Granted, I didn't have the luxury of service broker, but it wasn't hard to
cruft something similar up. About a week into the coding of it, I stopped
because I realized it made more business sense to go the other direction.
As a result, that client now has a system they can actually maintain without
having me around because they already knew how to do FTP reasonably well.

YMMV.

Thank you,
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/
William Stacey [MVP] - 30 Nov 2005 04:46 GMT
> I'm sorry if I've offended you. I'm not fighting you, just engaging in
> that I think is a good discussion for the community to have.

Not at all Kent.  Did not mean to imply that.  Enjoy the discussion.  I will
have to reply tomorrow as I need to get some sleep :)  Cheers.

Signature

William Stacey [MVP]

William Stacey [MVP] - 02 Dec 2005 18:46 GMT
> For instance, it very well might more sense to take advantage of Service
> Brokers ability to process many messages concurrently by sending more
> rather than fewer messages. Yes, you're giving it more work to do, but it
> can parallel work very effectively if its not otherwise busy. No need to
> "single thread it unless you have to."

But in this case, sequence and dialog are what we are after.  We expect to
process the control msgs and byte stream (multiple messages) in order.  I
don't want to read and write block 4 before block 2.  Otherwise I am
creating my own sync system again, and I need to stream the file out from
byte 1 to N in a filestream anyway.  You could probably come up with some
system to process multiple blocks in parallel and handle out-of-order
reads and writes, but then your threads are just fighting with each other
for locks and that system gets very complicated.

Take upload for example.  You can't get faster than 1 thread in a tight
loop reading msgs off the queue as fast as possible.  Multiple threads
don't help here (in the context of this 1 user session).  Writing to disk
on the tail end gets more interesting because now we are blocking on disk
IO.  So if I take the simple approach, I understand that blocking can cost
me some opportunity to pop more reads while I am waiting for a blocking
write.  However, I gain a much simpler model that is much easier to reason
about and verify for correctness.

If I choose to make the server writes async (to allow for multiple queue
readers), then I need state machine(s) and have to reason about the order
of blocks and callbacks, etc., and I am not sure how this might be done in
a sproc.  You could probably carry around an offset and length with each
message, so you could write multiple blocks at about the same time, but
then you're also thrashing the disk (with non-sequential writes).  I am not
sure how much would ultimately be gained here, as so many factors are
involved and all your writers will still be contending for writes to the
same file stream.  Both methods would need to be designed and perf tested.
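
For illustration only, the simple one-reader model can be sketched as
follows. This is Python, with the standard library's queue.Queue standing in
for the Service Broker queue and a (sequence number, payload) tuple standing
in for a message; both are assumptions made for the sketch, not Service
Broker's API.

```python
import io
import queue


def drain_in_order(q, out):
    """Pop (seq, payload) messages until a None sentinel arrives.

    Each payload is written sequentially; a gap or reordering is treated
    as an error because the dialog is expected to deliver in order.
    Returns the number of blocks written.
    """
    expected = 0
    while True:
        msg = q.get()
        if msg is None:  # hypothetical end-of-stream marker
            break
        seq, payload = msg
        if seq != expected:
            raise ValueError(f"expected block {expected}, got {seq}")
        out.write(payload)  # blocking write; the loop stays simple
        expected += 1
    return expected


# Demo: three blocks followed by the end-of-stream sentinel.
q = queue.Queue()
for i, block in enumerate([b"abc", b"def", b"gh"]):
    q.put((i, block))
q.put(None)

sink = io.BytesIO()
count = drain_in_order(q, sink)
```

The loop gives up overlap between reads and writes in exchange for having
no shared state to lock, which is the trade-off described above.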

> A lot of this discussion could easily be avoided if SQL Server had a
> VarBinaryStreaming(MAX) data type (or if VarBinary(Max) offered a
> ReadStream/WriteStream method). Frankly, I want the FileStream datatype
> back if nothing else. Then my message payload could simply be a
> network address, port and item identity, right?

That would be nice.  I have wondered why Stream is not a first class citizen
in the sql world.  I would think that abstractions over varbinary(max)
would be the thing to do and leverage the type.

> Depends on what's getting committed to the TLOGs for it. Depends on what
> other work the server is done. In fact, you might be freeing disk IO for
> other threads to use. You surely don't believe that somebody is going to
> spend SQL Server type of cash just to move files about do you?

Well, consider that I am not trying to use SB as a generic ftp service, but
to develop (in the abstract) an end-to-end application solution.  Like I
said, this could be music/video/dvd purchase (think future here),
production/model/test file mover, etc.  Anything that requires moving bytes
around in the context of application business logic.  In the end, everything
is a byte stream.  So it does not matter if you're talking files, xml string,
some dynamic message, it all comes down to moving byte streams and doing
something with them on both ends.

>> d) If you fail half way into the large message, you have to start the
>> whole message over again.
>
> Non-plus. The same problem exists regardless of message size.

True.  But it is faster to resend a 30K message than to lose a 1GB message
and have to resend the whole msg from the beginning.
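
A rough sketch of why small messages make recovery cheaper, in Python. The
send callback, the ConnectionError used to signal a failure, and the chunk
list are all hypothetical stand-ins; Service Broker does its own retries,
so this only illustrates the general idea of resuming at the failed chunk.

```python
def send_with_resume(chunks, send, start=0):
    """Send chunks[start:] in order.

    Returns the index of the first chunk that was NOT delivered, or
    len(chunks) when everything went through.
    """
    for i in range(start, len(chunks)):
        try:
            send(i, chunks[i])
        except ConnectionError:
            return i  # resume here; earlier chunks are already delivered
    return len(chunks)


# Demo transport that drops chunk 2 exactly once.
received = {}
failures = {2}


def flaky_send(i, payload):
    if i in failures:
        failures.discard(i)
        raise ConnectionError("simulated drop")
    received[i] = payload


chunks = [b"aa", b"bb", b"cc", b"dd"]
resume_at = send_with_resume(chunks, flaky_send)
done = send_with_resume(chunks, flaky_send, start=resume_at)
```

Only the chunks from the failure point onward are sent again; with one
large message the equivalent failure costs the whole payload.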

> a.) Between backups, you've double disk footprint of the bytes: the
> committed ones are in the TLOG and in the database.

Does the tlog never get purged once they are committed in the db?

> b.) A means you're burning up twice the backup media space too, and taking
> longer to do that.

Valid point.  And would be a vital system design point.  On the other hand,
things like incremental/diff backups can be your friend or mirrored DB.  I
mean there are some solutions in this regard - no?

> c.) What happens if you also use snapshot isolation and you happen to get
> write queries against the rows that contain the file? Guess what, you've
> also just created a copy of the file's bytes in tempDB. Twice the
> footprint again.

Could partitions help any here?  Not sure.

> d.) If the SQL Server isn't dedicated to this task, you're asking SQL to
> spend time out of its thread pool to give this work attention. That means
> fewer cycles and longer run time for other queries.

Naturally.  We always have to address this issue.  To not complicate things,
I am thinking just in terms of this "turn-key" system.  Naturally, you could
leverage some existing machines/instances, but need to look at scale, perf
and make those decisions for your own needs.  I mean there is no free lunch
here.  Even if you created custom socket server service or IIS or FTP, you
still need cpu cycles and memory and locks, etc.  So what is the delta?
Impossible to answer without testing or some prototypes.

> You have to. While the file parts are in the message payload, they are in
> the database, pure and simple. And you want to roll file out due to a
> transaction failure, your either going to keep in the database or, at a
> minimum, keep a reference to its location.

True.  But we are back to no free lunch here.  If I want transactional
guarantees, I can either leverage existing abstractions, such as sql server,
or roll my own.   Myself, I would rather leverage sql as you would need to
reinvent much of that wheel anyway to get the same kind of guarantees.

> Ugh. I wish every developer had to spend three years as an SA. The only
> thing there that's even in remotely challenging is security and that's
> largely thanks to MS's daft implementation with AD for IIS's FTP.

Like yourself and others, I have gotten many of the T-Shirts.  Over the
years got the shirts like: Novell, ATT System V, AIX, Solaris, ksh, bsh,
csh, perl, cobol, c#, vb, c,  tcp/ip, ipx, winsock, gnu, etc.  Developed ftp
client and server, developed dns client and server.  Sys admin and sys
programmer for large IT provider, etc.  Admin'd and installed production ftp
and Connect:Direct solutions between large Unix, mainframes, NT, and so on.
Also hacked together hybrid vb and ftp apps like I "think" you may be
talking about.  Hence my distaste for using ftp like this.

> FTP requires one port to be opened but can be changed to anything fairly
> easily.  Most organizations already have it open if they are trading files
[quoted text clipped - 5 lines]
> need to have an FTP user account with a fixed password and you're done.
> Guess what, you can use same certificate for both needs.

Yes, we now have good ftp client libs and can program against them.  So that
is good.  But the server side is still this static beast.  You set it up and
basically manage access using file system ACLs.  So people have access to
stuff or not.  If you need to dynamically give access based on a purchase,
for example, you're stuck.  Yes, we can program around this and create
wrappers and one-time URLs and maybe even dynamically change ACLs, but man,
that is ugly and fragile.  Plain ftp is a non-starter because of the
security issues.  SFTP could work, but your server side is still a black-box
service, and you still have an issue with storing usernames/passwords on the
client and managing two security services (the app service, and sftp access
and permissions).  If I just handle this right in my msg flow (in-band), I
don't have another out-of-band channel to deal with or security to manage on
another service.

The two -vs- one port thing is not a huge deal.  However, if I don't need 2
ports, it is just cleaner.  It can also be easier to sell to firewall folks
than multiple ports.

> They don't have to talk to each other at all. Overengineering the
> solution.

They kinda do.  So I buy File1 all inside a transaction.  My account is
debited only if I get the whole file.  Any errors will roll-back.  Nobody,
except a verified purchaser can have access to File1, nor should they be
able to list or view other files.  One password does not fit all here.

> One login with a fixed password, you're done.

This is security via obscurity.  No password should be embedded in your
code, nor should all users use the same id and password.  I mean this is
security 101.  Now we are back to storing and securing passwords on the
client.

> Before you balk about the price tag, consider this: you're probably paid
> pretty well. That means you're an expensive resource to tie up writing
> code. Why spend one second longer than breakeven writing code that you can
> buy?

Good point.  I totally agree.  But that is kinda my point as well.  It is
very simple for me to reason about this data in the stream after I have
already done the base security checks, etc.  I am well prepared to handle
the msg stream at that point.  Each file block msg is just as important to me
as any msg in the application.  So we can look at this as handling a stream
of important messages to the application, not just as file transfer per se.

> Eh? You could compress the file before sending, or could compress the
> chunks, but how do you compress the stream? Or is that not what you mean?

Each msg has a byte stream.  I can compress each msg independently of any
other and still decompress it on the other end.
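
A minimal sketch of per-message compression, using Python's standard zlib
module. The chunk size and the helper names are illustrative assumptions;
the point is only that every chunk is a self-contained compressed unit.

```python
import zlib


def compress_chunks(data, chunk_size):
    """Split data into chunks and compress each one independently."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def decompress_chunks(messages):
    """Decompress each message on its own and concatenate the results."""
    return b"".join(zlib.decompress(m) for m in messages)


original = b"service broker " * 1000  # 15000 bytes of repetitive data
messages = compress_chunks(original, 4096)
restored = decompress_chunks(messages)

# Any single message can be decoded without the others.
third = zlib.decompress(messages[2])
```

Because no compression state is shared between messages, a receiver can
decode each one as it arrives, at the cost of a somewhat worse ratio than
compressing the whole file at once.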

> Why am I so persistent in this? Because I've tried something like this
> before. Granted, I didn't have the luxury of service broker, but it wasn't
[quoted text clipped - 3 lines]
> maintain with having me around because they already knew how to do FTP
> reasonably well.

I will just have to politely disagree.  I have done stuff like this also
with FTP and don't like it for the reasons above.  To me, it is less code to
handle the stream "in-band" than to use another channel such as ftp, socket
service, remoting service, etc.  Yes, if I was just talking about offering
up public files in some role-based security model, then sftp would probably
be great.  But that is not what I have been talking about (and have done a
poor job in explaining that if that is not clear by now.)

Thanks Kent.
--
William
Kent Tegels - 06 Dec 2005 14:50 GMT
Hello William Stacey [MVP],

>> For instance, it very well might more sense to take advantage of
>> Service Brokers ability to process many messages concurrently by
[quoted text clipped - 5 lines]
> to process the control msgs and byte stream (multiple messages) in
> order.  I don't want to read and write block 4 before block 2.

Fine, but I'd argue that if you know it's block 4 of 40 and each block is
30k, you know exactly where in the byte[] it belongs. Simply insert those
bytes at that location. No need to get them in any order.
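
As a sketch of that idea in Python (zero-based block indexes and a tiny
block size are assumptions made to keep the demo short): each block carries
its index, so its byte offset is index * block_size and arrival order does
not matter.

```python
import random

BLOCK_SIZE = 4  # tiny on purpose; the thread talks about 30k blocks


def reassemble(total_len, block_size, blocks):
    """Place (index, payload) pairs into a preallocated buffer by offset."""
    buf = bytearray(total_len)
    for index, payload in blocks:
        start = index * block_size
        end = start + len(payload)
        if end > total_len:
            raise ValueError("block overruns the buffer")
        buf[start:end] = payload
    return bytes(buf)


data = b"The quick brown fox"
blocks = [
    (i // BLOCK_SIZE, data[i:i + BLOCK_SIZE])
    for i in range(0, len(data), BLOCK_SIZE)
]
random.Random(0).shuffle(blocks)  # simulate out-of-order arrival
result = reassemble(len(data), BLOCK_SIZE, blocks)
```

Note that this holds the whole file in memory, which is the concern raised
elsewhere in the thread for very large payloads.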

> Otherwise I am creating my own sync system again and I need to stream
> the file out from byte 1 to N in filestream anyway.  

Uh, what about varbinary(max)'s write method?

> You could
> probably come up with some system to parallel multiple blocks at same
[quoted text clipped - 18 lines]
> for writes to the same file stream.  Both methods would need to be
> designed and perf tested.

True, maybe a block would allow only one write at a time to a varbinary(max)
instance; otherwise, this is all non-plus. The message simply says "insert
this many bytes here" and the service program does it. That might prevent
the same file from being constructed in parallel, but it doesn't prevent
many files from being constructed in parallel. So, if you're just worried
about one file at a time, yeah. If you're worried about groups of files,
different story.

> That would be nice.  I have wondered why Stream is not a first class
> citizen in the sql world.  I would think that abstractions over
> nvarbinary(max) would be the thing to do and leverage the type.

There was a filestream data type for a while. I don't think it's dead yet. :)

>> Non-plus. The same problem exists regardless of message size.
> True.  But it is faster to resend 30K message then loose a 1GB message
> and have to resend the whole msg from the beginning.

Agreed, that's probably why the SSB team picked 90kb.

>> a.) Between backups, you've double disk footprint of the bytes: the
>> committed ones are in the TLOG and in the database.
> Does the tlog never get purged once they are committed in the db?

Only after a successful backup for both the DB and the TLOGs.

> Valid point.  And would be a vital system design point.  On the other
> hand, things like incremental/diff backups can be your friend or
> mirrored DB.  I mean there are some solutions in this regard - no?

Mirroring wouldn't help; it would just be putting more bits on the wire.
A lot would depend on how frequently you do the backups and where you
archive them to. You still ultimately have the problem that at least two
copies of the files will exist someplace for some time: one in the live
database, one in backup.

>> c.) What happens if you also use snapshot isolation and you happen to
>> get write queries against the rows that contain the file? Guess what,
>> you've also just created a copy of the file's bytes in tempDB. Twice
>> the footprint again.
> Could partitions help any here?  Not sure.

No, because AFAIK, out-of-page rows still land in their native partition
(that is, I believe all of the parts of a row set have to live in the same
partition.)

> Naturally.  We always have to address this issue.  To not complicate
> things, I am thinking just in terms of this "turn-key" system.
[quoted text clipped - 4 lines]
> and memory and locks, etc.  So what is the delta? Impossible to answer
> without testing or some prototypes.

What's not impossible to know without testing is that it's going to take time
to cruft up a working prototype. Have you done that? How long did it take?

>> Ugh. I wish every developer had to spend three years as an SA. The
>> only thing there that's even in remotely challenging is security and
[quoted text clipped - 10 lines]
> hybrid vb and ftp apps like I "think" you may be talking about.  Hence
> my distaste for using ftp like this.

Something doesn't make sense to me. You'd rather support hand-crufted code
in production than use something that's been around for almost two decades?
Hmm... while I think it'd be fun to work with you otherwise, I wouldn't want
to be your admin :)

>> FTP requires one port to be opened but can be changed to anything
>> fairly easily.  Most organizations already have it open if they are
[quoted text clipped - 12 lines]
> people have a access to stuff or not.  If you need to dynamically give
> access based on a purchase, for example, you stuck.

Not really. .NET manages ACLs reasonably well. And you wouldn't even have to
go that far with many FTP systems, where A&A are managed through a database
back-end.

> Yes we can
> program around this and create wrappers and one time URLs and maybe
> even dynamically change ACLs, but man that is ugly and fragile.

That's not been my experience, but okay.

> Plain
> ftp is a non-starter because the security issues. SFTP could work, but
[quoted text clipped - 4 lines]
> another out-of-band channel to deal with and managing security on
> another service.

Maybe I've just worked with different SFTP products than you have. This seems
like making mountains out of mole hills (SFTP) and vice versa (SSB).

> The two -vs- one port thing is not a huge deal.  However, if I don't
> need 2 ports, it is just cleaner.  It can also be easier to sell to
> firewall folks, then multiple ports.

Oh I strongly disagree. They already know how to manage FTP traffic and have
tools for dealing with it. Find me one that would rather deal with an unknown
than a known...

> They kinda do.  So I buy File1 all inside a transaction.  My account
> is debited only if I get the whole file.  Any errors will roll-back.

Uh, you realize that very few content providers actually follow that model.
You aren't buying the file, you're buying a license to use it. It's up to you
to go get it. Audible uses that model. You're charged as soon as you have the
right to get the file, not as soon as you have it.

> Nobody, except a verified purchaser can have access to File1, nor
> should they be able to list or view other files.  One password does
> not fit all here.

Lots of people have access to the file, but not all of them have the license
to use and the access to download it. I don't think you'd ever be able to
scale up and out any other kind of system even with service broker simply
because there wouldn't be enough disk space to keep each copy or each file
around to transact it. YMMV.

>> One login with a fixed password, you're done.
> This is security via obscurity.  No password should be imbedded in
> your code, nor should all users use the same id and password.  

It doesn't have to be, nor is that even what I'm talking about. Duh - remember,
you're still in a pull model: you can use whatever A&A you want there. You
can use that to negotiate an FTP password if you really need it on a one-time,
disposable basis.

> Good point.  I totally agree.  But that is kinda my point as well.  It
> is very simple for me to reason about this data in the stream after I
[quoted text clipped - 3 lines]
> as handling a stream of important messages to the application, not
> just as file transfer per se.

That's Indigo then.

> Each msg has a byte stream.  I can compress each msg independently of
> any other and still decompress it on the other end.

You wouldn't compress the message itself as it contains the routing information.
The payload, yes.

> Thanks Kent.

No, thank you. It's been an interesting discussion.

Thank you,
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/
William Stacey [MVP] - 29 Nov 2005 22:34 GMT
> If you have 41k of data to send, you have 41k to send. Some chunk is going
> to be odd sized anyway no matter how nicely you try to normalize the
> stream.

Right, if I am only sending 41KB of data it does not matter much.  Worst
case is two messages.  But if I set my stream chunk size to 41KB and need to
send 100 messages (to represent the whole stream), that turns into 100 40KB
messages and 100 1KB messages - double the number of messages needlessly.
That is why you need to know the chunk size so you can send <= 40KB per
message (or some multiple of the message chunk size).  The last message in
the stream will always be <= buffer size.
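
The arithmetic above can be checked with a short Python sketch. The 40KB
fragment size is only the figure under discussion in this thread and is
passed in as a parameter; the helper names are made up for the example.

```python
KB = 1024


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def total_fragments(stream_size, chunk_size, fragment_size):
    """Count wire fragments when a stream is sent as chunk_size messages."""
    full_messages, remainder = divmod(stream_size, chunk_size)
    total = full_messages * ceil_div(chunk_size, fragment_size)
    if remainder:
        total += ceil_div(remainder, fragment_size)
    return total


FRAGMENT = 40 * KB          # assumed internal fragment size
STREAM = 100 * 41 * KB      # the 100 x 41KB stream from the post

misaligned = total_fragments(STREAM, 41 * KB, FRAGMENT)  # 41KB messages
aligned = total_fragments(STREAM, 40 * KB, FRAGMENT)     # 40KB messages
```

With 41KB messages every message splits into a 40KB and a 1KB fragment,
while 40KB messages need roughly half as many fragments for the same
stream. The result depends entirely on the assumed fragment size.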

Signature

William Stacey [MVP]

Kent Tegels - 29 Nov 2005 23:13 GMT
Hello William Stacey [MVP],

> Right, if I am only sending 41KB of data it does not matter much.
> Worse case is two messages.  But if I set my stream chunk size to 41KB
> and need to send 100 messages (to represent the whole stream), that
> turns into 100 40KB messages and 100 1KB messages - double the number
> of messages needlessly.

I guess I just don't get it. Why is sending 200 messages so much worse than
sending 100? The only net difference is sending 2 x sum(metadata).
Until Roger tells us how many bytes that is, it could be significant, it
could be essentially meaningless. But you don't spend any less time writing
the code and that, frankly, is the really expensive part. And you've still
got, potentially, more of the TLOG and backup consumed than is really needed.

But then I wouldn't say that spending any cycles or time writing this isn't a
waste anyway. Somebody else already has, and it is called FTP. It works. If
anything, send 1 message, "Go get this file from here and track it," and get
2 or 3 messages at most in response: I'm still getting it, I've got it, or I
failed to get it. You have exactly the same possible outcomes even with
service broker; it just means that it may take longer or not to get the
"I didn't get it" message that you want.

> That is why you need to know the chunk size so
> you can send <= 40KB per message (or some multiple of the message
> chunk size).  The last message in the stream will always be <= buffer
> size.

Well, Roger's email made it pretty clear that it's a moving target. I wouldn't
be surprised to see it change in a service pack, and since it's not
documented, MS isn't under any obligation to tell us if they do change it.
What happens if they make it "auto-tuning" (dynamic) such that the chunk size
changes dynamically to adapt to detected network conditions? What if that
change happens in the middle of sending your chunked data?

Thank you,
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/
William Stacey [MVP] - 30 Nov 2005 03:53 GMT
> MS isn't under any obligation to tell us if they do change it.

Agree. Not saying they should or have to.  I was just saying it can matter.

> if they make it "auto-tunning" (dynamic) such that the chunk size changed
> dynamically to take advantage/adapt to detected network condition? What is
> that change happens in the middle of sending your chunked data?

Now you're talking.  Me thinks they did a bunch of work in this regard in the
new network stack.

Signature

William Stacey [MVP]

Roger Wolter[MSFT] - 29 Nov 2005 22:28 GMT
OK, but the max chunk size is something we play with to improve efficiency,
so you can't rely on it being a particular value.  Just pick a value that
works for you and go with it.  Dialogs should ensure that your data gets
there in the right order and that the stream is reliable.

By the way, the 40KB size was back in Beta when the book was written, in RTM
it's around 90KB so you can see it doesn't make a lot of sense to base your
streaming architecture on our internal fragment size.

Signature

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

> You want to load 2GB arrays in memory on clients and also load 2GB array
> on server?  This may work for 1 user, but what if many hundreds of users
[quoted text clipped - 22 lines]
>>>>> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it
>>>>> will be sent in exactly 1 message?  TIA
William Stacey [MVP] - 29 Nov 2005 23:02 GMT
Thanks Roger!  Totally get your point about why not to assume/expect things
like buffer size.  I was just making the point that it "can" matter in terms
of protocol efficiency depending on what you're doing and if you have an eye
toward perf.  But I get your point and thank you much for the info and
interest.  Cheers!

Signature

William Stacey [MVP]

> OK, but the max chunk size is something we play we to improve efficiency
> so you can't rely on it being a particular value.  Just pick a value that
[quoted text clipped - 31 lines]
>>>>>> that mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes,
>>>>>> it will be sent in exactly 1 message?  TIA
Cowboy (Gregory A. Beamer) - 30 Nov 2005 18:43 GMT
I can see the point, but asynchronous messaging, the point of service
broker, and file upload are two different things. You can create a system to
upload files without service broker (and probably should). I might be
missing the point, however, so correct me if I am not seeing the big
picture.

While not a perfect analogy, Service Broker is Message Queue. You are
holding information on one end until it can communicate to the other and
then relaying the messages. The chunk size deals with queue to queue
communication (actually service to service, but why split hairs ;-> - the
services manage queues through a particular route ... blah, blah, blah).

The idea is you store it on one side to get to the other. Possible with
files? Sure, but I can think of other ways to replicate BLOBs that do not
require an asynchronous queue.

Signature

Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

***********************************************
Think Outside the Box!
***********************************************

> You want to load 2GB arrays in memory on clients and also load 2GB array
> on server?  This may work for 1 user, but what if many hundreds of users
[quoted text clipped - 22 lines]
>>>>> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it
>>>>> will be sent in exactly 1 message?  TIA
Niels Berglund - 28 Nov 2005 08:03 GMT
> Roger said messages over 40KB are divided into 40KB blocks.  Does that
> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it
> will be sent in exactly 1 message?  TIA

I don't think that 40k is an exact value - if I remember correctly it is
around 40K, dependent on various "stuff".

Niels

Signature

**************************************************
* Niels Berglund
* http://staff.develop.com/nielsb
* nielsb@no-spam.develop.com
* "A First Look at SQL Server 2005 for Developers"
* http://www.awprofessional.com/title/0321180593
**************************************************

Remus Rusanu [MSFT] - 29 Nov 2005 23:54 GMT
Just as an informational note, in the RTM code it ended up at ~90K/fragment.
Each fragment has a fixed overhead of ~200 bytes plus variable overhead
depending on the length of from/to service names, contract name and message
type name. In practice you shouldn't care and shouldn't rely on it, as we
might change it as seen fit. If you have scenarios in which controlling the
message body size makes a provably significant difference, we'd like to
hear about it.
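
To put those approximate figures in perspective, a back-of-the-envelope
sketch in Python. It uses only the ~90K fragment size and ~200 byte fixed
overhead quoted above and ignores the variable overhead, which the post
does not quantify; the 1GB stream is an arbitrary example.

```python
def fragment_count(stream_size, fragment_size):
    """Number of fragments needed for a stream (ceiling division)."""
    return -(-stream_size // fragment_size)


def fixed_overhead_bytes(stream_size, fragment_size, fixed_overhead=200):
    """Total fixed per-fragment overhead for the whole stream."""
    return fragment_count(stream_size, fragment_size) * fixed_overhead


FRAGMENT_SIZE = 90 * 1024   # approximate figure from the post
ONE_GB = 1024 ** 3          # example stream size

fragments = fragment_count(ONE_GB, FRAGMENT_SIZE)
overhead = fixed_overhead_bytes(ONE_GB, FRAGMENT_SIZE)
ratio = overhead / ONE_GB   # fraction of the stream spent on fixed overhead
```

Under these assumptions the fixed overhead stays well under one percent of
the payload, which fits the advice not to design around the fragment size.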

Thanks,
~ Remus

> Roger said messages over 40KB are divided into 40KB blocks.  Does that
> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it will
> be sent in exactly 1 message?  TIA
Cowboy (Gregory A. Beamer) - 30 Nov 2005 18:39 GMT
As long as SQL Server handles the blocks and has error checking to
reassemble them correctly (from log to queue, which is a table), I am not
sure what the issue is, unless you are trying to figure out data types to
ensure queued messages fit underneath this limit. I would rather figure out
what types of messages are necessary myself.

Signature

Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

***********************************************
Think Outside the Box!
***********************************************

> Roger said messages over 40KB are divided into 40KB blocks.  Does that
> mean exactly 40960 bytes?  So if I send a byte[] of 40960 bytes, it will
> be sent in exactly 1 message?  TIA
 



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.