I have a field in SQL that contains email messages. I would like to extract
the FROM, SUBJ, and TO header data from the field. Here is an example:
Date: Thu, 23 Mar 2000 10:01:33 +0000^^From: John Doe abc@abc.com^^Subject:
Fun Time^^To: Jane Doe ldef@def.com^^
The each carat character represents a square cariage return character
char(13).
I know I can use the following quesy to extract data between two char(13)
characters:
select
SUBSTRING([rfc822],CHARINDEX('char(13)',rfc822),CHARINDEX('char(13)',[rfc822])) as col01
from mailbox
But since there are two char(13)'s in a row, I get no data. How can I create
this query so that I get the data between the double char(13)s.
Something like this:
select
SUBSTRING([rfc822],CHARINDEX('char(13)','char(13)',rfc822),CHARINDEX('char(13)'',char(13)',[rfc822])) as col01
from mailbox
Thanks in advance.
Allan
Plamen Ratchev - 14 Jun 2008 16:03 GMT
This type of problem is best suited for environment that supports regular
expressions. If using SQL Server 2005 take a look at CLR and how you can use
regular expressions:
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
http://blogs.msdn.com/sqlclr/archive/2005/06/29/regex.aspx
Here is one way to parse the string in T-SQL. In the code below replace the
reference to the system table master..spt_values with real auxiliary table
with numbers (http://www.projectdmx.com/tsql/tblnumbers.aspx).
DECLARE @string VARCHAR(500);
SET @string =
'Date: Thu, 23 Mar 2000 10:01:33 +0000' + CHAR(13) + CHAR(13) +
'From: John Doe abc@abc.com' + CHAR(13) + CHAR(13) +
'Subject: Fun Time' + CHAR(13) + CHAR(13) +
'To: Jane Doe ldef@def.com' + CHAR(13) + CHAR(13);
SELECT MAX(CASE WHEN line_idx = 2 THEN line END) AS email_from,
MAX(CASE WHEN line_idx = 3 THEN line END) AS email_subject,
MAX(CASE WHEN line_idx = 4 THEN line END) AS email_to
FROM (
SELECT SUBSTRING(@string, n, CHARINDEX(CHAR(13) + CHAR(13), @string +
CHAR(13) + CHAR(13), n) - n ) AS line,
n + 1 - LEN(REPLACE(LEFT(@string, n), CHAR(13) + CHAR(13), ' '))
AS line_idx
FROM (SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number BETWEEN 1 AND 500) AS Nums(n)
WHERE SUBSTRING(CHAR(13) + CHAR(13) + @string, n, 2) = CHAR(13) + CHAR(13)
AND n < LEN(@string) + 1) AS T;
Results:
email_from email_subject email_to
---------------- -------------------- --------------------
From: John Doe abc@abc.com Subject: Fun Time To: Jane Doe ldef@def.com
HTH,
Plamen Ratchev
http://www.SQLStudio.com
Allan - 15 Jun 2008 00:54 GMT
Plamen,
Thank you for your reply. I did like the article regarding a numbers table,
that can be usefyl. Unfortunately most of the answer you provided is beyond
my humble beginer's skills. I'm not even sure where to run a T-SQL quesry
because you can't run it from the SQL Table Query window.
I did not realize how complex an issue this was. I was hoping to be able to
create a query I could run on and ASP.net ASPX page in a datagrid or in SQL
Server and create a View based on the Query.
Thank you for your help though.
Any thoughts on how I can do this in ASP.net or a SQL View?
Allan
_____________________________________________________________
> This type of problem is best suited for environment that supports regular
> expressions. If using SQL Server 2005 take a look at CLR and how you can use
[quoted text clipped - 39 lines]
> Plamen Ratchev
> http://www.SQLStudio.com
Plamen Ratchev - 15 Jun 2008 03:41 GMT
You run queries from a new Query window in SQL Server 2005, and Query
Analyzer in SQL Server 2000. But since you do not feel comfortable with
T-SQL you should look to solve this problem in .NET. The RegEx class in .NET
is very powerful tool and you can write regular expressions to extract the
different segments from the text. That will be easier and more efficient.
Here are a couple articles on using regular expressions in .NET:
http://msdn.microsoft.com/en-us/library/ms972966.aspx
http://aspnet.4guysfromrolla.com/articles/022603-1.aspx
Here is how that query can be used in a view (I used the table name and
column you provided in the initial post):
CREATE VIEW EmailSplit
AS
SELECT MAX(CASE WHEN line_idx = 2 THEN line END) AS email_from,
MAX(CASE WHEN line_idx = 3 THEN line END) AS email_subject,
MAX(CASE WHEN line_idx = 4 THEN line END) AS email_to
FROM (
SELECT SUBSTRING(rfc822, n, CHARINDEX(CHAR(13) + CHAR(13), rfc822 + CHAR(13)
+ CHAR(13), n) - n ) AS line,
n + 1 - LEN(REPLACE(LEFT(rfc822, n), CHAR(13) + CHAR(13), ' ')) AS
line_idx
FROM (SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number BETWEEN 1 AND 500) AS Nums(n)
CROSS JOIN Mailbox
WHERE SUBSTRING(CHAR(13) + CHAR(13) + rfc822, n, 2) = CHAR(13) + CHAR(13)
AND n < LEN(rfc822) + 1) AS T;
HTH,
Plamen Ratchev
http://www.SQLStudio.com
Ratandeep Gupta - 01 Jul 2008 19:50 GMT
I have a file in database that I would use to extract some data from. The file is having data like:
GRAND SUMMARY: PART 1
ACC-1-ACC-OP - 2|ACC-8-ACC-OP- 1|AHA-1-AHA-OP- 90|
AJS-22-QWS-OP- 6|a.s-20-QHQ-90- 2| AGA-22-AHA-OP-90|
.
.
.
AJS-22-QWS-OP- 6|a.s-20-QHQ-90- 2| AGA-22-AHA-OP-90|
AJS-22-QWS-OP- 6|a.s-20-QHQ-90- 2|
I want all the fields to be separately extracted as like:
1. ACC-1-ACC-OP - 2
2. ACC-8-ACC-OP- 1
3. AHA-1-AHA-OP- 90
4. AJS-22-QWS-OP- 6
5. a.s-20-QHQ-90- 2
6.
7.
An entry is completed when there is a "|" character. I am new to sql so if you can explain the code as well in little detail, I would really appreciate it.
thanks!