<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Steve Kass &#187; SQL Server</title>
	<atom:link href="http://stevekass.com/category/sql-server/feed/" rel="self" type="application/rss+xml" />
	<link>http://stevekass.com</link>
	<description>this is my glass container</description>
	<lastBuildDate>Sat, 24 Jul 2010 16:20:24 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Localization (probably) strikes again</title>
		<link>http://stevekass.com/2009/11/26/localization-probably-strikes-again/</link>
		<comments>http://stevekass.com/2009/11/26/localization-probably-strikes-again/#comments</comments>
		<pubDate>Fri, 27 Nov 2009 02:17:18 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2009/11/26/localization-probably-strikes-again/</guid>
		<description><![CDATA[Yesterday, the Italian postal service misprocessed a bunch of ATM and credit card transactions. Specifically, the virgola was shifted two places, appending two zeros to the transaction amount. There’s no telling exactly how this happened, but it wouldn’t surprise me if it had something—if not everything—to do with localization in one way or another. In [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, the Italian postal service misprocessed a bunch of ATM and credit card transactions. Specifically, the <em>virgola</em> was shifted two places, appending two zeros to the transaction amount. There’s no telling exactly how this happened, but it wouldn’t surprise me if it had something—if not everything—to do with <a href="http://en.wikipedia.org/wiki/Localization" target="_blank">localization</a> in one way or another. In Italy, a comma (<em>virgola</em>), not a period, precedes a number’s decimal part, but software might see things otherwise.</p>
<p>Some software interprets number strings according to the operating system localization (unless overridden). Other software ignores the OS localization. SQL Server’s CAST operator, for example, only accepts a period as the decimal separator, and it disregards commas in strings intended to represent numbers.</p>
<p>At least it does this <a href="http://groups.google.com/group/microsoft.public.es.sqlserver/browse_thread/thread/602e49958909fb9b/ad6e74c54ae5abc?hl=en&amp;ie=UTF-8&amp;q=kass+isnumeric+comma+decimal#0ad6e74c54ae5abc" target="_blank">as of 2005</a>; previous versions followed a complicated set of rules in an attempt to disallow numbers that weren’t valid in the U.S., India, or China. In India (ones, thousands, lakhs, crore, thousand crore, lakhs crore, etc.), digit groups bounce between two and three digits, and 1,234,56,70,000.0 is a valid number. In China (yi1, wan4, yi4, wan4 yi4, etc.), it would be 123,4567,0000.0. Interpreting human-readable representations of numbers is no simple task. Explaining the issue isn’t much easier. </p>
<p>In all versions of SQL Server, this happens regardless of language or culture settings.</p>
<pre><code>select cast('115,00' as money) as TooMuch;

TooMuch
---------------------
11500.00</code></pre>
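<p>If the input is known to use a comma as its decimal mark and never contains digit-group separators (an assumption about the data that SQL Server cannot check for you), one workaround is to swap the comma for a period before casting. This is only a sketch, and the variable name is made up for illustration.</p>
<pre><code>declare @amount varchar(20);
set @amount = '115,00';

-- Replace the decimal comma with a period so CAST sees 115.00.
-- Unsafe if the string may also contain grouping separators (e.g. '1.115,00').
select cast(replace(@amount, ',', '.') as money) as Intended;

Intended
---------------------
115.00</code></pre>
<p>SQL Server 2012 and later also offer PARSE with a USING culture clause, for example <code>parse('115,00' as money using 'it-IT')</code>, which interprets the string by the named culture&#8217;s rules rather than ignoring the comma.</p>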
<p>[From <a href="http://entertainment.slashdot.org/story/09/11/25/1448218/Moving-Decimal-Bug-Loses-Money" target="_blank">Slashdot</a>, noting <a href="http://www.ilsole24ore.com/art/SoleOnLine4/Italia/2009/11/poste-italiane-disguido-addebiti-gonfiati.shtml" target="_blank">ilsole24ore.com</a>] </p>
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2009/11/26/localization-probably-strikes-again/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>9/11 pager intercepts on Wikileaks</title>
		<link>http://stevekass.com/2009/11/26/911-pager-intercepts-on-wikileaks/</link>
		<comments>http://stevekass.com/2009/11/26/911-pager-intercepts-on-wikileaks/#comments</comments>
		<pubDate>Thu, 26 Nov 2009 04:56:13 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Black Tuesday]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Teaching]]></category>

		<guid isPermaLink="false">http://stevekass.com/2009/11/26/911-pager-intercepts-on-wikileaks/</guid>
		<description><![CDATA[Early this morning, Wikileaks began posting alphanumeric pager messages from four carriers (Arch, Metrocall, Skytel, and Weblink_B) that were intercepted during a 24-hour period beginning early on September 11, 2001. Alphanumeric pager messages are unencrypted, and, like communications over a public 802.11 wireless network, they’re skimmable with the right (and not exotic) software and hardware.

“Due [...]]]></description>
			<content:encoded><![CDATA[<p>Early this morning, <a href="http://911.wikileaks.org/files/index.html" target="_blank">Wikileaks began posting</a> alphanumeric pager messages from four carriers (Arch, Metrocall, Skytel, and Weblink_B) that were intercepted during a 24-hour period beginning early on September 11, 2001. Alphanumeric pager messages are unencrypted, and, like communications over a public 802.11 wireless network, they’re skimmable with the right (and not exotic) software and hardware.</p>
<ul>
<li>“Due to today&#8217;s tragic events, it makes sense to cut back wherever feasible on payroll. Expect a very light business day. Please call all stores and review payroll issues”</li>
<li>“RING ALL CHICAGO AIPORTS AND EVERY MAJOR BUILDING DOWNTOWN. BUSH IS DOING A SPEECH.&#160; THIS IS SERIOUS POOH..”</li>
<li>“Holy crap, are you watching the news.”</li>
<li>“I hope you have gone home by now. The BoA tower and space needle here are closed. I suspect tall buildings across the country will be closed. Take care my love.-cb”</li>
</ul>
<p>This might be the most interesting public data mine since <a href="http://stevekass.com/category/black-tuesday/" target="_blank">the AOL breach</a>. The total volume is far less, but unlike the AOL data, this data hasn’t been anonymized. There are full names, phone numbers, and other identifying information in the mix. </p>
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2009/11/26/911-pager-intercepts-on-wikileaks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Buy my book (from Barnes and Noble)</title>
		<link>http://stevekass.com/2009/04/13/buy-my-book-from-barnes-and-noble/</link>
		<comments>http://stevekass.com/2009/04/13/buy-my-book-from-barnes-and-noble/#comments</comments>
		<pubDate>Mon, 13 Apr 2009 22:52:13 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2009/04/13/buy-my-book-from-barnes-and-noble/</guid>
		<description><![CDATA[If you squint, you’ll see my name in tiny print under Itzik’s. He wrote most of the book, but I contributed two chapters and did most of the technical review. Click on the image to visit the book&#8217;s Barnes and Noble page.
]]></description>
			<content:encoded><![CDATA[<p><a title="Inside Microsoft SQL Server 2008: T-SQL Querying" href="http://search.barnesandnoble.com/booksearch/isbnInquiry.asp?isbn=9780735626034"><img style="margin: 0px 5px 5px 0px; display: inline" src="http://images.barnesandnoble.com/images/34720000/34725169.JPG" alt="" align="left" /></a>If you squint, you’ll see my name in tiny print under Itzik’s. He wrote most of the book, but I contributed two chapters and did most of the technical review. Click on the image to visit the book&#8217;s Barnes and Noble page.</p>
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2009/04/13/buy-my-book-from-barnes-and-noble/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Read this if you serve up web pages from SQL data</title>
		<link>http://stevekass.com/2008/05/31/read-this-if-you-serve-up-web-pages-from-sql-data/</link>
		<comments>http://stevekass.com/2008/05/31/read-this-if-you-serve-up-web-pages-from-sql-data/#comments</comments>
		<pubDate>Sat, 31 May 2008 00:07:27 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2008/05/31/read-this-if-you-serve-up-web-pages-from-sql-data/</guid>
		<description><![CDATA[If you manage, write, visit, or otherwise have anything to do with a web app that connects to a SQL Server database, good guy and Microsoft Program Manager Buck Woody wants you to read this:
[copied with permission from here]
You might have read recently that there have been ongoing SQL injection attacks against vulnerable web applications [...]]]></description>
			<content:encoded><![CDATA[<p>If you manage, write, visit, or otherwise have anything to do with a web app that connects to a SQL Server database, good guy and Microsoft Program Manager Buck Woody wants you to read this:</p>
<p>[copied with permission from <a title="http://blogs.msdn.com/buckwoody/archive/2008/05/30/sql-injection-attacks.aspx" href="http://blogs.msdn.com/buckwoody/archive/2008/05/30/sql-injection-attacks.aspx">here</a>]</p>
<blockquote><p>You might have read recently that there have been ongoing SQL injection attacks against vulnerable web applications occurring over the last few months.&nbsp; These attacks have received recurring attention in the press as they pop up in various geographies around the world. These attacks do not leverage any SQL Server vulnerabilities or any un-patched vulnerabilities in any Microsoft product – the attack vector is vulnerable custom applications. In fact, SQL Injection is a coding issue that can attack any database system, so it&#8217;s a good idea to learn how to defend against them.
<p>In order to help you respond to and defend yourself from these attacks, Microsoft has an authoritative blog including talking points and guidance.&nbsp; You can find this at <a title="http://blogs.technet.com/swi/archive/2008/05/29/sql-injection-attack.aspx" href="http://blogs.technet.com/swi/archive/2008/05/29/sql-injection-attack.aspx">this Technet location</a>. (Retype the underlying URL if you like. I only linked it this way because it wrapped.)</p>
</blockquote>
<p>Ok, if you didn&#8217;t visit <a title="http://blogs.technet.com/swi/archive/2008/05/29/sql-injection-attack.aspx" href="http://blogs.technet.com/swi/archive/2008/05/29/sql-injection-attack.aspx">the Technet link</a>, visit it before reading on.
<p>Thanks. Now I&#8217;ll add another bit of advice:
<p>There&#8217;s a non-SQL injection issue here as well. The risk in question starts when a web application incorporates part of the URL into SQL and executes it blindly (SQL injection), but the risk to end users only occurs because the web app commits &#8220;HTML injection.&#8221; The web app unwittingly delivers a malicious bit of HTML that says &#8220;Hey browser, please run a script from this other web site.&#8221; That malicious bit of HTML won&#8217;t be sent to my browser if the web application doesn&#8217;t blindly incorporate table data (especially table data containing HTML tags) into the HTML pages it delivers.
<p>Here&#8217;s an analogy. When you fill a prescription, you get instructions like &#8220;Take one pill twice a day for seven days.&#8221; Those instructions probably get printed out of some database. If the instructions say &#8220;Chew up all the pills and wash them down with a cup of bleach,&#8221; something&#8217;s wrong with the pharmacy&#8217;s database. Something&#8217;s also wrong with the pharmacy for not catching the bogus instructions before dispensing the prescription. And if you follow the instructions, something&#8217;s wrong with you.</p>
<p>The risk Buck is drawing our attention to is like this, and the Technet blog tells us to secure our database. Just as importantly, we should pay attention to what we dispense, and not just assume that if we&#8217;re dispensing our data, it&#8217;s good data. Browsers often render (and in the case of scripts, execute) whatever a trusted site sends them, and if trusted sites send HTML out without vetting it, well, they shouldn&#8217;t be trusted. If you&#8217;re a web developer and you want your site to be trusted, then vet what you deliver.</p>
<p>I don&#8217;t do web apps, but I don&#8217;t think a responsible web app should send me script tags that refer to third-party sites. In fact, the web app probably shouldn&#8217;t send me any table data without scrubbing it for tags, non-printing ASCII characters, etc.
<p>Many years ago, we thought it was funny to email people BEL characters, and then someone figured out email shouldn&#8217;t be allowed to contain BEL. Years ago bulletin boards figured out they shouldn&#8217;t allow users to put any old HTML into their posts. The threat then was still minor &#8211; jokers figured out they could mess up some bulletin board formatting by posting opening tags without closing them. Apparently this was only half fixed. Web apps typically scrub what comes in through the expected channels, but a lot of web apps (most?) apparently don&#8217;t scrub the HTML they send out. They should. In fact, they must, now that the bad guys have figured out how to exploit sloppy web apps to modify table data while bypassing the expected route. The bad guys may soon find some more sloppy code and exploit it to mess with your data.</p>
<p>Just as it&#8217;s possible to scrub outgoing email for viruses, it should be possible (and routine) to scrub outgoing HTML for malicious content. While I don&#8217;t trust email attachments that have a &#8220;no viruses&#8221; sticker on them, and I wouldn&#8217;t trust a random site that tells me &#8220;this web page is safe,&#8221; I would trust Microsoft or another trustworthy source if they told me their web servers scrub all outgoing web pages for unexpected script tags. </p>
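<p>A bare-bones sketch of that kind of outgoing scrubbing follows. It is written in T-SQL only because that is the language used elsewhere on this blog; a real web app would more likely rely on its framework&#8217;s HTML-encoding routine. The variable name and sample string are invented for illustration, and the sketch handles only four characters.</p>
<pre><code>declare @cell nvarchar(200);
set @cell = N'&lt;script src="http://example.com/x.js"&gt;&lt;/script&gt;';

-- Encode the ampersand first so the entities added afterwards
-- are not themselves re-encoded.
select
  replace(replace(replace(replace(@cell,
    '&amp;', '&amp;amp;'),
    '&lt;', '&amp;lt;'),
    '&gt;', '&amp;gt;'),
    '"', '&amp;quot;') as Encoded;</code></pre>
<p>The encoded value is text that a browser displays rather than markup that it executes.</p>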
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2008/05/31/read-this-if-you-serve-up-web-pages-from-sql-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spearman&#8217;s rho for SQL Server</title>
		<link>http://stevekass.com/2008/03/29/spearmans-rho-for-sql-server/</link>
		<comments>http://stevekass.com/2008/03/29/spearmans-rho-for-sql-server/#comments</comments>
		<pubDate>Sat, 29 Mar 2008 06:33:51 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2008/03/29/spearmans-rho-for-sql-server/</guid>
		<description><![CDATA[Before SQL Server 2005 was released, a calculation that required a ranking was both relatively difficult to express as a single query and relatively inefficient to execute. That changed in SQL Server 2005 with support for the SQL analytic functions RANK(), ROW_NUMBER(), etc., and partial support for SQL&#8217;s OVER clause.
Spearman&#8217;s rho (Spearman&#8217;s correlation coefficient) is [...]]]></description>
			<content:encoded><![CDATA[<p>Before SQL Server 2005 was released, a calculation that required a ranking was both relatively difficult to express as a single query and relatively inefficient to execute. That changed in SQL Server 2005 with support for the SQL analytic functions RANK(), ROW_NUMBER(), etc., and partial support for SQL&#8217;s OVER clause.</p>
<p>Spearman&#8217;s rho (Spearman&#8217;s correlation coefficient) is a useful statistic that can be calculated more easily in SQL Server 2005 than in earlier versions. Below is an implementation of Spearman&#8217;s rho for SQL Server 2005 and later.</p>
<p>SQL&#8217;s RANK() and the rank order required for the calculation of Spearman&#8217;s rho are slightly different: if for example four values are tied for third place, RANK() will equal 3 for all four of them. The Spearman&#8217;s formula requires them all to be ranked 4.5, the average of their positions (3rd, 4th, 5th, and 6th) in an ordered list of the data. To address this difference, the code below adjusts the SQL RANK() by adding to it 0.5 for each occurrence of a data value beyond the first. I used COUNT(*) with an OVER clause for this.</p>
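<p>To see the tie adjustment in isolation, here is a small sketch on made-up values in which four rows tie for third place. The CTE name and the data are only for illustration.</p>
<pre><code>with T(v) as (
  select 10 union all
  select 20 union all
  select 30 union all select 30 union all
  select 30 union all select 30 union all
  select 40
)
select
  v,
  rank() over (order by v) as sql_rank,
  -- add 0.5 for each occurrence of v beyond the first
  rank() over (order by v)
    + (count(*) over (partition by v) - 1)/2.0 as spearman_rank
from T;</code></pre>
<p>The four rows with v = 30 get sql_rank 3 and spearman_rank 4.5, and the row with v = 40 is ranked 7 either way.</p>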
<p>The script below demonstrates the calculation for two data sets. The first one is from <a href="http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient" title="Accessed March 28, 2008" target="_blank">Wikipedia&#8217;s page on Spearman&#8217;s rho</a>; I made up the second data set to include duplicate data values. I haven&#8217;t tested the code thoroughly, but for a variety of small test data sets, it matches hand calculations and the result <a href="http://www.wessa.net/rankcorr.wasp" title="Accessed March 28, 2008" target="_blank">here</a> [1].</p>
<p><font face="Courier New">create table SampleData (<br />
ID int identity(1,1) primary key,<br />
x decimal(5,2),<br />
y decimal(5,2)<br />
); </font></p>
<p><font face="Courier New">insert into SampleData(x,y) values(106,7);<br />
insert into SampleData(x,y) values(86,0);<br />
insert into SampleData(x,y) values(100,27);<br />
insert into SampleData(x,y) values(101,50);<br />
insert into SampleData(x,y) values(99,28);<br />
insert into SampleData(x,y) values(103,29);<br />
insert into SampleData(x,y) values(97,20);<br />
insert into SampleData(x,y) values(113,12);<br />
insert into SampleData(x,y) values(112,6);<br />
insert into SampleData(x,y) values(110,17);<br />
go </font></p>
<p><font face="Courier New">create procedure Spearman as<br />
with RankedSampleData(ID,x,y,rk_x,rk_y) as (<br />
select<br />
ID,<br />
x,<br />
y,<br />
rank() over (order by x) +<br />
(count(*) over (partition by x) - 1)/2.0,<br />
rank() over (order by y) +<br />
(count(*) over (partition by y) - 1)/2.0<br />
from SampleData<br />
)<br />
select<br />
1e0 -<br />
(<br />
6<br />
*sum(square(rk_x-rk_y))<br />
/count(*)<br />
/(square(count(*)) - 1)<br />
)<br />
from RankedSampleData;<br />
go </font></p>
<p><font face="Courier New">exec Spearman; </font></p>
<p><font face="Courier New">go<br />
truncate table SampleData;<br />
go </font></p>
<p><font face="Courier New">insert into SampleData(x,y) values(1,3);<br />
insert into SampleData(x,y) values(3,5);<br />
insert into SampleData(x,y) values(5,8);<br />
insert into SampleData(x,y) values(3,4);<br />
insert into SampleData(x,y) values(4,7);<br />
insert into SampleData(x,y) values(4,6);<br />
insert into SampleData(x,y) values(3,4);<br />
go </font></p>
<p><font face="Courier New">exec Spearman;<br />
go </font></p>
<p><font face="Courier New">drop proc Spearman;<br />
drop table SampleData;</font></p>
<p>[1] Wessa, P. (2008), Free Statistics Software, Office for Research Development and Education, version 1.1.22-r4, URL <a href="http://www.wessa.net/">http://www.wessa.net/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2008/03/29/spearmans-rho-for-sql-server/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Elapsed time excluding nights and weekends</title>
		<link>http://stevekass.com/2007/12/19/elapsed-time-excluding-nights-and-weekends/</link>
		<comments>http://stevekass.com/2007/12/19/elapsed-time-excluding-nights-and-weekends/#comments</comments>
		<pubDate>Wed, 19 Dec 2007 20:39:54 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2007/12/19/elapsed-time-excluding-nights-and-weekends/</guid>
		<description><![CDATA[Finding elapsed time in SQL Server is easy, so long as the clock is always running: just use DATEDIFF. But you often need to find elapsed time excluding certain periods, like weekends, nights, or holidays. A fellow SQL Server MVP recently posed a variation on this problem: to find the number of minutes between two [...]]]></description>
			<content:encoded><![CDATA[<p>Finding elapsed time in SQL Server is easy, so long as the clock is always running: just use DATEDIFF. But you often need to find elapsed time excluding certain periods, like weekends, nights, or holidays. A fellow SQL Server MVP recently posed a variation on this problem: to find the number of minutes between two times, where the clock is running only from 6:00am-6:00pm, Monday-Friday. He needed this to compute how long trouble tickets stayed at a help desk that was open for those hours.</p>
<p>I came up with a function DeskTimeDiff_minutes(@from,@to) for him. It requires a permanent table that spans the range of times you might care about, holding one row for every time the clock is turned on or off, weekdays at 6:00am and 6:00pm in this case.</p>
<p>The table also holds an &#8220;absolute business time&#8221; in minutes (ABT-m): the total number of &#8220;help desk open&#8221; minutes since a fixed but arbitrary &#8220;beginning of time.&#8221; Elapsed help desk time is then simply the difference between ABT-m values. While the table only records the ABT-m 10 times a week, you can find the ABT-m for an arbitrary datetime @d easily. Find the row of the table with time d closest to @d but not later. In that row you&#8217;ll find the ABT-m at time d, and you&#8217;ll also find out whether the clock was (or will be) running or not between d and @d. If not, the ABT-m at time @d is the same as at time d. Otherwise, add the number of minutes between d and @d.</p>
<p>Here&#8217;s the code. The reference table here is good from early 2000 until well past 2050, and you can easily extend it or adapt it to other business rules. A larger permanent table of times shouldn&#8217;t affect performance, because the function only performs (two) index seek lookups on the table.</p>
<p>If you cut and paste this for your own use, watch out for &#8220;smart quotes&#8221; or other Wordpress/Live Writer formatting quirks.</p>
<p><font face="Lucida Console">create table Minute_Count(<br />&nbsp; d datetime primary key,<br />&nbsp; elapsed_minutes int not null,<br />&nbsp; timer varchar(10) not null check (timer in ('Running','Stopped'))<br />); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-03T06:00:00',0,'Running');<br />insert into Minute_Count values ('2000-01-03T18:00:00',12*60,'Stopped'); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-04T06:00:00',12*60,'Running');<br />insert into Minute_Count values ('2000-01-04T18:00:00',24*60,'Stopped'); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-05T06:00:00',24*60,'Running');<br />insert into Minute_Count values ('2000-01-05T18:00:00',36*60,'Stopped'); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-06T06:00:00',36*60,'Running');<br />insert into Minute_Count values ('2000-01-06T18:00:00',48*60,'Stopped'); </font>
<p><font face="Lucida Console">insert into Minute_Count values ('2000-01-07T06:00:00',48*60,'Running');<br />insert into Minute_Count values ('2000-01-07T18:00:00',60*60,'Stopped');<br />/* any Monday-Friday week */</font>
<p><font face="Lucida Console">declare @week int;<br />set @week = 1;<br />while @week &lt; 2100 begin<br />&nbsp; insert into Minute_Count<br />&nbsp;&nbsp;&nbsp; select<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dateadd(week,@week,d),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; elapsed_minutes + 60*@week*60,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; timer<br />&nbsp; from Minute_Count<br />&nbsp; set @week = @week * 2<br />end; </font>
<p><font face="Lucida Console">go </font>
<p><font face="Lucida Console">create function DeskTimeDiff_minutes(<br />&nbsp; @from datetime,<br />&nbsp; @to datetime<br />) returns int as begin<br />&nbsp; declare @fromSerial int;<br />&nbsp; declare @toSerial int;<br />&nbsp; with S(d,elapsed_minutes,timer) as (<br />&nbsp;&nbsp;&nbsp; select top 1 d,elapsed_minutes, timer<br />&nbsp;&nbsp;&nbsp; from Minute_Count<br />&nbsp;&nbsp;&nbsp; where d &lt;= @from<br />&nbsp;&nbsp;&nbsp; order by d desc<br />&nbsp; )<br />&nbsp;&nbsp;&nbsp; select @fromSerial =<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; elapsed_minutes +<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; case when timer = 'Running'<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; then datediff(minute,d,@from)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else 0 end<br />&nbsp;&nbsp;&nbsp; from S;<br />&nbsp; with S(d,elapsed_minutes,timer) as (<br />&nbsp;&nbsp;&nbsp; select top 1 d,elapsed_minutes, timer<br />&nbsp;&nbsp;&nbsp; from Minute_Count<br />&nbsp;&nbsp;&nbsp; where d &lt;= @to<br />&nbsp;&nbsp;&nbsp; order by d desc<br />&nbsp; )<br />&nbsp;&nbsp;&nbsp; select @toSerial =<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; elapsed_minutes +<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; case when timer = 'Running'<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; then datediff(minute,d,@to)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else 0 end<br />&nbsp;&nbsp;&nbsp; from S;<br />&nbsp; return @toSerial - @fromSerial;<br />end;<br />go<br />select MAX(d) from Minute_Count<br />select dbo.DeskTimeDiff_minutes('2007-12-19T18:00:00','2007-12-24T17:51:00');<br />go </font>
<p><font face="Lucida Console">drop function DeskTimeDiff_minutes;<br />drop table Minute_Count;</font></p>
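<p>As a rough cross-check of the sample call above, the same interval can be added up by hand with plain DATEDIFF. December 19, 2007 was a Wednesday and the ticket starts just as the desk closes, so the clock runs only on Thursday, Friday, and the following Monday up to 5:51pm.</p>
<pre><code>select
    datediff(minute,'2007-12-20T06:00:00','2007-12-20T18:00:00')  -- Thursday: 720
  + datediff(minute,'2007-12-21T06:00:00','2007-12-21T18:00:00')  -- Friday: 720
  + datediff(minute,'2007-12-24T06:00:00','2007-12-24T17:51:00')  -- Monday: 711
  as desk_minutes;  -- 2151</code></pre>
<p>DeskTimeDiff_minutes should return the same 2151 minutes for that call, since the two lookups land on the Wednesday 6:00pm row (clock stopped) and the Monday 6:00am row (clock running).</p>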
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2007/12/19/elapsed-time-excluding-nights-and-weekends/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The hemisphere requirement</title>
		<link>http://stevekass.com/2007/11/21/the-hemisphere-requirement/</link>
		<comments>http://stevekass.com/2007/11/21/the-hemisphere-requirement/#comments</comments>
		<pubDate>Wed, 21 Nov 2007 22:16:47 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://stevekass.com/2007/11/21/the-hemisphere-requirement/</guid>
		<description><![CDATA[Microsoft plans to support spatial data types in SQL Server 2008, and a preview is available to the community in the latest CTP (community technology preview), available here.
John O&#8217;Brien, a Windows Live Developer MVP,&#160;has been&#160;trying out the new spatial types in some cool Virtual Earth projects&#160;(John&#8217;s site is&#160;here), and&#160;in one of his projects, SQL Server [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft plans to support spatial data types in SQL Server 2008, and a preview is available to the community in the latest CTP (community technology preview), available <a title="http://www.microsoft.com/sql/2008/default.mspx" href="http://www.microsoft.com/sql/2008/default.mspx">here</a>.
<p>John O&#8217;Brien, a Windows Live Developer MVP,&nbsp;has been&nbsp;trying out the new spatial types in some cool Virtual Earth projects&nbsp;(John&#8217;s site is&nbsp;<a title="http://www.soulsolutions.com.au" href="http://www.soulsolutions.com.au">here</a>), and&nbsp;in one of his projects, SQL Server threw an interesting error message. When he zoomed far enough out in Virtual Earth, then tried to create a polygon from the map bounds, SQL Server reacted with:
<p>“The specified input does not represent a valid geography instance because it exceeds a single hemisphere. Each geography instance must fit inside a single hemisphere. A common reason for this error is that a polygon has the wrong ring orientation.”
<p>John&nbsp;found a workaround, dividing the map into two pieces, but he was interested to know what the SQL Server folk thought about the situation. Here’s my reply. It’s less a response to John’s inquiry than it is a ramble about geometry and what hemispheres and orientation have to do with how you can or can’t specify polygons.
<p>To begin, think of the earth’s Equator as a polygon. How would you answer the following questions?
<ul>
<li>“If I travel eastbound around the earth along the equator, have I gone clockwise or counter-clockwise?”</li>
<li>“Is the north pole inside the equator or outside the equator?” </li>
</ul>
<p>In the plane (or on a flat map of the world), a polygon or other closed non-self-intersecting curve has a well-defined “inside” and “outside”. A polygon separates the plane into two regions, one that has finite area and one that is unbounded. The finite region is deemed “inside” the polygon. On a sphere, however, a closed curve determines two finite regions, either of which might be what someone thinks of as the inside.
<p>For example, the four-sided outline of the US state of Wyoming separates the earth into what you could call “Wyoming” and “anti-Wyoming.” But are we so sure which is the inside and which is the outside? Our intuition is that the smaller region is always the inside, but there’s nothing about geometry and geography to tell us that. Maybe Wyoming is most of the world. A single geographic region could contain most of the earth’s surface within its borders, couldn’t it?
<p>Suppose Wyoming declared itself to be Great Wyoming, annexed all of North America and Europe, and continued to conquer the world. Suppose its armies crossed the equator and eventually took over almost everything: everything but Antarctica, in fact.
<p>The boundary of Great Wyoming would then be the same as the boundary of Antarctica. You would probably want Great Wyoming to be inside the boundary of Great Wyoming and Antarctica to be inside the boundary of Antarctica, but how can that work when the boundaries are the same?
<p>This is a problem. On a sphere, the naïve idea of interior/exterior isn’t well-defined. One solution would be to pass a law that every polygon on earth must fit inside a single hemisphere with room to spare. We could then <i>define</i> the interior of a polygon to be the smaller of the two regions it determines. This would place Antarctica, not Wyoming, within the borders of Great Wyoming—wrong, but unambiguous. And anyway, who would ever need to consider a region <s>bigger than 640K</s> that doesn’t fit inside a single hemisphere?
<p>Fortunately, though, we don’t have to abandon or compromise the notion of interior and exterior on the earth’s surface: Antarctica can remain outside Great Wyoming. All we need to do is be precise about the direction in which we describe a polygon. When specifying the boundary of a region, you can give a forwards/backwards or clockwise/counter-clockwise sense to the boundary by choosing the way you order the list of vertices. List them so that what you consider inside the region is on your left as you &#8220;connect the dots,&#8221; because we will&nbsp;adopt the convention that the left side as you walk the perimeter is the inside. What’s on the right will be interpreted as outside. Now you can describe the boundary of Great Wyoming. Just describe it as drawn from west to east, so Antarctica is on the right (exterior). (This works because a sphere is an “orientable surface.” SQL Server’s new geography data type isn’t supported on a Klein bottle, where CultureInfo.IsOrientableWorld, if such a property existed, would be false.)
<p>Once we require polygons to be oriented, there’s no need to require that they fit within a single hemisphere, but nonetheless, SQL Server 2008’s geography data type adopts the hemisphere requirement. For geography objects of type Polygon, I think this is a good idea. I’m not sure whether it’s a standard GIS requirement or just SQL Server’s, but it prevents users from accidentally entering the coordinates of Wyoming in clockwise fashion only to discover later that Perth and Addis Ababa, but not Cheyenne, are in Wyoming. [For some of the other geography types, such as LineString, I don’t see a benefit from requiring the object to fit in a hemisphere, but consistency isn’t a bad thing.]</p>
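<p>As a sketch of what the left-hand rule looks like in practice: the example below assumes the behavior of the released SQL Server 2008, in which well-known text for the geography type lists longitude before latitude (preview builds may differ), and it uses rounded, approximate corner coordinates for Wyoming. The variable name is only for illustration.</p>
<pre><code>declare @wyoming geography;

-- Corners listed counter-clockwise (south edge from west to east, then
-- north, then back west), so the state's interior is on the left.
set @wyoming = geography::STGeomFromText(
  'POLYGON((-111.05 41, -104.05 41, -104.05 45, -111.05 45, -111.05 41))',
  4326);

-- STArea() reports square meters for SRID 4326; convert to square km.
select @wyoming.STArea() / 1e6 as approx_square_km;</code></pre>
<p>Listing the same corners in the opposite order describes everything except Wyoming, which SQL Server 2008 rejects with the hemisphere error quoted above.</p>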
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2007/11/21/the-hemisphere-requirement/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Million Random Digits with 100,000 Normal Deviates</title>
		<link>http://stevekass.com/2006/08/09/a-million-random-digits-with-100000-normal-deviates/</link>
		<comments>http://stevekass.com/2006/08/09/a-million-random-digits-with-100000-normal-deviates/#comments</comments>
		<pubDate>Wed, 09 Aug 2006 04:27:08 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://www.stevekass.com/2006/08/09/a-million-random-digits-with-100000-normal-deviates/</guid>
		<description><![CDATA[Groundbreaking when it was published in 1955, the classic book &#8220;A Million Random Digits with 100,000 Normal Deviates&#8221; has been republished electronically by the RAND corporation with permission &#8220;to duplicate this electronic document for personal use only, as long as it is unaltered and complete.&#8221;  Books like these were a staple of statistical research [...]]]></description>
			<content:encoded><![CDATA[<p>Groundbreaking when it was published in 1955, the classic book &#8220;A Million Random Digits with 100,000 Normal Deviates&#8221; has been republished electronically by the RAND corporation with permission &#8220;to duplicate this electronic document for personal use only, as long as it is unaltered and complete.&#8221;  Books like these were a staple of statistical research in the mid-20th century, and this particular one was highly revered.</p>
<p>Nowadays, there are better sources of random numbers, such as <a href="http://www.fourmilab.ch/hotbits/">HotBits</a>, and there are many ways to generate pseudorandom numbers, which are not random, but have many of the properties of random numbers and are useful for many purposes.</p>
<p>I hope it&#8217;s not a violation of the copyright for me to provide instructions on how to use SQL to load the book&#8217;s content in its published format (or any identically-formatted list) into a SQL table that can be queried for random (not pseudorandom) sequences of numbers. The script uses a few of SQL Server 2005&#8217;s new features, including the BULK rowset provider for text files, some of the new analytic functions, and TOP with a variable. You&#8217;ll also need a table-valued function called Numbers(), like the one in my previous SQL post.</p>
<p>The RAND book is available <a href="http://www.rand.org/pubs/monograph_reports/MR1418/index.html">here</a>, and my script works for the support file &#8220;Datafile: A Million Random Digits,&#8221; available for download <a href="http://www.rand.org/pubs/monograph_reports/MR1418/index.html">here</a>. The SQL Server 2005 script below assumes you&#8217;ve downloaded this file and unzipped it to C:\RAND\MillionDigits.txt.</p>
<p>The beginning of the file looks like this:</p>
<p><code>00000   10097 32533  76520 13586  34673 54876  80959 09117  39292 74945<br />
00001   37542 04805  64894 74296  24805 24037  20636 10402  00822 91665<br />
00002   08422 68953  19645 09303  23209 02560  15953 34764  35080 33606<br />
00003   99019 02529  09376 70715  38311 31165  88676 74397  04436 27659<br />
00004   12807 99970  80157 36147  64032 36653  98951 16877  12171 76833<br />
00005   66065 74717  34072 76850  36697 36170  65813 39885  11199 29170<br />
00006   31060 10805  45571 82406  35303 42614  86799 07439  23403 09732<br />
00007   85269 77602  02051 65692  68665 74818  73053 85247  18623 88579<br />
00008   63573 32135  05325 47048  90553 57548  28468 28709  83491 25624<br />
00009   73796 45753  03529 64778  35808 34282  60935 20344  35273 88435</code></p>
<p>Unix-style newlines (<tt>0x0A</tt>) are used, and the million digits are organized into 200,000 five-digit integers with leading zeroes (20,000 lines of ten each), so the script will import the file into a table of 200,000 five-digit numbers (as char(5) data with leading zeroes). Here&#8217;s the script:  <span id="more-31"></span></p>
<p><tt>create database MillionDigits<br />
go</tt></p>
<p><tt>use MillionDigits<br />
go</tt></p>
<p><tt> </tt></p>
<p><tt>create table MillionDigitsFile (<br />
c varchar(max)<br />
)<br />
go</p>
<p>insert into MillionDigitsFile<br />
select BulkColumn<br />
from openrowset(bulk 'C:\RAND\MillionDigits.txt', SINGLE_CLOB) as D<br />
go</p>
<p>create table NumbersFromTable(<br />
position int primary key,<br />
number char(5) not null<br />
)<br />
create index NumbersFromTable_number on NumbersFromTable(number)<br />
go</p>
<p>-- The first of the five groups of two numbers each<br />
-- begins at position 9 of each line. Each of the other<br />
-- four groups on a line begins 13 characters after the<br />
-- previous one. The second number in each group<br />
-- begins 6 characters after the first.<br />
insert into NumbersFromTable<br />
select<br />
row_number() over (order by N.n,A.n,B.n) as rk,<br />
substring(c,9+72*N.n+13*A.n+6*B.n,5) as n<br />
from<br />
Numbers(0,19999) as N,<br />
Numbers(0,4) as A,<br />
Numbers(0,1) as B,<br />
MillionDigitsFile<br />
go</p>
<p>-- How random does it look? (and a sneaky way to<br />
-- aggregate over an aggregate)<br />
select top 1<br />
min(count(*)) over (),<br />
max(count(*)) over (),<br />
avg(1.00000*count(*)) over (),<br />
stdev(count(*)) over ()<br />
from NumbersFromTable<br />
group by number<br />
go</p>
<p>/* Selects a @length-long sequence of numbers from<br />
the table, where the place to start is found as<br />
follows.  Given a random integer, use % to turn<br />
it into a number's position between 1 and 200000.<br />
Reduce that position % 20000 to find a starting<br />
line of the book, and reduce the following<br />
number % 10 to find the starting number on<br />
that line.<br />
*/<br />
create function RandomSequence(<br />
@seed int,<br />
@length int<br />
) returns table as return (<br />
select top (@length)<br />
row_number() over (order by position) as i,<br />
number<br />
from NumbersFromTable<br />
-- Ten numbers per line: line L, offset k is position 10*L + k + 1<br />
where position &gt;= 1 + 10*(<br />
select number%20000<br />
from NumbersFromTable<br />
where 1+@seed%200000 = position<br />
) + (<br />
select number%10<br />
from NumbersFromTable<br />
where 1+(@seed+1)%200000 = position<br />
)<br />
order by position<br />
)<br />
go</p>
<p>-- Generate a few random sequences. You'll get different ones<br />
-- each time you run this.<br />
declare @seed int<br />
set @seed = abs(binary_checksum(newid()))%200000<br />
select * from RandomSequence(@seed,50)<br />
set @seed = abs(binary_checksum(newid()))%200000<br />
select * from RandomSequence(@seed,123)</p>
<p></tt></p>
<p><tt>-- Uncomment to clean up<br />
-- use master<br />
-- go<br />
-- drop database MillionDigits</tt></p>
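<p>The offset arithmetic in the import step can be checked by hand on the sample lines shown earlier. This is only a sketch: it rebuilds the first two lines of the file in a variable (using space() so the column spacing is explicit) and pulls out one number with the same expression the script uses. The variable name @c is just for illustration.</p>
<p><tt>-- Three spaces after the line number, one space inside a group,<br />
-- two spaces between groups, and a single 0x0A at the end: 72 characters per line.<br />
declare @c varchar(max)<br />
set @c =<br />
'00000' + space(3) + '10097 32533' + space(2) + '76520 13586' + space(2)<br />
+ '34673 54876' + space(2) + '80959 09117' + space(2) + '39292 74945' + char(10)<br />
+ '00001' + space(3) + '37542 04805' + space(2) + '64894 74296' + space(2)<br />
+ '24805 24037' + space(2) + '20636 10402' + space(2) + '00822 91665' + char(10)<br />
-- Line N.n = 1, group A.n = 2, second number in the group (B.n = 1)<br />
select substring(@c, 9 + 72*1 + 13*2 + 6*1, 5) as n -- returns 24037</tt></p>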
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2006/08/09/a-million-random-digits-with-100000-normal-deviates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to generate a sequence on the fly</title>
		<link>http://stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/</link>
		<comments>http://stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/#comments</comments>
		<pubDate>Sat, 03 Jun 2006 14:44:10 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://www.stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/</guid>
		<description><![CDATA[One of the things that kept me busy this past winter and spring was tech editing Itzik Ben-Gan&#8217;s two books in Microsoft Press&#8217;s Inside Microsoft® SQL Server™ 2005 series (1,2).  Of Itzik&#8217;s many clever solutions to programming problems, my favorite was this function that returns a table of consecutive integers. It&#8217;s blazingly fast, and [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things that kept me busy this past winter and spring was tech editing Itzik Ben-Gan&#8217;s two books in Microsoft Press&#8217;s Inside Microsoft® SQL Server™ 2005 series (<a href="http://www.microsoft.com/MSPress/books/9615.asp">1</a>,<a href="http://www.microsoft.com/MSPress/books/8564.asp">2</a>).  Of Itzik&#8217;s many clever solutions to programming problems, my favorite was this function that returns a table of consecutive integers. It&#8217;s blazingly fast, and it&#8217;s the best way I know of to generate a sequence on the fly &#8211; probably even better than accessing a permanent table of integers.</p>
<p><code>create function Numbers(<br />
&nbsp;&nbsp;@from as bigint,<br />
&nbsp;&nbsp;@to   as bigint<br />
) returns table with schemabinding as return<br />
&nbsp;&nbsp;with t0(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 union all select 1<br />
&nbsp;&nbsp;), t1(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t0 as a, t0 as b<br />
&nbsp;&nbsp;), t2(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t1 as a, t1 as b<br />
&nbsp;&nbsp;), t3(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t2 as a, t2 as b<br />
&nbsp;&nbsp;), t4(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t3 as a, t3 as b<br />
&nbsp;&nbsp;), t5(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select 1 from t4 as a, t4 as b<br />
&nbsp;&nbsp;), Numbers(n) as (<br />
&nbsp;&nbsp;&nbsp;&nbsp;select row_number() over (order by n) as n<br />
&nbsp;&nbsp;&nbsp;&nbsp;from t5<br />
&nbsp;&nbsp;)<br />
&nbsp;&nbsp;&nbsp;&nbsp;select @from + n - 1 as n<br />
&nbsp;&nbsp;&nbsp;&nbsp;from Numbers<br />
&nbsp;&nbsp;&nbsp;&nbsp;where n &lt;= @to - @from + 1<br />
</code></p>
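<p>A short usage sketch, assuming the function above has been created in the current database. The column alias how_many is just for illustration.</p>
<p><code>-- The six integers from 5 through 10<br />
select n from Numbers(5, 10) order by n<br />
<br />
-- A larger range: one row per integer from 1 through 1000000<br />
select count(*) as how_many from Numbers(1, 1000000) -- returns 1000000<br />
</code></p>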
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2006/06/03/how-to-generate-a-sequence-on-the-fly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sorting string data: estimated and actual costs</title>
		<link>http://stevekass.com/2006/06/02/11/</link>
		<comments>http://stevekass.com/2006/06/02/11/#comments</comments>
		<pubDate>Fri, 02 Jun 2006 15:52:06 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://www.stevekass.com/2006/06/02/11/</guid>
		<description><![CDATA[Estimated row size in bytes is an important factor used by the SQL Server optimizer to estimate query cost, and I’ve found an anomaly in the estimated costing algorithm for the Sort operator, as well as in the actual cost of sorting long data. The estimated cost of a Sort seems to take a giant jump [...]]]></description>
			<content:encoded><![CDATA[<p>Estimated row size in bytes is an important factor used by the SQL Server optimizer to estimate query cost, and I’ve found an anomaly in the estimated costing algorithm for the Sort operator, as well as in the actual cost of sorting long data. The estimated cost of a Sort seems to take a giant jump when the estimated row size exceeds 4000 bytes, but that jump in estimated cost doesn’t correspond to any jump in actual cost.</p>
<p>It’s important to note that the jump does not depend on the length of the sort key, but only on the length of the row data being carried along. The cost estimate for sorting an estimated-to-be-long row on a short key is much greater than for sorting an estimated-to-be-medium-length row on the same short key.&nbsp;&nbsp;<span id="more-11"></span></p>
<p>Look at the estimated plan for this batch:</p>
<p><code>select top 5 replicate(LastName,50)<br />
from AdventureWorks.Person.Contact order by reverse(EmailAddress)<br />
select top 5 replicate(LastName,100)<br />
from AdventureWorks.Person.Contact order by reverse(EmailAddress)<br />
</code></p>
<p>The same plan is used for each query:</p>
<p><code>Scan clustered index<br />
Compute replicate() and reverse()<br />
Sort/Top N on reverse() result<br />
Return results</code></p>
<p>There is a huge difference in the estimated cost of the Sort operator:</p>
<p>First query Sort operator<br />
Estimated cost: 1.320954<br />
Estimated row size: 2563<br />
Second query Sort operator<br />
Estimated cost: 226.219608<br />
Estimated row size: 4063</p>
<p>The jump from 1.3 to 226.2 occurs when the estimated row size exceeds 4000 bytes, and I still see the same jump when I change the 50 and 100 to 78 and 79.<br />
If I time these queries (I put in the TOP specification only so the output to the client doesn’t dominate the timing), there is no sudden jump when the row gets to any particular size. However, in a more complex query, the goofy cost estimate for the long rows could really mess up the optimizer and cause real differences in run time by causing bad plans to be chosen.</p>
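<p>One way to look at the estimates without running the queries is SET SHOWPLAN_ALL, which returns the estimated plan as rows. This is a sketch rather than the exact procedure used above; it uses the 78 and 79 replication counts that straddle the 4000-byte threshold. Compare the AvgRowSize and TotalSubtreeCost columns on the Sort row of each plan.</p>
<p><code>-- SET SHOWPLAN_ALL must be the only statement in its batch<br />
set showplan_all on<br />
go<br />
select top 5 replicate(LastName,78)<br />
from AdventureWorks.Person.Contact order by reverse(EmailAddress)<br />
select top 5 replicate(LastName,79)<br />
from AdventureWorks.Person.Contact order by reverse(EmailAddress)<br />
go<br />
set showplan_all off<br />
go<br />
</code></p>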
<p>I looked a little further to try to discover any anomalies in the actual cost of sorting based on key length, and I did find something interesting. Sorting data of type nvarchar(max) is much slower than sorting identical data of type nvarchar(not max), even if the data is the same size, but the optimizer does not seem to know this. In addition, the optimizer gives the same row size estimates for a replicate() result on nvarchar(max) regardless of the number of replications. This latter issue might make sense for (max) values in a table, where the row only contains a pointer and part of the data, but I don’t know if it makes sense here, where the (max) data is computed.</p>
<p>Consider this repro:</p>
<p><code>set statistics time on<br />
dbcc dropcleanbuffers<br />
dbcc freeproccache<br />
go<br />
select top 5 replicate(LastName,50)<br />
from AdventureWorks.Person.Contact order by reverse(EmailAddress)<br />
go<br />
dbcc dropcleanbuffers<br />
dbcc freeproccache<br />
go<br />
select top 5 replicate(LastName,150)<br />
from AdventureWorks.Person.Contact order by reverse(EmailAddress)<br />
go<br />
dbcc dropcleanbuffers<br />
dbcc freeproccache<br />
go<br />
select top 5 replicate(cast(LastName as nvarchar(max)),50)<br />
from AdventureWorks.Person.Contact order by reverse(EmailAddress)<br />
go<br />
dbcc dropcleanbuffers<br />
dbcc freeproccache<br />
go<br />
select top 5 replicate(cast(LastName as nvarchar(max)),150)<br />
from AdventureWorks.Person.Contact order by reverse(EmailAddress)<br />
go<br />
</code></p>
<p>The estimated row size is different for queries 1 and 2, and the 4000-byte row size threshold results in the same disparate cost estimates as before. However, the estimated row sizes for queries 3 and 4 are the same. Furthermore, sorting the data as nvarchar(max) takes much longer. The actual CPU times of the four queries are as follows:</p>
<p><code>47 ms<br />
110 ms<br />
1109 ms<br />
2672 ms<br />
</code></p>
<p>So sorting identical data as nvarchar(max) instead of as nvarchar takes more than 20 times as long, but the optimizer does not know this. Why (to both parts)?</p>
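<p>For reference, the slowdown can be worked out directly from the four CPU times listed above. The column aliases are only for illustration.</p>
<p><code>select<br />
cast(1109/47.0 as decimal(5,1)) as ratio_50_copies, -- 23.6<br />
cast(2672/110.0 as decimal(5,1)) as ratio_150_copies -- 24.3<br />
</code></p>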
<p>This isn’t something I’ve seen in production or mentioned in the newsgroups, but as the max and xml types gain more use, I won’t be surprised to see consequences both of the bad costing and the slow sorting of computed (max) data.</p>
]]></content:encoded>
			<wfw:commentRss>http://stevekass.com/2006/06/02/11/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
