Wednesday, March 7, 2012

How to delete rows from a 15 million row table

Using SQL 2000, I'm trying to trim down a 15 million row table. I initially tried simply doing a DELETE FROM TABLE WHERE DATE < '10/04/2002'. The initial start date for the table was in august so I figured this would account for about 1/3 of its contents, assuming an even daily amount was being inserted.

However, this query would need to go through the entire table to successfully perform and I had to cancel the query due to it taking too long and nearly taking down the sql server due to all the hd crunching.

Then I tried deleting it in batches...by using set rowcount = 10000, then sticking that delete statement into a nested loop and having it issue a checkpoint whenever it found a row to delete. This too, took forever just even through the first pass. Using Profiler, I watched the query get to the delete statement, then take forever again...and again I had to cancel it.

Is there a simpler way to do this, or am I missing something? Why would it take forever even for the first pass I would really like something that would export or backup the data first, then delete it.

EdHi,
y dont u try splitting ur 15 million row table into some 10 or 15 temporary tables, delete the records in the temporary tables and then group it back to the main table??,the Split-table approach??
Regards,
Ramya

Originally posted by KungFuJoe
Using SQL 2000, I'm trying to trim down a 15 million row table. I initially tried simply doing a DELETE FROM TABLE WHERE DATE < '10/04/2002'. The initial start date for the table was in august so I figured this would account for about 1/3 of its contents, assuming an even daily amount was being inserted.

However, this query would need to go through the entire table to successfully perform and I had to cancel the query due to it taking too long and nearly taking down the sql server due to all the hd crunching.

Then I tried deleting it in batches...by using set rowcount = 10000, then sticking that delete statement into a nested loop and having it issue a checkpoint whenever it found a row to delete. This too, took forever just even through the first pass. Using Profiler, I watched the query get to the delete statement, then take forever again...and again I had to cancel it.

Is there a simpler way to do this, or am I missing something? Why would it take forever even for the first pass I would really like something that would export or backup the data first, then delete it.

Ed|||I had a similar problem on an older server. There I had not enough space to make a copy of the data i had to keep on the database. So I made a stored procedure which went through a cursor that deleted 10000 rows. Then I packed the procedure into an SQL-Task in a DTS, which i scheduled for running several times in the night.|||I can't risk bringing this table down for any amount of time...this is mission critical 24/7 database that the table resides on. I need something that will work while the database is still up.

Ed

Originally posted by ramya
Hi,
y dont u try splitting ur 15 million row table into some 10 or 15 temporary tables, delete the records in the temporary tables and then group it back to the main table??,the Split-table approach??
Regards,
Ramya|||Do you have this stored procedure handy so I can look at it? I tried doing something similar (using set rowcount = ) as I described in my initial post, but I let it run for nearly 3 hours and it did not delete a single row.

Ed

Originally posted by austrian_ead
I had a similar problem on an older server. There I had not enough space to make a copy of the data i had to keep on the database. So I made a stored procedure which went through a cursor that deleted 10000 rows. Then I packed the procedure into an SQL-Task in a DTS, which i scheduled for running several times in the night.|||Originally posted by KungFuJoe
Do you have this stored procedure handy so I can look at it? I tried doing something similar (using set rowcount = ) as I described in my initial post, but I let it run for nearly 3 hours and it did not delete a single row.

Ed

I am sorry, I don't have it anymore, but the code was not complicated:

declare @.columnx integer
declare cur_test cursor for
select top 10000 0 as columnx from bigtable
for update
open cur_test
fetch cur_test into @.columnx
while @.@.fetch_status = 0
begin
delete from bigtable where current of cur_test
fetch cur_test into @.columnx
end
close cur_test
deallocate cur_test

it's like this, not exactly, it's only a draft. You should try it it with a small database for testing.|||Thanks for your help :)

I'll give it a shot.

Ed

Originally posted by austrian_ead
I am sorry, I don't have it anymore, but the code was not complicated:

declare @.columnx integer
declare cur_test cursor for
select top 10000 0 as columnx from bigtable
for update
open cur_test
fetch cur_test into @.columnx
while @.@.fetch_status = 0
begin
delete from bigtable where current of cur_test
fetch cur_test into @.columnx
end
close cur_test
deallocate cur_test

it's like this, not exactly, it's only a draft. You should try it it with a small database for testing.|||what's the secret there? bcp out the records you want to keep, alter table tbl nocheck constraint all, truncate, bcp in!|||Yeah, but he says the table is used 24/7 and it can't be down at all. It's heavy use is probably why the deletes are taking so long.

Maybe he should try turning off loging, deleting his records, and then turning it back on again and starting a new backup cycle.

Otherwise, I think his best bet is to keep deleteing them in small batches, and as separate transactions.

I don't see what he gets out of using a cursor for this.|||Originally posted by KungFuJoe
I can't risk bringing this table down for any amount of time...this is mission critical 24/7 database that the table resides on. I need something that will work while the database is still up.
Ed

What is that there is no maint window?

Also, is there an index on the column?

Have you checked out sp_lock? what processes are going on?

Is this a web based app?

And you say 3 months is a 1/3 of the data...that's a lot of data AND I'm sure it's going to continue to grow...

You seriously need to understand the growth of this monster and plan an archival strategy...what type of data btw is ancient history after that period of time...don't you track a status or something...or is like phone records?

what does your app do?|||Actually, my understanding is that the database needs to be 24/7, not the table. Besides, the poster already mentioned making several attempts to delete from this table, during which it WAS inaccessible for anything other than SELECT with READ_UNCOMMITTED isolation level.|||Originally posted by KungFuJoe
I can't risk bringing this table down for any amount of time...|||I can't risk bringing this table down for any amount of time...this is mission critical 24/7 database that the table resides on . I need something that will work while the database is still up.

Ed|||Well if it isn't the SQL Server hardcore !!!

How you dudes doing ?

Here is a novel way of removing records from a HUGE table.

You could turn Select Into / Bulk Copy on

Perform a select into non-logged operation using a where clause to filter the records into another table. If you have an index on the column with the where clause on it then should be fast.

reindex the new table.

Perfom a two sp_renames of the two tables.

If you are prepared to have you db down for the time it takes perform the two sp_renames then you are jammin'!

Got to love those non - logged op's.|||Here we go again...and how this better? You forgot about recompiles that you'll have to do...got enough time to spare?|||Sorry mate,

I never said I had this in a stored proc.

Used to work in a large datawarehouse and found this ad hoc method was useful when I wanted to avoid using a delete.

Since this was the original question - I thought it might be an alternate option if he could handle downtime for the time of 2 sp_rename's

Cheers|||Don't apologize...unless the poster comes back with more info, everything is speculative...|||I'll second Aldo2003's tip to reduce the affect of downtime and to get the optimum performance one should arrange a downtime window to accomplish the task easily and fastly.

It would be ideal if downtime can be agreed by business and will be faster when no other process is accessing the database.

When you're getting something you should be ready to sacrifice something...:)|||So, you're all saying that "SELECT * INTO..." is faster than TRUNCATE?

Anyone for a test?|||No, they aren't trying to optimize the total process time, just the table down-time. So they aren't counting the SELECT because it doesn't lock up the table.

I think they are assuming that no new data will be added to the table between the SELECT INTO NEWTABLE and the RENAME, and I bet that this is not the case.|||Actually, SELECT will placed a shared lock on the table, unless you change the isolation level. But the point is that BCPing only records that you need paired with truncating the table is faster than any other solution, and they don't seem to see it that way. I, in turn, don't see anything else as an alternative. Downtime is a downtime is a downtime. It's just a matter of how much down, and ... what's the time? It's that time, when I go a have my mid-day smoke break off campus!!!|||You have to leave the campus?

This from a state where drinking and driving where basically legal up to a short while ago?|||Well, we got this mayor Garza, all fitness-oriented (I can still probably kick his butt without getting off the bar stool), and our glorious and victorious CEO decided to come up with a "we're doing you good" initiative by declaring it on the local news. Reality (we have our ways [needs to be pronounced with a heavy Russian accent]) shows that the company got a 10% insurance reduction for making it a non-smoking campus. So, smoking, non-smoking, or smoke up someone's a$$? That's the dilemma ;)|||Have they banned smoking in bars?

Did you know, NYC (The new Rome) has banned smoking in all bars throughout the entire city...

NEW YORK F____'N CITY!

I mean give me a break...

You can buy crack on the corner but no cig...|||Technically, I believe that crack has been illegal in NYC for quite a while...|||Originally posted by Brett Kaiser
This from a state where drinking and driving where basically legal up to a short while ago? What, they've closed the drive up windows on the liquor stores in Texas ? Say it isn't so!

-PatP|||Originally posted by blindman
Technically, I believe that crack has been illegal in NYC for quite a while... Ok, so what's your point here ?

-PatP|||Originally posted by Brett Kaiser
Have they banned smoking in bars?

Did you know, NYC (The new Rome) has banned smoking in all bars throughout the entire city...

NEW YORK F____'N CITY!

I mean give me a break...

You can buy crack on the corner but no cig...

<RANT>
Not just New York City...since last summer it's the whole state...thank god we built a shack behind our building to shelter us from this wonderful weather...to make matters worse NYS also added a line to the tax forms this year to make us pay the maximum amount of tax possible..they want us to estimate goods we purchased outside our county, on reservations (where we get our cigs), and online...*&$&^ politicians...

</RANT>|||Starting tomorrow, going to the gym

How about banning McDonalds, what's up with this no supersizing $%&#?|||Originally posted by rdjabarov

Starting tomorrow, going to the gym

How about banning McDonalds, what's up with this no supersizing $%&#?

..its all about cutting the budgets...actually i think they will move supersizing overseas (not to bring that subject up AGAIN)...|||that's a good one, here they went through 3 indians, with the same outcome. and how do you like an answer to a questions "what's a trigger?"

"it's when you move your mouse, because that's what oracle objects is all about"!!!

No comments:

Post a Comment