Friday, March 9, 2012

incomplete Full text index for large dbs in sql 2005?

we have created a full text catalogue on an Addresses table with exactly same
configuration on 3 databases. Two of them are on SQL 2005 and one on SQL 2000.
My Addresses table has 13 million rows and the full text was on the Full
Address column.
db is around 250GB
FT Index on SQL 2005 database was created very quickly (within 2 hours)
while SQL 2000 FT took 30 hours.
But my issue is SQL 2005 index does not contain all the words. For an
example following 2 queries gives me the exactly same number on the SQL 2000
database while on SQL 2005, first query only return 11 and the 2nd one 274
(274 is same on 2000 server/database as well)
SELECT count(*)
FROM [dbo].[Addresses]
WHERE CONTAINS(FullAddress, 'WARMINSTER')
GO
SELECT count(*)
FROM [dbo].[Addresses]
WHERE FullAddress like '%WARMINSTER%'
Is this a bug?
When i extracted the records which should have retrived in to a seperate
table (few hundred records) and created a FT index on that it works fine.
CTS DBA
I've had problems on very large databases with this. I think we were over
400,000,000 rows when we first noticed it. A reorganize would normally help.
Can you first run this query?
SELECT fulladdress FROM [dbo].[Addresses] WHERE FullAddress like
'%WARMINSTER%' and not like '% WARMINSTER %'
See if perhaps you are getting hits to Warministershireontheavon or
something like that. Note that most punctuation will be ignored.
If you are trully missing hits can you check the gatherer log to see if
there are any error messages?
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"CTS_DBA" <CTSDBA@.discussions.microsoft.com> wrote in message
news:0E0072FB-B92A-4E2C-9915-99D0667C0BCA@.microsoft.com...
> we have created a full text catalogue on an Addresses table with exactly
> same
> configuration on 3 databases. Two of them are on SQL 2005 and one on SQL
> 2000.
> My Addresses table has 13 million rows and the full text was on the Full
> Address column.
> db is around 250GB
> FT Index on SQL 2005 database was created very quickly (within 2 hours)
> while SQL 2000 FT took 30 hours.
> But my issue is SQL 2005 index does not contain all the words. For an
> example following 2 queries gives me the exactly same number on the SQL
> 2000
> database while on SQL 2005, first query only return 11 and the 2nd one 274
> (274 is same on 2000 server/database as well)
> SELECT count(*)
> FROM [dbo].[Addresses]
> WHERE CONTAINS(FullAddress, 'WARMINSTER')
> GO
> SELECT count(*)
> FROM [dbo].[Addresses]
> WHERE FullAddress like '%WARMINSTER%'
> Is this a bug?
> When i extracted the records which should have retrived in to a seperate
> table (few hundred records) and created a FT index on that it works fine.
>
> --
> CTS DBA
|||Thanks Hilary,
SELECT fulladdress FROM [dbo].[Addresses] WHERE FullAddress like
'%WARMINSTER%' and not like '% WARMINSTER %'
The Query returns only 4 records which means still so many records missing
in the SQL 2005 FT catalog. I have started the re-organize (ALTER FULLTEXT
CATALOG cat_Addresses REORGANIZE) and update you with the results when it’s
completed.
thanks
CTS DBA
"Hilary Cotter" wrote:

> I've had problems on very large databases with this. I think we were over
> 400,000,000 rows when we first noticed it. A reorganize would normally help.
> Can you first run this query?
> SELECT fulladdress FROM [dbo].[Addresses] WHERE FullAddress like
> '%WARMINSTER%' and not like '% WARMINSTER %'
> See if perhaps you are getting hits to Warministershireontheavon or
> something like that. Note that most punctuation will be ignored.
> If you are trully missing hits can you check the gatherer log to see if
> there are any error messages?
>
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
>
> "CTS_DBA" <CTSDBA@.discussions.microsoft.com> wrote in message
> news:0E0072FB-B92A-4E2C-9915-99D0667C0BCA@.microsoft.com...
>
>
|||Re-organize has been completed. But still no luck. And there is nothing in
the gatherer logs as well. See below from the gatherer logs for the entries
from re-organize and re-building I did today.
RE_BUILDING
2006-11-29 09:28:15.58 spid20s Informational: Full-text Full population
completed for table or indexed view '[CI_InvTest].[dbo].[Addresses]' (table
or indexed view ID '101575400', database ID '41'). Number of documents
processed: 10038019. Number of documents failed: 0. Number of documents need
retry: 0.
2006-11-29 09:28:15.58 spid20s Changing the status to MERGE for
full-text catalog "cat_Addresses_AI" (7) in database "CI_InvTest" (41). This
is an informational message only. No user action is required.
2006-11-29 09:28:16.89 spid20s Informational: Full-text Auto population
initialized for table or indexed view '[CI_InvTest].[dbo].[Addresses]' (table
or indexed view ID '101575400', database ID '41'). Population sub-tasks: 1.
RE-ORGANIZING
2006-11-29 09:38:21.04 spid56 Changing the status to MERGE for
full-text catalog "cat_Addresses_AI" (7) in database "CI_InvTest" (41). This
is an informational message only. No user action is required.
CTS DBA
"CTS_DBA" wrote:
[vbcol=seagreen]
> Thanks Hilary,
> SELECT fulladdress FROM [dbo].[Addresses] WHERE FullAddress like
> '%WARMINSTER%' and not like '% WARMINSTER %'
> The Query returns only 4 records which means still so many records missing
> in the SQL 2005 FT catalog. I have started the re-organize (ALTER FULLTEXT
> CATALOG cat_Addresses REORGANIZE) and update you with the results when it’s
> completed.
> thanks
>
> --
> CTS DBA
>
> "Hilary Cotter" wrote:
|||It looks like all the rows were successfully processed. My concern still is
that perhaps there are some preceding or trailing characters which might
account for these rows not showing up. Can you confirm this?
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"CTS_DBA" <CTSDBA@.discussions.microsoft.com> wrote in message
news:C30E750E-A7CD-45F0-8699-8D96E32DDFAB@.microsoft.com...[vbcol=seagreen]
> Re-organize has been completed. But still no luck. And there is nothing in
> the gatherer logs as well. See below from the gatherer logs for the
> entries
> from re-organize and re-building I did today.
> RE_BUILDING
> 2006-11-29 09:28:15.58 spid20s Informational: Full-text Full
> population
> completed for table or indexed view '[CI_InvTest].[dbo].[Addresses]'
> (table
> or indexed view ID '101575400', database ID '41'). Number of documents
> processed: 10038019. Number of documents failed: 0. Number of documents
> need
> retry: 0.
> 2006-11-29 09:28:15.58 spid20s Changing the status to MERGE for
> full-text catalog "cat_Addresses_AI" (7) in database "CI_InvTest" (41).
> This
> is an informational message only. No user action is required.
> 2006-11-29 09:28:16.89 spid20s Informational: Full-text Auto
> population
> initialized for table or indexed view '[CI_InvTest].[dbo].[Addresses]'
> (table
> or indexed view ID '101575400', database ID '41'). Population sub-tasks:
> 1.
>
>
> RE-ORGANIZING
> 2006-11-29 09:38:21.04 spid56 Changing the status to MERGE for
> full-text catalog "cat_Addresses_AI" (7) in database "CI_InvTest" (41).
> This
> is an informational message only. No user action is required.
>
> --
> CTS DBA
>
> "CTS_DBA" wrote:
|||yep, cause as mentioned earlier same DB, sql 2000 FT returns the expected
result
the other thing I'm worried is that sql 2005 took 2 hours and 2000 took
around 30 hours, huge different, and of course servers are , you can sya same
hard ware.
CTS DBA
"Hilary Cotter" wrote:

> It looks like all the rows were successfully processed. My concern still is
> that perhaps there are some preceding or trailing characters which might
> account for these rows not showing up. Can you confirm this?
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
>
> "CTS_DBA" <CTSDBA@.discussions.microsoft.com> wrote in message
> news:C30E750E-A7CD-45F0-8699-8D96E32DDFAB@.microsoft.com...
>
>
|||What if you run
SELECT fulladdress FROM [dbo].[Addresses] WHERE
FullAddress LIKE '%WARMINSTER%'
AND NOT CONTAINS(FullAddress, 'WARMINSTER')
do the fulladdress results look funny in any way, or is WARMINSTER at
the start of a word? Maybe a non-ASCII character in there somehow?
Ben Strackany
CTS_DBA wrote:[vbcol=seagreen]
> Thanks Hilary,
> SELECT fulladdress FROM [dbo].[Addresses] WHERE FullAddress like
> '%WARMINSTER%' and not like '% WARMINSTER %'
> The Query returns only 4 records which means still so many records missing
> in the SQL 2005 FT catalog. I have started the re-organize (ALTER FULLTEXT
> CATALOG cat_Addresses REORGANIZE) and update you with the results when it's
> completed.
> thanks
>
> --
> CTS DBA
>
> "Hilary Cotter" wrote:
|||This says everything processed successfully.
I am really not sure how to advice you on where to go from here.
One of my lingering concerns is that this query
SELECT fulladdress FROM [dbo].[Addresses] WHERE FullAddress like
'%WARMINSTER%' and not like '% WARMINSTER %'
returns 4 results - what are they?
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"CTS_DBA" <CTSDBA@.discussions.microsoft.com> wrote in message
news:C30E750E-A7CD-45F0-8699-8D96E32DDFAB@.microsoft.com...[vbcol=seagreen]
> Re-organize has been completed. But still no luck. And there is nothing in
> the gatherer logs as well. See below from the gatherer logs for the
> entries
> from re-organize and re-building I did today.
> RE_BUILDING
> 2006-11-29 09:28:15.58 spid20s Informational: Full-text Full
> population
> completed for table or indexed view '[CI_InvTest].[dbo].[Addresses]'
> (table
> or indexed view ID '101575400', database ID '41'). Number of documents
> processed: 10038019. Number of documents failed: 0. Number of documents
> need
> retry: 0.
> 2006-11-29 09:28:15.58 spid20s Changing the status to MERGE for
> full-text catalog "cat_Addresses_AI" (7) in database "CI_InvTest" (41).
> This
> is an informational message only. No user action is required.
> 2006-11-29 09:28:16.89 spid20s Informational: Full-text Auto
> population
> initialized for table or indexed view '[CI_InvTest].[dbo].[Addresses]'
> (table
> or indexed view ID '101575400', database ID '41'). Population sub-tasks:
> 1.
>
>
> RE-ORGANIZING
> 2006-11-29 09:38:21.04 spid56 Changing the status to MERGE for
> full-text catalog "cat_Addresses_AI" (7) in database "CI_InvTest" (41).
> This
> is an informational message only. No user action is required.
>
> --
> CTS DBA
>
> "CTS_DBA" wrote:

No comments:

Post a Comment