Wednesday, March 7, 2012

Cube showing slightly different data than in data source...?

Hello,
I am very perplexed with this problem I'm having. I work for a website-based
company and my job is to create a data warehouse based on usage of the site.
We have a Central Logging Database that logs every single hit to the site,
then i have a stored procedure that picks up these hits, does a bit of data
scrubbing and transformation, and puts them into a Warehouse database. i
then have an Analysis Services cube that points at this warehouse database.
I then use Reporting Services and MDX to create reports based on this usage
info.
Scenario: the total hits in the database can be further grouped into
companies, so reports can be run to show a company's usage of the site.
Companies are then further divided into Users belonging to a company. I.e.
reports are generally run to show a company's usage, broken down into
individual users within that company.
In order to check that my warehouse-load stored procedure works properly I
compared the amount of hits in the Central Logging Database (where all hits
are originally recorded) with hits in the final Warehouse database to see if
they balance. Everything looks fine and hunky-dory.
My problem is this: When I browse the cube in analysis manager, some of the
companies have a higher hit count, i.e. show more hits, than actually exist
in any of the original databases! In the cube one company is even showing
1,000 hits for a user who does not have ANY hits in any of the original
databases!!!
the total amount of hits shown in the cube is equal to the original
databases, so some how it balances. but i don't understand where all this
extra\incorrect data is coming from.
My cube consists of a very simple Star-schema design. one main table called
Hits, and UserAccount, BusinessEntity, and Date dimensions. the Hits table
consists of the following columns:
Hit_ID (int)
DatabaseSource (int)
UserAccountID (int)
BusinessEntityID (int)
LogDateID (int)
One hit is defined by a Hit_ID and DatabaseSource (these two form a
composite primary key).
My Measure in the cube is the Hit_ID.
Does anyone know what i could possibly be doing wrong? I would be very
grateful for any bit of help anyone could provide.
Thanks in advance!
I presume your measure is a Count of HitID to have the number of hit by
users...
How many rows are loaded when you process the cube? (normally this number =
number rows in the database)
There is any join between the Hit table and the others? (joins created by AS
to load the cube; see the query executed by AS)
does the user appear twice in the database?
if your users are under the company and if 1 user can appear in more then 1
company, then you can count twice (or more) the same hit.
"Neile" <Neile@.discussions.microsoft.com> a crit dans le message de news:
E05E7B92-22C7-4811-9DDF-B1C9C25EEBD0@.microsoft.com...
> Hello,
> I am very perplexed with this problem I'm having. I work for a
> website-based
> company and my job is to create a data warehouse based on usage of the
> site.
> We have a Central Logging Database that logs every single hit to the site,
> then i have a stored procedure that picks up these hits, does a bit of
> data
> scrubbing and transformation, and puts them into a Warehouse database. i
> then have an Analysis Services cube that points at this warehouse
> database.
> I then use Reporting Services and MDX to create reports based on this
> usage
> info.
> Scenario: the total hits in the database can be further grouped into
> companies, so reports can be run to show a company's usage of the site.
> Companies are then further divided into Users belonging to a company. I.e.
> reports are generally run to show a company's usage, broken down into
> individual users within that company.
> In order to check that my warehouse-load stored procedure works properly I
> compared the amount of hits in the Central Logging Database (where all
> hits
> are originally recorded) with hits in the final Warehouse database to see
> if
> they balance. Everything looks fine and hunky-dory.
> My problem is this: When I browse the cube in analysis manager, some of
> the
> companies have a higher hit count, i.e. show more hits, than actually
> exist
> in any of the original databases! In the cube one company is even showing
> 1,000 hits for a user who does not have ANY hits in any of the original
> databases!!!
> the total amount of hits shown in the cube is equal to the original
> databases, so some how it balances. but i don't understand where all this
> extra\incorrect data is coming from.
> My cube consists of a very simple Star-schema design. one main table
> called
> Hits, and UserAccount, BusinessEntity, and Date dimensions. the Hits table
> consists of the following columns:
> Hit_ID (int)
> DatabaseSource (int)
> UserAccountID (int)
> BusinessEntityID (int)
> LogDateID (int)
> One hit is defined by a Hit_ID and DatabaseSource (these two form a
> composite primary key).
> My Measure in the cube is the Hit_ID.
> Does anyone know what i could possibly be doing wrong? I would be very
> grateful for any bit of help anyone could provide.
> Thanks in advance!
>

No comments:

Post a Comment