CS614 Current FinalTerm Paper 20 August 2016
All questions were from past papers.
37 MCQs were from past papers + quizzes
The subjective part was entirely from past papers
MCQs:
The goal of ___________ is to look at as few blocks as possible to find the matching records(s).
Indexing (Right Answer)
Partitioning
Joining
None of above
The automated, prospective analyses offered by data mining move beyond the analysis of past events provided by retrospective tools typical of ___________.
OLTP
OLAP
Decision Support systems
None of these
Data mining uses _________ algorithms to discover patterns and regularities in data.
Mathematical
Computational
Statistical
None of these
To identify the __________________ required we need to perform data profiling
Degree of Transformation (Right Answer)
Complexity
Cost
Time
Execution can be completed successfully or it may be stopped due to some error. If some error occurs, execution will be terminated abnormally and all transactions will be ___________
Committed to the database
Rolled back (Right Answer)
All data is ______________ of something real.
I. An Abstraction
II. A Representation
Which of the following options is true?
I Only (Right Answer)
II Only
Both I & II
None of I & II
For a DWH project, the key requirements are ________ and product experience.
Tools
Industry (Right Answer)
Software
None of these
_________________ contributes to an under-utilization of valuable and expensive historical data, and inevitably results in a limited capability to provide decision support and analysis.
The lack of data integration and standardization (Right Answer)
Missing Data
Data Stored in Heterogeneous Sources
DTS allows us to connect through any data source or destination that is supported by ____________
OLE DB (Right Answer)
OLAP
OLTP
Data Warehouse
If some error occurs, execution will be terminated abnormally and all transactions will be rolled back. In this case when we will access the database we will find it in the state that was before the ____________.
Execution of package (Right Answer)
Creation of package
Connection of package
To judge effectiveness we perform data profiling twice.
One before Extraction and the other after Extraction
One before Transformation and the other after Transformation (Right Answer)
One before Loading and the other after Loading
Pre-computed _______ can solve performance problems
Aggregates (Right Answer)
Facts
Dimensions
De-Normalization normally speeds up
► Data Retrieval (Page 51)
► Data Modification
► Development Cycle
► Data Replication
For a given data set, to get a global view in un-supervised learning we use
► One-way Clustering (Page 271)
► Bi-clustering
► Pearson correlation
► Euclidean distance
It is called a _____________ violation, if we have null values for attributes where NOT NULL constraint exists
- Load
- Transform
- Constraint page 161
- Extraction
UAT stands for
- User acceptance testing page 193
The application development quality assurance activities cannot be completed until the data is _____________
- Stabilized page 308
- Identified
- Finalized
- Computerized
In Kimball's approach, the product selection phase falls in the
- Lifecycle Technology Track page 290
- Lifecycle Data Track
- Lifecycle Analytic Applications Track
- None of the given
Which is NOT an issue of clickstream data?
- Identifying the Visitor Origin
- Identifying the Session
- Identifying the Visitor
- Identifying the server
Which statement about HTTP is true?
- Is stateless page 364
- Non world wide web protocol
- Used to maintain session
- Message routing protocol
The ith bit is set to 1 if the ith row of the base table has the value for the indexed column. This statement refers to
- Inverted
- Bitmap page 233
- Dense
- Sparse index
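To make the bitmap definition above concrete, here is a minimal sketch in Python. The column values and the helper name build_bitmap_index are made up for illustration; a real DBMS stores compressed bit vectors rather than Python lists.

```python
def build_bitmap_index(column_values):
    """Build one bit vector per distinct value of the indexed column.

    Bit i of the vector for value v is 1 exactly when row i holds v.
    """
    bitmaps = {}
    for i, value in enumerate(column_values):
        # Create an all-zero vector the first time a value is seen.
        bitmaps.setdefault(value, [0] * len(column_values))[i] = 1
    return bitmaps


# Hypothetical indexed column of a five-row base table.
gender = ["M", "F", "F", "M", "F"]
bitmaps = build_bitmap_index(gender)

print(bitmaps["M"])
print(bitmaps["F"])
```

Each row sets exactly one bit across all vectors, so the vectors for the distinct values never overlap.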
In the context of a web data warehouse, which is NOT one of the ways to identify a session?
- Using asynchronous session tracking protocol
- Using Time-contiguous Log Entries
- Using Transient Cookies
- Using HTTP's secure sockets layer (SSL)
- Using session ID Ping-pong
- Using Persistent Cookies
The other MCQs were from the midterm; 3 or 4 were from the handouts but very easy.
Subjective:
2 Marks Questions
There are four categories of data quality improvement. Write any two. (2 marks)
Answer:
The four categories of Data Quality Improvement
• Process
• System
• Policy & Procedure
• Data Design
Name two unsupervised learning techniques. 2 marks
Answer:
One way clustering
Two way clustering
Explain the meaning of the statement “Be a diplomat, not a technologist”. 2 marks
Answer: The biggest problem you will face during a warehouse implementation will be people, not the technology or the development.
1. Management: You’re going to have senior management complaining about completion dates and unclear objectives.
2. Development Team: You’re going to have development people protesting that everything takes too long and why can’t they do it the old way?
3. Users: You’re going to have users with outrageously unrealistic expectations, who are used to systems that require mouse-clicking but not much intellectual investment on their part.
4. And you’re going to grow exhausted, separating out Needs from Wants at all levels. Commit from the outset to work very hard at communicating the realities, encouraging investment, and cultivating the development of new skills in your team and your users (and even your bosses).
Most of all, keep smiling. When all is said and done, you’ll have a resource in place that will do magic, and your grief will be long past. Eventually, your smile will be effortless and real.
Define clickstream. 2 marks
Answer:
Clickstream is every page event recorded by each of the company's Web servers
Web-intensive businesses
Although most exciting, at the same time it can be the most difficult and most frustrating.
Not JUST another data source.
3 Marks Questions
As the number of processors increases, the speedup should also increase. Thus, theoretically, there should be a linear speedup; however, this is not the case in reality. List at least two barriers to linear speedup. 3 marks
Answer:
- Amdahl’s Law
- Startup
- Interference
- Skew
Common Dimensions in context with Web data warehouse. 3marks
Answer: -------
Name three DWH development methodologies. 3 marks
Answer:
Development methodologies
Waterfall model
Spiral model
RAD Model
Structured Methodology
- Data Driven
- Goal Driven
- User Driven
One question asked whether a given statement was correct or not (it was from the midterm). 3 marks
5 Marks Question:
Before sitting down with the business community to gather information, it is suggested to set yourself up for a productive session. Write three activities of the requirements preplanning phase. 5 marks
Answer:
Requirements preplanning: This phase consists of activities like choosing the forum, identifying and preparing the requirements team and finally selecting, scheduling and preparing the business representatives.
This query was given: SELECT * FROM R WHERE A = 5, and we had to tell which technique is appropriate among dense, sparse, B-tree and hash indexing. 5 marks
Answer: Hash Indexing is appropriate for the given query, because hash indexing is good for matching queries.
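As a rough illustration of why hashing suits an equality predicate such as A = 5, the sketch below builds an in-memory hash index in Python. The rows of R, the column B and the helper names are hypothetical; a real DBMS hashes keys to disk buckets, but the lookup idea is the same.

```python
from collections import defaultdict

# Hypothetical contents of relation R; only column A comes from the question.
R = [
    {"A": 5, "B": "x"},
    {"A": 7, "B": "y"},
    {"A": 5, "B": "z"},
    {"A": 2, "B": "w"},
]


def build_hash_index(rows, column):
    """Map each distinct value of `column` to the positions of matching rows."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index


def select_equal(rows, index, value):
    """Answer SELECT * FROM rows WHERE column = value with one hash lookup."""
    return [rows[pos] for pos in index.get(value, [])]


index_on_a = build_hash_index(R, "A")
matches = select_equal(R, index_on_a, 5)
print(matches)
```

The lookup goes straight to the bucket for the key without scanning other rows. A hash index does not help with range predicates such as A > 5, because hashing does not preserve key order.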
According to Amdahl’s Law prove that the speedup does not remain same if the fraction of the problem and number of processors are doubled. Please note that 0 overhead and “perfect” parallelism is used. Use following examples 5 marks
a) Fraction of the problem that must be computed sequentially is 5% and number of processors is 100.
b) Fraction of the problem that must be computed sequentially is 10% and number of processors is 200.
Ans: REF (Handouts Page # 204,205)
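Amdahl’s law: S ≤ 1 / ( f + ( 1 – f ) / N ), where f is the fraction that must be computed sequentially and N is the number of processors.
a) 1 / ( 0.05 + ( 1 – 0.05 ) / 100 ) = 16.81
b) 1 / ( 0.10 + ( 1 – 0.10 ) / 200 ) = 9.57
Hence it is evident that the speedup does not remain the same if we double the sequential fraction and the number of processors.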
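The two cases can be checked with a few lines of Python. This is only a sketch of the bound S = 1 / (f + (1 – f) / N) with zero overhead; the function name amdahl_speedup is chosen here for illustration.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Upper bound on speedup for sequential fraction f on n processors."""
    return 1.0 / (f + (1.0 - f) / n)


speedup_a = amdahl_speedup(0.05, 100)  # case a): f = 5%, N = 100
speedup_b = amdahl_speedup(0.10, 200)  # case b): f = 10%, N = 200

print(f"a) {speedup_a:.2f}")  # a) 16.81
print(f"b) {speedup_b:.2f}")  # b) 9.57
```

Doubling the processors cannot compensate for doubling the sequential fraction, because the bound can never exceed 1 / f (20 in case a, 10 in case b).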
Attributes of Page Dimension: 5 marks
Answer: Page no : 362 Ch# 40