Sample Questions
Q) You are using MADlib for linear regression analysis. Which value does the following statement return?
SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;
a) Goodness of fit
b) Coefficients
c) Standard error
d) P-value
Q) Which data asset is an example of quasi-structured data?
a) Webserver log
b) XML data file
c) Database table
d) News article
Q) What would be considered "Big Data"?
a) An OLAP cube containing customer demographic information about 100,000,000 customers
b) Daily log files from a web server that receives 100,000 hits per minute
c) Aggregated statistical data stored in a relational database table
d) Spreadsheets containing monthly sales data for a Global 100 corporation
Q) A data scientist plans to classify the sentiment polarity of 10,000 product reviews collected from the Internet. What is the most appropriate model to use? Suppose labeled training data is available.
a) Naïve Bayesian classifier
b) Linear regression
c) Logistic regression
d) K-means clustering
Q) In which lifecycle stage are test and training data sets created?
a) Model building
b) Model planning
c) Discovery
d) Data preparation
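
For readers unfamiliar with the test and training data sets mentioned in the last question, the following is a minimal sketch of one common way to create them: a random split of the available records. The function name `train_test_split`, the 80/20 ratio, the fixed seed, and the `review_N` placeholder records are all illustrative assumptions, not part of the exam material.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists.

    test_fraction and seed are hypothetical defaults chosen for illustration.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for labeled product reviews.
reviews = [f"review_{i}" for i in range(10)]
train, test = train_test_split(reviews)
print(len(train), len(test))  # 8 2
```

Keeping the test set separate from the training set lets a model be evaluated on records it has not seen during fitting.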