Unit 6 - Practice Quiz

INT428 60 Questions

1 What is the primary function of a tool like Tableau?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Easy
A. To store large amounts of unstructured data
B. To train complex neural networks from scratch
C. To create interactive data visualizations and dashboards
D. To write and compile computer code

2 Which of the following is a classic example of structured data?

Working with structured and unstructured data Easy
A. A table of customer information in a SQL database
B. Audio recordings from a call center
C. A collection of customer review emails
D. A folder of images from a security camera

3 Which of these data types is considered unstructured?

Working with structured and unstructured data Easy
A. Video files
B. A database of student grades
C. An Excel spreadsheet with employee IDs
D. A CSV file with sales figures

4 What is the main purpose of a data pipeline?

Data pipelines and automation Easy
A. To exclusively visualize data
B. To create AI models
C. To move data from a source to a destination, often with transformations
D. To store data backups

5 What is a major advantage of using cloud services like AWS or Azure for training AI models?

AI Model Environments & Lifecycle Basics: Cloud services Easy
A. It is always free to use
B. Access to powerful computing resources on-demand
C. It guarantees the model will be 100% accurate
D. It does not require an internet connection

6 What does edge deployment for an AI model mean?

AI Model Environments & Lifecycle Basics: Edge deployment Easy
A. Running the model on a central cloud server
B. Storing the model on a cutting-edge hard drive
C. Training the model on multiple computers simultaneously
D. Running the model directly on a local device like a smartphone or sensor

7 What does the term MLOps primarily refer to?

Introduction to MLOps and lifecycle management Easy
A. A programming language for statistics
B. A set of practices for collaboration and communication between data scientists and IT professionals
C. A brand of computer hardware for AI
D. A new type of machine learning algorithm

8 In machine learning, what is overfitting?

Error identification Easy
A. When the dataset is too small to use
B. When a model is too simple to capture the underlying data patterns
C. When a model performs poorly on both training and new data
D. When a model performs very well on training data but poorly on new data
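
To make the train/test gap concrete, here is a minimal Python sketch (toy data invented for illustration) of a model that simply memorizes its training set: a 1-nearest-neighbour "memorizer" scores perfectly on the noisy points it has seen but generalizes poorly.

```python
def predict_1nn(train_data, x):
    """Return the label of the training point closest to x (pure memorization)."""
    return min(train_data, key=lambda pair: abs(pair[0] - x))[1]

# True rule: label = 1 if x >= 5 else 0. Two training labels are noisy.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
#                noisy ^                 noisy ^

train_acc = sum(predict_1nn(train, x) == y for x, y in train) / len(train)

# Clean held-out points drawn from the true rule.
test = [(1.4, 0), (3.2, 0), (6.8, 1), (8.6, 1)]
test_acc = sum(predict_1nn(train, x) == y for x, y in test) / len(test)

print(train_acc)  # 1.0  -> perfect on training data
print(test_acc)   # 0.5  -> poor on unseen data: the overfitting signature
```

The 100% training accuracy against 50% test accuracy mirrors the performance gap this question describes.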

9 What is the primary goal of AI process automation?

AI process automation Easy
A. To use AI to perform repetitive tasks previously done by humans
B. To analyze stock market trends exclusively
C. To replace all human jobs with robots
D. To create art and music using AI

10 What is often considered the first step in troubleshooting a problem with an AI model?

Troubleshooting Easy
A. Adding more data to the training set
B. Changing the model's algorithm
C. Immediately deleting the model and starting over
D. Identifying and understanding the specific problem or error

11 When using a tool like ChatGPT's Advanced Data Analysis, what kind of input do you typically provide to start an analysis?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Easy
A. A natural language prompt describing the task and a data file
B. A connection to a live-streaming database
C. Complex Python code
D. A pre-trained neural network

12 What does automation in a data pipeline help to reduce?

Data pipelines and automation Easy
A. The amount of data being processed
B. The number of data sources
C. The need for manual intervention and human error
D. The complexity of the data

13 Which of these applications is a good candidate for edge deployment?

AI Model Environments & Lifecycle Basics: Edge deployment Easy
A. A real-time object detection feature on a smartphone camera
B. Analyzing a decade of a company's financial records
C. A massive climate change simulation model
D. Training a large language model like GPT-4

14 Which stage of the AI model lifecycle involves putting a trained model into a live environment to make predictions?

Introduction to MLOps and lifecycle management Easy
A. Data collection
B. Deployment
C. Feature engineering
D. Model training

15 The term for using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server, is called:

AI Model Environments & Lifecycle Basics: Cloud services Easy
A. Local Hosting
B. Cloud Computing
C. Edge Computing
D. Personal Computing

16 What is a syntax error in a computer program?

Error identification Easy
A. An error caused by a user providing invalid input
B. An error in the code that violates the rules of the programming language
C. An error that occurs only when the program is out of memory
D. An error where the program runs but produces incorrect results
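
A short illustration of the distinction (the snippet below is invented for this quiz and uses only the Python standard library): a syntax error is caught before the code ever runs, while a logical error runs fine and silently produces the wrong result.

```python
# Missing ':' after the parameter list -> violates the grammar of the language.
source = "def add(a, b) return a + b"

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except SyntaxError:
    outcome = "syntax error"

# By contrast, this logical error compiles and runs, but the answer is wrong:
def subtract(a, b):
    return b - a   # bug: operands swapped

print(outcome)          # syntax error
print(subtract(5, 3))   # -2, not the intended 2
```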

17 In software, what does debugging refer to?

Troubleshooting Easy
A. The process of writing new features for an application
B. The process of designing the user interface
C. The process of deploying the application to a server
D. The process of finding and fixing errors or 'bugs' in code

18 What is the key characteristic of structured data?

Working with structured and unstructured data Easy
A. It can only be text
B. It has a predefined format and a fixed schema
C. It is always stored in PDF files
D. It has no internal structure

19 After an AI model is deployed, what is a critical MLOps practice to ensure it continues to perform well?

Introduction to MLOps and lifecycle management Easy
A. Monitoring and maintenance
B. Never updating the model
C. Deleting the training data
D. Hiding the model's predictions from users

20 A business wants to automatically categorize incoming customer support emails into 'Urgent', 'Billing Question', or 'General Inquiry'. This is an example of:

AI process automation Easy
A. Edge deployment
B. Hardware troubleshooting
C. Data visualization
D. AI process automation

21 A business analyst has a 500MB CSV file of sales data and wants to quickly explore potential correlations, generate summary statistics, and create a few initial plots without writing any code. Which tool would be most efficient for this initial exploratory data analysis task?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Medium
A. Tableau by connecting to the data source and manually dragging and dropping fields to create worksheets.
B. ChatGPT Advanced Data Analysis by uploading the file and using natural language prompts.
C. Writing a custom Python script using the Pandas and Matplotlib libraries.
D. Importing the data into a SQL database and writing complex queries.

22 An AI system is designed to analyze customer support tickets. Each ticket contains the customer's name (text), a priority level (low, medium, high), the date of submission (timestamp), and a free-text description of the problem. How should this data be categorized?

Working with structured and unstructured data Medium
A. Entirely unstructured data because it contains free-text.
B. A mix of structured (name, priority, date) and unstructured (problem description) data.
C. Entirely structured data because it's all stored in a database.
D. Primarily time-series data due to the submission date.

23 In an automated data pipeline for a machine learning model, what is the primary purpose of the 'Data Validation' stage that typically follows data ingestion?

Data pipelines and automation Medium
A. To check if the incoming data meets certain quality and schema expectations before processing.
B. To convert raw data into features for the model (e.g., normalization).
C. To train the machine learning model on the new data.
D. To store the raw data in a data lake or warehouse.
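
A minimal sketch of a data-validation step, assuming a hypothetical schema for incoming records: malformed records are flagged before they reach downstream transformation or training stages.

```python
# Hypothetical expected schema for ingested records (invented for illustration).
EXPECTED_SCHEMA = {"id": int, "amount": float, "region": str}

def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"bad type for {field}: {type(record[field]).__name__}")
    return problems

good = {"id": 1, "amount": 19.99, "region": "EU"}
bad  = {"id": "x", "amount": 19.99}

print(validate(good))  # []
print(validate(bad))   # ['bad type for id: str', 'missing field: region']
```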

24 A hospital is developing an AI tool to assist surgeons by providing real-time analysis of a video feed from a laparoscopic camera during an operation. The system must have minimal latency (delay) to be effective. What is the most appropriate deployment environment for this AI model?

AI Model Environments & Lifecycle Basics: Cloud services, Edge deployment Medium
A. Hybrid deployment where data is sent to the cloud for processing and results are sent back.
B. Edge deployment on the surgical equipment itself.
C. Cloud deployment on a high-performance server in a remote data center.
D. Batch processing on a local server after the surgery is complete.

25 A machine learning model is trained to predict customer churn. During evaluation, the model's accuracy on the training dataset is 98%, but its accuracy on a new, unseen test dataset is only 60%. This significant performance gap is a classic sign of:

Error identification, Troubleshooting Medium
A. Underfitting
B. Data leakage
C. Class imbalance
D. Overfitting

26 What is the primary role of a 'feature store' in an MLOps framework?

Introduction to MLOps and lifecycle management Medium
A. To log the performance metrics of models in production.
B. To store the final, trained machine learning models.
C. To provide a centralized repository for storing, retrieving, and managing curated features for model training and serving.
D. To orchestrate the entire data pipeline from ingestion to deployment.

27 A company wants to automate the process of categorizing incoming customer support emails into 'Billing', 'Technical Issue', or 'General Inquiry' before they are assigned to an agent. Which AI technology is best suited for this task?

AI process automation Medium
A. Anomaly detection for finding outliers.
B. Robotic Process Automation (RPA) for mimicking UI clicks.
C. Computer Vision for image recognition.
D. Natural Language Processing (NLP) for text classification.

28 When preparing unstructured text data, such as movie reviews, for a sentiment analysis model, a common preprocessing step is 'vectorization'. What does this process accomplish?

Working with structured and unstructured data Medium
A. It summarizes the entire text into a single sentence.
B. It stores the text in a highly compressed format to save space.
C. It converts the text into a numerical representation (vectors) that a machine learning model can understand.
D. It corrects all spelling and grammar mistakes in the text.
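
A hedged sketch of vectorization using a simple bag-of-words count (the review texts are made up; real systems use richer encodings such as TF-IDF or learned embeddings): raw text becomes fixed-length numeric vectors a model can consume.

```python
reviews = ["great movie great cast", "terrible movie"]

# Build a vocabulary, then count each vocabulary word per document.
vocab = sorted({word for review in reviews for word in review.split()})

def to_vector(text):
    words = text.split()
    return [words.count(word) for word in vocab]

vectors = [to_vector(r) for r in reviews]
print(vocab)     # ['cast', 'great', 'movie', 'terrible']
print(vectors)   # [[1, 2, 1, 0], [0, 0, 1, 1]]
```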

29 A large e-commerce company trains its product recommendation model weekly on terabytes of new user interaction data. Why is a cloud environment better suited for this task than an on-premise server?

AI Model Environments & Lifecycle Basics: Cloud services, Edge deployment Medium
A. On-premise servers are incapable of handling terabytes of data.
B. Cloud environments are inherently more secure than any on-premise solution.
C. Cloud services guarantee lower latency for model inference for all users globally.
D. Cloud services offer elastic scalability, allowing the company to provision powerful computing resources (like many GPUs/TPUs) for the training period and then scale them down to save costs.

30 You are tasked with creating a highly interactive, public-facing dashboard that allows users to filter data by region, date range, and product category. The dashboard must be embeddable in a website and handle live data connections. Which tool is designed for this specific purpose?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Medium
A. Microsoft Excel
B. Tableau
C. A Jupyter Notebook with static plots
D. ChatGPT Advanced Data Analysis

31 Within the MLOps lifecycle, what is the primary purpose of 'model monitoring' after a model has been deployed?

Introduction to MLOps and lifecycle management Medium
A. To continuously retrain the model with new data every few seconds.
B. To detect performance degradation, data drift, or concept drift in the production environment.
C. To A/B test different versions of the model's user interface.
D. To keep a version-controlled history of the model's source code.

32 An AI model for predicting house prices is found to have high bias. What is the most likely symptom of this problem?

Error identification, Troubleshooting Medium
A. The model's predictions fluctuate wildly with small changes in the input data.
B. The model takes an excessively long time to train.
C. The model performs perfectly on the training data but fails miserably on the test data.
D. The model performs poorly on both the training data and the test data, consistently making large errors.

33 In the context of data pipelines, what is the concept of 'idempotency'?

Data pipelines and automation Medium
A. The pipeline can process both structured and unstructured data simultaneously.
B. Running the pipeline multiple times with the same input will always produce the same output, without causing unintended side effects.
C. The pipeline runs on a predefined schedule, such as once every 24 hours.
D. The pipeline automatically scales its resources based on the volume of data.
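
A tiny Python sketch of idempotency (the data and destination are invented for illustration): keying the write on the input's identity means a retry after a failure cannot create duplicates, unlike a blind append.

```python
summaries = {}   # stands in for the destination table

def write_daily_summary(day, transactions):
    # Upsert keyed by day: running this twice with the same input leaves
    # exactly the same state (idempotent). An append would double-count.
    summaries[day] = sum(transactions)

write_daily_summary("2024-01-01", [100, 250, 50])
write_daily_summary("2024-01-01", [100, 250, 50])   # re-run after a failure

print(summaries)   # {'2024-01-01': 400}
```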

34 Which of the following scenarios is a better fit for traditional Robotic Process Automation (RPA) rather than a more complex AI-based automation solution?

AI process automation Medium
A. Copying data from a specific cell in an Excel sheet and pasting it into a fixed field in a web-based form.
B. Reading a handwritten doctor's note and summarizing the key points.
C. Determining the overall sentiment (positive/negative) of a customer's email.
D. Forecasting next quarter's sales based on historical data and market trends.

35 A data science team has two versions of a fraud detection model. They want to test which one performs better on live traffic without fully replacing the old model. They decide to route 10% of user requests to the new model and 90% to the old one. This deployment strategy is known as:

Introduction to MLOps and lifecycle management Medium
A. Blue-Green Deployment
B. Shadow Deployment
C. A/B Testing
D. Canary Deployment
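
The 10%/90% routing above can be sketched as hash-based traffic splitting (user IDs and percentages here are illustrative): hashing a stable user ID gives a deterministic, evenly spread assignment, so the same user always hits the same model version.

```python
import hashlib

def route(user_id, canary_percent=10):
    # md5 gives a stable hash across runs (unlike Python's built-in hash()).
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "new_model" if bucket < canary_percent else "old_model"

assignments = [route(f"user-{i}") for i in range(10_000)]
share = assignments.count("new_model") / len(assignments)
print(f"{share:.1%} of traffic routed to the new model")  # close to 10%
```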

36 An AI model is being built to extract information from scanned PDF invoices. What is a primary challenge that distinguishes this task from analyzing a simple text file?

Working with structured and unstructured data Medium
A. The text in a PDF is always perfectly clean and requires no preprocessing.
B. PDF invoices contain only structured data, which is difficult to parse.
C. The model must understand the spatial layout and structure (e.g., tables, key-value pairs) of the document, not just the raw text.
D. PDF files cannot be read by programming languages.

37 During the exploratory data analysis phase, you discover that the 'price' column in your dataset, which should be numerical, contains values like '$1,200.50' and '950 USD'. If you try to feed this data directly into a regression model, what type of error will most likely occur?

Error identification, Troubleshooting Medium
A. An overfitting error due to the high variance in price.
B. A logical error where the model produces negative price predictions.
C. A data type error, as the model expects a numeric type but receives a string.
D. A data leakage error from the currency symbols.
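
An illustrative fix for the mixed-format 'price' column described above (sample values taken from the question): strip currency symbols and separators, then cast to float before the column reaches the model.

```python
import re

raw_prices = ["$1,200.50", "950 USD", "75.00"]

def parse_price(value):
    cleaned = re.sub(r"[^\d.]", "", value)   # keep only digits and the decimal point
    return float(cleaned)

prices = [parse_price(p) for p in raw_prices]
print(prices)   # [1200.5, 950.0, 75.0]
```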

38 A key component of a robust, automated ML training pipeline is data and model versioning. Why is it crucial to version not just the code, but also the data used for training?

Data pipelines and automation Medium
A. To automatically encrypt the dataset for security.
B. To speed up the data loading process during training.
C. To reduce the storage space required for the dataset.
D. To ensure reproducibility, allowing you to recreate a specific model by using the exact same code and data it was trained on.

39 What is a major drawback of edge deployment compared to cloud deployment for AI models?

AI Model Environments & Lifecycle Basics: Cloud services, Edge deployment Medium
A. Difficulty in scaling to serve millions of users simultaneously.
B. Higher costs associated with paying for on-demand cloud computing resources.
C. Higher network latency and dependence on internet connectivity.
D. Limited computational power, memory, and energy on edge devices, making it difficult to run large, complex models.

40 A user provides Tableau with a dataset containing 'State', 'City', and 'Sales' data. Tableau automatically recognizes that 'State' and 'City' are geographical data types and suggests plotting them on a map. This feature is an example of:

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Medium
A. Automated data type inference and semantic recognition.
B. Natural Language Processing (NLP) of the column headers.
C. A manually programmed rule for all columns named 'State' or 'City'.
D. AI-powered predictive modeling.

41 An MLOps team is managing a credit risk model for a bank. They detect that the model's predictions are systematically drifting for a specific demographic group, indicating potential fairness issues. The model is retrained automatically every month on new data. What is the most robust MLOps strategy to address this specific type of concept drift?

Introduction to MLOps and lifecycle management Hard
A. Roll back to the previous model version and halt the automatic retraining pipeline until the data distribution stabilizes.
B. Increase the retraining frequency to weekly to adapt to the new data distribution more quickly.
C. Implement stratified retraining batches that ensure consistent demographic representation in every training cycle and add a fairness constraint to the model's loss function.
D. Trigger an alert for manual review by the data science team whenever the model's accuracy drops below a predefined threshold.

42 A company is deploying a real-time object detection model on a fleet of 10,000 battery-powered drones. The key constraints are inference latency (< 50ms) and power consumption. The original model is a large TensorFlow FP32 model. Which model optimization strategy represents the most sophisticated and effective approach for this specific edge scenario?

AI Model Environments & Lifecycle Basics: Edge deployment Hard
A. Prune the model by 50% to reduce its size and then use post-training static quantization with a representative dataset of images from the drones.
B. Convert the model to TensorFlow Lite (FP32) and deploy it, relying on the hardware's GPU for acceleration.
C. Implement Quantization-Aware Training (QAT) to retrain the model, simulating INT8 quantization during training, and then deploy the resulting model.
D. Use post-training dynamic range quantization to convert weights to INT8, as it requires no representative dataset and is simple to implement.
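
To see what INT8 quantization does to FP32 weights, here is a pure-Python simulation of symmetric quantization (toy weights, not an actual TensorFlow Lite or QAT workflow): weights are mapped onto integer levels in [-127, 127] with a single scale, then dequantized to inspect the rounding error.

```python
weights = [0.42, -1.37, 0.051, 2.0, -0.88]   # illustrative FP32 tensor

scale = max(abs(w) for w in weights) / 127   # one scale for the whole tensor
quantized = [round(w / scale) for w in weights]     # INT8 codes in [-127, 127]
dequantized = [q * scale for q in quantized]

max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(quantized)   # [27, -87, 3, 127, -56]
print(f"max rounding error: {max_error:.4f} (bounded by scale/2 = {scale/2:.4f})")
```

QAT improves on this by simulating the same rounding during training, letting the weights adapt to the quantization grid instead of absorbing the error after the fact.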

43 A data engineer is designing a data pipeline that processes financial transactions. The pipeline has a critical step that aggregates transactions and writes the summary to a database. If the pipeline fails after this step and is re-run, it must not create duplicate summaries or incorrect aggregates. Which property is essential for this specific step?

Data pipelines and automation Hard
A. Idempotency
B. Observability
C. Latency
D. Scalability

44 During the training of a Generative Adversarial Network (GAN), the generator's loss drops to near zero while the discriminator's loss remains high and erratic. The generated images are all very similar and lack diversity. This phenomenon is best described as:

Error identification, Troubleshooting Hard
A. Vanishing Gradients
B. Exploding Gradients
C. Overfitting
D. Mode Collapse

45 An analyst uses ChatGPT's Advanced Data Analysis to analyze a sales dataset. They ask it to "Identify the top 3 product categories by profit margin and visualize the result." The AI generates a bar chart showing 3 categories. However, when the analyst manually calculates the profit margin using the formula Profit ÷ Sales, they find a different set of top 3 categories. What is the most likely cause of this discrepancy originating from the AI's process?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Hard
A. The AI's Python execution environment had a floating-point precision error that miscalculated the division for certain categories.
B. The dataset contained null or zero values in the 'Sales' column for some rows, and the AI's default data cleaning step dropped these rows, skewing the calculation.
C. The AI is non-deterministic and hallucinated the results without performing the actual calculation.
D. The AI misinterpreted "profit margin" and calculated "total profit" instead, as this is a more common and simpler metric.

46 A team is building a recommendation engine for an e-commerce site. They have structured data (user purchase history, product ratings) and unstructured data (text from product reviews). To create a hybrid model, they generate embeddings from the review text using a transformer model. What is the most significant challenge when combining these text-based embeddings with the structured user/item features?

Working with structured and unstructured data Hard
A. Transformer models for text embeddings are too slow for real-time recommendation systems and cannot be combined with structured data.
B. The high dimensionality of text embeddings (e.g., 768 dimensions for BERT) can dominate the lower-dimensional structured features, making the model insensitive to purchase history or ratings.
C. Structured data cannot be normalized to the same scale as text embeddings, leading to training instability.
D. It is impossible to concatenate feature vectors of different data types (numerical and text-based) into a single input for a machine learning model.

47 What is the primary motivation for using a dedicated Feature Store in a mature MLOps organization with multiple data science teams?

Introduction to MLOps and lifecycle management Hard
A. To serve as a version control system for model weights and artifacts, similar to Git.
B. To provide a centralized location for data scientists to visualize and explore raw data before feature engineering.
C. To prevent training-serving skew by ensuring the exact same feature engineering logic is used during both model training and real-time inference.
D. To automate the process of hyperparameter tuning for all models in the organization.

48 A startup is training a large language model on a custom dataset. They need to perform distributed training across multiple GPUs to reduce training time. They are considering different cloud services. Which of the following describes the most complex challenge they will face that is specific to distributed training in the cloud?

AI Model Environments & Lifecycle Basics: Cloud services Hard
A. Installing the correct version of deep learning frameworks like PyTorch or TensorFlow on the cloud instances.
B. Setting up a secure network connection (VPC) to protect the training data from unauthorized access.
C. Managing inter-node communication bandwidth and latency, which can become a bottleneck and diminish the returns of adding more nodes.
D. Provisioning a single virtual machine with a sufficiently powerful GPU to handle the model's memory requirements.

49 A company wants to automate its invoice processing. The process involves receiving invoices as PDFs via email, extracting fields like invoice number, date, and total amount, and then entering this data into an ERP system. The invoices come from hundreds of different vendors, each with a unique template. Why would a traditional OCR + template-based RPA solution be inferior to an AI-powered Intelligent Document Processing (IDP) solution for this task?

AI process automation Hard
A. An IDP solution uses natural language understanding and computer vision to identify fields contextually, making it robust to variations in invoice templates without needing a new template for each vendor.
B. A traditional RPA solution cannot interact with web-based ERP systems, whereas an IDP solution has native API connectors.
C. Standard OCR cannot read text from PDF documents, requiring an AI-based solution to digitize the text first.
D. RPA bots are not capable of performing conditional logic (if/then statements), which is required to validate the extracted invoice data.

50 In the context of a streaming data pipeline using a technology like Apache Kafka, what is the primary challenge associated with ensuring 'exactly-once' processing semantics?

Data pipelines and automation Hard
A. Achieving high enough throughput to process messages as they arrive without creating a backlog in the Kafka topics.
B. Coordinating distributed transactions between the streaming processor (e.g., Flink, Spark) and the output data sink (e.g., a database) to handle both processing failures and network failures without data duplication or loss.
C. Ensuring that messages produced to Kafka are correctly serialized and deserialized by all consumers in the pipeline.
D. Encrypting the data in transit between Kafka brokers and consumers to meet security compliance requirements.

51 A classification model exhibits high accuracy (98%) on a test set, but a confusion matrix reveals it performs very poorly on the minority class (e.g., high false negatives for a 'fraud' class). When plotted, the ROC curve shows an AUC of 0.95. Why is the high AUC score misleading in this scenario?

Error identification, Troubleshooting Hard
A. The test set was contaminated with data from the training set, artificially inflating all performance metrics including AUC.
B. The AUC calculation is mathematically incorrect when the number of positive samples is less than 10% of the total dataset.
C. A high AUC score only indicates that the model's predictions are well-calibrated, not that it has good discriminative power between classes.
D. AUC is sensitive to the class imbalance; it's calculated by integrating over all possible classification thresholds, and with a large number of easy-to-classify true negatives, the curve can be pulled up and to the left, masking poor performance on the small positive class.
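
The numbers in this scenario can be worked through directly (counts invented to match the 98% figure): with 980 legitimate and 20 fraud cases, a model that misses most fraud still reports high accuracy, which is why per-class metrics like recall matter.

```python
tp, fn = 4, 16     # fraud cases: only 4 of 20 caught
tn, fp = 976, 4    # legitimate cases

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"accuracy: {accuracy:.1%}")     # 98.0% -- looks excellent
print(f"fraud recall: {recall:.1%}")   # 20.0% -- 80% of fraud slips through
```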

52 You are creating a dashboard in Tableau to analyze customer churn. You want to display the churn rate for different customer segments, but also allow users to see how the churn rate would change if a hypothetical marketing intervention reduced churn by 15% for a user-selected segment. Which combination of Tableau features would be most effective for creating this interactive, what-if analysis?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Hard
A. A Story to create a sequence of dashboards, with one showing the actual churn and the next showing the manually calculated hypothetical churn.
B. A Quick Filter for segment selection and an AI-powered Forecasting model to project the future churn rate.
C. A Level of Detail (LOD) expression to fix the churn rate at the segment level and a data blend from a separate spreadsheet containing the reduction percentages.
D. A Parameter for segment selection, a Parameter for the churn reduction percentage, and a Calculated Field that uses these parameters to compute the hypothetical churn rate.
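
The parameter + calculated-field logic in option D can be sketched in Python (segment names and churn rates are made up; in Tableau itself this would be written in its calculated-field formula language): the hypothetical rate applies the user-chosen reduction only to the user-selected segment.

```python
churn_rate = {"Enterprise": 0.05, "SMB": 0.12, "Consumer": 0.20}

def hypothetical_churn(segment, selected_segment, reduction=0.15):
    # Mirrors a calculated field driven by two parameters:
    # the selected segment and the churn-reduction percentage.
    rate = churn_rate[segment]
    return rate * (1 - reduction) if segment == selected_segment else rate

adjusted = {s: hypothetical_churn(s, selected_segment="SMB") for s in churn_rate}
print(adjusted)   # SMB drops from 0.12 to ~0.102; other segments unchanged
```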

53 A key challenge in federated learning, where models are trained on decentralized edge devices (e.g., mobile phones) without data leaving the device, is the 'non-IID' (non-independent and identically distributed) nature of the data. What is the most severe consequence of this non-IID data distribution?

AI Model Environments & Lifecycle Basics: Edge deployment Hard
A. The global model, aggregated from the local models, can diverge or converge to a poor-performing minimum because the weight updates from different devices pull the model in conflicting directions.
B. The edge devices may not have enough computational power to train the local model effectively.
C. It becomes impossible to ensure data privacy as the model updates inherently leak information about the local data.
D. The communication cost of sending model updates from the edge devices to the central server becomes prohibitively expensive.

54 In a CI/CD/CT (Continuous Training) pipeline for MLOps, what is the most appropriate trigger for automatically initiating a full model retraining job?

Introduction to MLOps and lifecycle management Hard
A. A statistical monitoring tool detects significant 'concept drift' where the statistical properties of the live input data have diverged from the training data distribution.
B. A fixed schedule, such as the first day of every month, to ensure the model is always up-to-date.
C. The model's predictive accuracy on the live inference data drops by more than 20% from its initial baseline.
D. A software engineer commits a change to the model's inference API code in the Git repository.

55 You need to process a dataset of one million 100-page PDF documents to extract specific clauses for a legal AI system. The goal is to perform this task efficiently. Which of the following approaches represents the most scalable and computationally efficient architecture?

Working with structured and unstructured data Hard
A. A single, powerful server with multiple CPU cores that iterates through each PDF, using a multithreaded Python script to process documents in parallel.
B. A distributed processing pipeline using a framework like Apache Spark, where each worker node processes a subset of PDFs. Each worker uses an OCR library to extract text and a pre-trained transformer model (running on a GPU if available on the worker) for clause identification.
C. A serverless architecture where each PDF upload triggers a cloud function (e.g., AWS Lambda). The function performs OCR and clause extraction for that single document.
D. Manually loading the PDFs into a specialized document analysis desktop application and using its built-in tools to extract the clauses, saving the results to a CSV file.

56 An organization is using Apache Airflow to orchestrate its daily ETL pipelines. A critical DAG (Directed Acyclic Graph) has a task that depends on a file arriving from an external partner in an S3 bucket. The file can arrive at any time between 2 AM and 5 AM. Which Airflow component is the most appropriate and efficient for handling this specific dependency?

Data pipelines and automation Hard
A. Use a TriggerDagRunOperator in a separate 'poller' DAG that runs every minute to check for the file and then trigger the main DAG.
B. Write a Python function with a while True: loop and a time.sleep(60) call inside a PythonOperator to check for the file's existence.
C. Use a Sensor, specifically the S3KeySensor, which will periodically check for the existence of the file and only succeed when the file is found, allowing downstream tasks to run.
D. Run the DAG on a fixed schedule at 5:05 AM and assume the file has arrived. If it hasn't, the task will fail and the on-call engineer will be paged to re-run it manually.
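
A standard-library sketch of what a sensor like S3KeySensor does (this mimics the poke loop against the local filesystem rather than S3, and omits Airflow's 'reschedule' mode, which also frees the worker slot between pokes): check for the dependency at an interval, succeed when it appears, fail on timeout.

```python
import os
import tempfile
import time

def wait_for_file(path, poke_interval=0.01, timeout=1.0):
    """Poll for `path`; return True when found, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True            # dependency met: downstream tasks may run
        time.sleep(poke_interval)
    return False                   # timeout: the task would be marked failed

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "partner_file.csv")
    assert wait_for_file(target, timeout=0.05) is False   # file not there yet
    open(target, "w").close()                             # partner file arrives
    assert wait_for_file(target) is True
```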

57 A financial services company is deploying a fraud detection model using a serverless inference endpoint on a major cloud provider (e.g., AWS SageMaker Serverless Inference, Google Vertex AI). What is the primary trade-off they must consider when choosing a serverless endpoint over a traditional, provisioned endpoint (a dedicated, always-on VM)?

AI Model Environments & Lifecycle Basics: Cloud services Hard
A. They trade the support for deep learning models for exclusive support for traditional machine learning models like logistic regression.
B. They trade higher security and network isolation for the convenience of a publicly accessible API endpoint.
C. They trade lower cost for infrequent traffic and automatic scaling for potentially higher 'cold start' latency on the first request after a period of inactivity.
D. They trade the ability to use custom Docker containers for a simplified, no-code deployment process.

58 A hospital wants to use AI to automate the preliminary reading of chest X-rays to flag urgent cases for radiologists. The AI model has a 95% accuracy in identifying a specific condition. However, the legal and ethical implications of a misdiagnosis are severe. Which AI process automation design pattern is most appropriate for this high-stakes scenario?

AI process automation Hard
A. A fully automated 'straight-through-processing' pattern, where the AI's positive predictions are immediately sent to the emergency department to save time.
B. An 'A/B testing' pattern, where 50% of X-rays are processed by the AI and 50% by radiologists to compare performance over time.
C. A 'Robotic Process Automation (RPA)' pattern, where a bot simply moves the X-ray files from one folder to another based on the AI model's output score.
D. A 'Human-in-the-loop' pattern, where the AI flags potential cases, but every single prediction (positive or negative) is reviewed and confirmed by a certified radiologist before any action is taken.

59 When deploying a new version of a customer-facing recommendation model, an MLOps team decides to use a 'Canary Release' strategy instead of a simple A/B test. What is the primary advantage of a Canary Release in this context?

Introduction to MLOps and lifecycle management Hard
A. It automatically rolls back the deployment if the new model's inference latency exceeds a predefined threshold, prioritizing system stability over model accuracy.
B. It allows for a gradual rollout of the new model to a small subset of users (e.g., 1%), minimizing the potential negative impact (the 'blast radius') if the new model has unforeseen issues, while monitoring its performance closely before a full rollout.
C. It ensures that the new model is only served to internal employees and beta testers before being released to the general public.
D. It allows for a statistically rigorous comparison of the new model against the old model by randomly assigning users to two equally sized groups, ensuring the results are not biased.

60 A neural network model for time-series forecasting is consistently underperforming, with predictions that seem to lag behind the actual data by one time step. The model architecture is a standard LSTM network. What is the most probable cause of this specific 'lagging' behavior?

Error identification, Troubleshooting Hard
A. The time-series data was not properly made stationary before being fed into the model, and the model is simply learning to predict the last observed value (y[t-1]) as the forecast for the next step (y[t]).
B. The learning rate is too low, causing the model to converge very slowly and fail to capture the dynamic patterns in the data.
C. Data leakage, where the model was inadvertently trained to predict the next time step's value by using that same value as a feature (e.g., predicting y[t] using a feature set that includes y[t]).
D. The model is suffering from vanishing gradients due to the long sequences, preventing it from learning long-term dependencies.
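
A numeric illustration of the 'lagging' symptom (series values invented): a model that has collapsed to the naive persistence rule y_hat[t] = y[t-1] tracks the real series exactly one step late.

```python
series = [10, 12, 15, 14, 18, 21]

persistence_forecast = series[:-1]   # y_hat[t] = y[t-1]
actual = series[1:]

pairs = list(zip(actual, persistence_forecast))
print(pairs)   # [(12, 10), (15, 12), (14, 15), (18, 14), (21, 18)]
# Every forecast equals the previous observation, so a plot of the forecasts
# looks like the real curve shifted right by one time step.
```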