Unit 4 - Notes

CSE227

Unit 4: AI Integration in Android Applications

1. Introduction to AI in Mobile Apps

1.1 What is AI in the Mobile Context?

Artificial Intelligence (AI) in mobile applications refers to the simulation of human intelligence in apps running on mobile devices. This involves enabling apps to learn, reason, problem-solve, perceive, and understand language. AI can be implemented in two primary ways:

  • On-Device AI: Models run directly on the user's phone or tablet.

    • Pros: Low latency, works offline, enhanced data privacy (data doesn't leave the device).
    • Cons: Limited by the device's processing power and memory; larger app size due to model inclusion.
    • Examples: Live text recognition in a camera app, predictive keyboard suggestions.
    • Frameworks: TensorFlow Lite, ML Kit, Core ML (iOS).
  • Cloud-Based AI: Models run on powerful remote servers, and the app communicates with them via an API.

    • Pros: Access to much larger, more powerful models; app size is smaller; models can be updated without an app update.
    • Cons: Requires an internet connection, introduces latency, potential data privacy concerns, may incur costs per API call.
    • Examples: Complex chatbots (ChatGPT/Gemini), cloud vision analysis, voice-to-text transcription services.

1.2 Common Use Cases in Mobile Apps

AI is transforming user experiences across various app categories:

  • Natural Language Processing (NLP):
    • Chatbots & Virtual Assistants: Providing customer support or conversational interfaces.
    • Sentiment Analysis: Analyzing user reviews or social media mentions.
    • Language Translation: Real-time text and speech translation.
  • Computer Vision:
    • Image Recognition: Identifying objects, people, and places in photos (e.g., Google Photos).
    • Optical Character Recognition (OCR): Extracting text from images.
    • Augmented Reality (AR): Overlaying digital information onto the real world (e.g., AR filters).
  • Personalization:
    • Recommendation Engines: Suggesting products, movies, or music (e.g., Netflix, Spotify).
    • Personalized Content Feeds: Curating news or social media feeds based on user behavior.
  • Predictive Features:
    • Predictive Text: Suggesting the next word or phrase as you type.
    • User Behavior Prediction: Anticipating user actions to streamline workflows.

2. Role of AI Assistants in Android

AI assistants have evolved from simple voice-command processors to deeply integrated, context-aware partners within the Android ecosystem.

  • Google Assistant: The primary AI assistant on Android. It leverages Google's vast knowledge graph to provide information, control device settings, and perform actions.
  • App Actions: These are a key integration point for developers. App Actions allow users to launch specific features within your app using voice commands to Google Assistant.
    • Example: "Hey Google, order a pizza from ExampleApp." This command can deep-link directly into the ordering flow of "ExampleApp."
  • The Shift to Generative AI: Newer assistants are moving beyond predefined commands. They leverage Large Language Models (LLMs) like Gemini to have more natural, multi-turn conversations and perform more complex, reasoned tasks. This enables assistants to help with creative writing, trip planning, code generation, and more, directly on the device.
  • On-Device Capabilities: With models like Gemini Nano, powerful AI assistant features can run directly on-device, offering faster and more private interactions for tasks like summarizing text or generating smart replies.
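
The App Actions integration described above is declared in a shortcuts.xml resource that maps a built-in intent (BII) to an activity in your app. A minimal sketch for the pizza-ordering example might look like the following; actions.intent.ORDER_MENU_ITEM is a real BII, but the package name, class name, and parameter key here are placeholders:

XML
<!-- res/xml/shortcuts.xml (placeholder names for illustration) -->
<shortcuts xmlns:android="http://schemas.android.com/apk/res/android">
    <capability android:name="actions.intent.ORDER_MENU_ITEM">
        <intent
            android:action="android.intent.action.VIEW"
            android:targetPackage="com.example.exampleapp"
            android:targetClass="com.example.exampleapp.OrderActivity">
            <!-- Maps the spoken menu item to an intent extra -->
            <parameter
                android:name="menuItem.name"
                android:key="menuItemName" />
        </intent>
    </capability>
</shortcuts>

The file is referenced from the app's manifest, and the target activity reads the extra (here "menuItemName") to deep-link into the ordering flow.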

3. Overview of ChatGPT (OpenAI) and Gemini (Google)

These are two of the leading Large Language Models (LLMs) available for integration via APIs.

3.1 ChatGPT (OpenAI)

  • Developer: OpenAI.
  • Architecture: Based on the Generative Pre-trained Transformer (GPT) series (e.g., GPT-3.5, GPT-4).
  • Key Strengths: Renowned for its strong conversational abilities, creative text generation, and fluency. It excels at tasks like writing essays, generating code snippets, and role-playing scenarios.
  • Access: Accessed via the OpenAI REST API.
  • Models: Offers various models with different capabilities and costs (e.g., gpt-4o, gpt-4-turbo, gpt-3.5-turbo).

3.2 Gemini (Google)

  • Developer: Google.
  • Architecture: A new generation of models built from the ground up to be multimodal.
  • Key Strengths:
    • Multimodality: Natively understands and processes information from different formats simultaneously—text, images, audio, and video. You can, for example, show it a picture of ingredients and ask for a recipe.
    • Advanced Reasoning: Designed for complex problem-solving and logical deduction.
    • Integration: Deeply integrated with the Google ecosystem and optimized for various platforms.
  • Access: Accessed via the Google AI (Gemini) API or through Google Cloud's Vertex AI platform.
  • Models:
    • Gemini Ultra: The most powerful model for highly complex tasks.
    • Gemini Pro: The best all-around model for a wide range of tasks (this is what is typically used via the API).
    • Gemini Nano: A highly efficient model designed to run on-device on supported Android phones.

4. Basics of AI APIs: REST APIs and API Keys

4.1 REST APIs

A REST (Representational State Transfer) API is a standardized way for computer systems to communicate over the internet. When integrating a cloud-based AI model, your app uses its REST API.

  • Core Concepts:
    • Client-Server: Your Android app is the client, and the AI service (e.g., Google's servers) is the server.
    • Stateless: Each request from the client to the server must contain all the information needed to understand and process the request. The server does not store any client context between requests (though conversational APIs can manage history).
    • HTTP Methods: You use standard HTTP methods. For generative AI, you will almost always use POST, as you are sending data (the prompt) to the server to create a new resource (the response).
    • Endpoint: A specific URL where the API can be accessed. For example: https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
  • JSON (JavaScript Object Notation): The standard data format for REST APIs. It's human-readable and easy for machines to parse.

Example AI API Request Body (JSON):

JSON
{
  "contents": [
    {
      "parts": [
        {
          "text": "Write a short, funny poem about an Android developer."
        }
      ]
    }
  ]
}

Example AI API Response Body (JSON):

JSON
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "A dev with a screen, bright and keen,\nWrote code for a widget, so clean.\nBut a bug, small and sly,\nMade the pixels all cry,\n'Til Stack Overflow saved the whole scene."
          }
        ],
        "role": "model"
      },
      ...
    }
  ]
}
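
Putting the endpoint, the POST method, and the JSON body together, the request above can be sketched as a raw HTTP call with curl (this assumes an environment variable GEMINI_API_KEY holds a valid key):

SHELL
curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      { "parts": [ { "text": "Write a short, funny poem about an Android developer." } ] }
    ]
  }'

Client SDKs (like the one used in Section 5) build and send exactly this kind of request for you.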

4.2 API Keys

An API key is a unique string of characters used to authenticate your application with an API service.

  • Purpose:

    • Authentication: Proves that your app has permission to access the service.
    • Authorization: Defines what your app is allowed to do.
    • Usage Tracking & Billing: Allows the service provider to monitor your usage and bill you accordingly.
    • Rate Limiting: Prevents abuse by limiting the number of requests you can make in a given time period.
  • Security is CRITICAL:

    • NEVER hardcode an API key directly in your source code (e.g., val apiKey = "AIza...").
    • If you do, it can be easily extracted from your compiled app (APK), and others can use your key, leading to unexpected charges and security risks.
    • Best Practice: Store the key in a secrets.properties file that is excluded from version control (.gitignore) and load it into your app's BuildConfig at compile time.

5. Integrating Google Gemini API in Android

This section details the practical steps using the official Google AI client SDK for Android, which simplifies the process of making REST API calls.

5.1 Prerequisites

  1. Get an API Key: Go to Google AI Studio (aistudio.google.com), sign in, and click "Get API key" -> "Create API key in new project". Copy this key.
  2. Android Studio Project: Create a new project or use an existing one. Ensure it uses Kotlin.

5.2 Step 1: Add Dependencies

In your app-level build.gradle.kts (or build.gradle) file, add the dependency for the Gemini SDK:

KOTLIN
// build.gradle.kts (Module :app)

dependencies {
    // ... other dependencies
    implementation("com.google.ai.client.generativeai:generativeai:0.3.0") // Use the latest version
    implementation("androidx.lifecycle:lifecycle-viewmodel-ktx:2.7.0") // For using coroutines in ViewModel
}

5.3 Step 2: Store the API Key Securely

  1. In your project's root directory, create a new file named secrets.properties.
  2. Add your API key to this file:
    PROPERTIES
        GEMINI_API_KEY=YOUR_API_KEY_HERE
        
  3. Add secrets.properties to your root .gitignore file to prevent it from being checked into version control.
    TEXT
        # .gitignore
        ...
        secrets.properties
        
  4. Configure your app-level build.gradle.kts to load this key into BuildConfig.

    KOTLIN
        // build.gradle.kts (Module :app)
        
        import java.util.Properties
        
        // Read the secrets.properties file
        val properties = Properties()
        val secretsFile = project.rootProject.file("secrets.properties")
        if (secretsFile.exists()) {
            properties.load(secretsFile.inputStream())
        }
    
        android {
            // ...
            buildFeatures {
                buildConfig = true
            }

            defaultConfig {
                // ...
                // Make the API key available in BuildConfig
                buildConfigField("String", "GEMINI_API_KEY", "\"${properties.getProperty("GEMINI_API_KEY")}\"")
            }
        }
        

    After syncing Gradle, you can access the key in your code via BuildConfig.GEMINI_API_KEY.

5.4 Step 3: Initialize the Model and Make API Call

Network operations must be performed on a background thread. We use Kotlin Coroutines within a ViewModel for this.

KOTLIN
// In a ViewModel, for example, GenerativeViewModel.kt

import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.google.ai.client.generativeai.GenerativeModel
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.launch

class GenerativeViewModel : ViewModel() {

    // Use a StateFlow to hold and observe the UI state
    private val _uiState = MutableStateFlow<UiState>(UiState.Initial)
    val uiState: StateFlow<UiState> = _uiState.asStateFlow()

    // Initialize the GenerativeModel
    private val generativeModel = GenerativeModel(
        modelName = "gemini-pro", // For text-only prompts
        apiKey = BuildConfig.GEMINI_API_KEY
    )

    fun generateContent(prompt: String) {
        // Set the state to loading
        _uiState.value = UiState.Loading

        viewModelScope.launch {
            try {
                // The API call is a suspending function
                val response = generativeModel.generateContent(prompt)
                
                // Update state with the successful response
                _uiState.value = UiState.Success(response.text ?: "No response text.")

            } catch (e: Exception) {
                // Update state with the error
                _uiState.value = UiState.Error(e.localizedMessage ?: "An unknown error occurred")
            }
        }
    }
}

// A sealed interface to represent the different UI states
sealed interface UiState {
    object Initial : UiState
    object Loading : UiState
    data class Success(val output: String) : UiState
    data class Error(val message: String) : UiState
}


6. Generating AI Text Responses

Building on the integration, here's how to handle different types of text generation.

6.1 Simple Text-Only Prompts

The code in section 5.4 demonstrates a simple text-only prompt. You pass a single string to generateContent() and access the result from response.text.

6.2 Streaming Responses

For long responses, streaming provides a much better user experience by displaying the text as it is generated, rather than waiting for the entire response to complete.

  • Use the generateContentStream() method, which returns a Flow.
  • Collect the flow to receive chunks of the response as they become available.

KOTLIN
// Inside the ViewModel

fun generateContentStreaming(prompt: String) {
    _uiState.value = UiState.Loading

    viewModelScope.launch {
        try {
            var fullResponse = ""
            // Use the streaming variant
            generativeModel.generateContentStream(prompt)
                .collect { chunk ->
                    fullResponse += chunk.text ?: ""
                    // Update the UI state with the incrementally built response
                    _uiState.value = UiState.Success(fullResponse)
                }
        } catch (e: Exception) {
            _uiState.value = UiState.Error(e.localizedMessage ?: "An unknown error occurred")
        }
    }
}

6.3 Chat (Conversational History)

To build a chatbot, the model needs to remember the context of the conversation. The SDK manages this for you with a Chat object.

  1. Start a chat session: Use generativeModel.startChat(). You can provide it with past messages if you want to resume a conversation.
  2. Send messages: Use chat.sendMessage() for each new user prompt. The chat object automatically includes the history in subsequent API calls.

KOTLIN
// Inside the ViewModel

// Keep a reference to the chat object
private val chat = generativeModel.startChat(
    history = listOf(
        // Optional: prime the model with a persona
        // content(role = "user", text = "Hello, I have some questions about Android development."),
        // content(role = "model", text = "Great, I can help with that! What's your question?")
    )
)

fun sendChatMessage(prompt: String) {
    _uiState.value = UiState.Loading

    viewModelScope.launch {
        try {
            val response = chat.sendMessage(prompt) // Use the chat object
            _uiState.value = UiState.Success(response.text ?: "No response text.")
        } catch (e: Exception) {
            _uiState.value = UiState.Error(e.localizedMessage ?: "An unknown error occurred")
        }
    }
}


7. Handling User Input and AI Output

This involves connecting the ViewModel to your UI (e.g., Jetpack Compose).

  • UI Components:
    • Input: TextField for the user to type their prompt.
    • Action: Button to call the ViewModel function.
    • Output: Text to display the AI's response.
    • Loading Indicator: CircularProgressIndicator to show when a request is in progress.
  • State Management: The UI observes the StateFlow from the ViewModel and recomposes whenever the state changes.

Example with Jetpack Compose:

KOTLIN
// In your Composable screen

@Composable
fun GenerativeAiScreen(
    viewModel: GenerativeViewModel = viewModel()
) {
    // Collect the UI state as a Compose State
    val uiState by viewModel.uiState.collectAsState()
    var prompt by remember { mutableStateOf("") }

    Column(modifier = Modifier.padding(16.dp)) {
        TextField(
            value = prompt,
            onValueChange = { prompt = it },
            label = { Text("Enter your prompt") },
            modifier = Modifier.fillMaxWidth()
        )
        Spacer(modifier = Modifier.height(8.dp))
        Button(onClick = {
            if (prompt.isNotBlank()) {
                viewModel.generateContent(prompt)
            }
        }) {
            Text("Generate")
        }
        Spacer(modifier = Modifier.height(16.dp))
        
        // React to the different UI states
        when (val state = uiState) {
            is UiState.Initial -> {
                // Show nothing or a welcome message
            }
            is UiState.Loading -> {
                CircularProgressIndicator()
            }
            is UiState.Success -> {
                Text(text = state.output)
            }
            is UiState.Error -> {
                Text(text = state.message, color = Color.Red)
            }
        }
    }
}


8. Basic Error Handling and API Usage Limits

8.1 Error Handling

Robust error handling is crucial for a good user experience.

  • Network Errors: The device might be offline. A try-catch block can catch exceptions related to connectivity. You should check for network connectivity before making a call.
  • API Errors:
    • Invalid API Key: The server will reject the request, often with a 400 or 401 HTTP status code. The SDK will throw an exception.
    • Content Safety: The model may refuse to answer a prompt or generate a response that violates safety policies. This can throw a ResponseStoppedException. You should check the response.promptFeedback for details.
  • Implementation: The try-catch block shown in the ViewModel examples is the fundamental pattern for handling these errors. You catch the Exception, update the UiState to an error state, and display a user-friendly message.

KOTLIN
// Example of checking for a specific safety-related exception

import com.google.ai.client.generativeai.type.ResponseStoppedException

// ... inside a catch block
catch (e: ResponseStoppedException) {
    _uiState.value = UiState.Error("Response stopped due to safety settings. Please modify your prompt.")
} catch (e: Exception) {
    _uiState.value = UiState.Error(e.localizedMessage ?: "An error occurred")
}
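
The connectivity check mentioned above can be sketched with Android's ConnectivityManager (this is a minimal sketch; it requires the ACCESS_NETWORK_STATE permission in the manifest):

KOTLIN
import android.content.Context
import android.net.ConnectivityManager
import android.net.NetworkCapabilities

// Returns true if the device currently has an internet-capable network.
fun isOnline(context: Context): Boolean {
    val connectivityManager =
        context.getSystemService(Context.CONNECTIVITY_SERVICE) as ConnectivityManager
    val capabilities =
        connectivityManager.getNetworkCapabilities(connectivityManager.activeNetwork)
            ?: return false
    return capabilities.hasCapability(NetworkCapabilities.NET_CAPABILITY_INTERNET)
}

Calling this before generateContent() lets you surface an offline error state immediately instead of waiting for a network exception.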

8.2 API Usage Limits

API services are not unlimited. They impose limits to ensure fair usage and prevent system overload.

  • Rate Limits: The number of requests you can make in a specific time window. For the free tier of the Gemini API this has been on the order of 60 requests per minute, though exact limits change over time; check the current documentation.
    • Handling: If you exceed this, the API will return an error (e.g., HTTP 429 "Too Many Requests"). The best practice is to implement an exponential backoff strategy: wait for a short period (e.g., 2 seconds), retry, and if it fails again, double the waiting time before the next retry, up to a maximum number of retries.
  • Quotas: Broader limits, such as the total number of requests allowed per day or per month.
  • Monitoring: You can monitor your API usage and see detailed reports in the Google Cloud Console associated with your API project. This is essential for managing costs and staying within limits.
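
The exponential backoff strategy described above can be sketched as a small helper around any suspending API call (the retry count and initial delay are illustrative defaults, not values mandated by the API):

KOTLIN
import kotlinx.coroutines.delay

// Runs the given block, retrying with exponentially growing delays on failure.
// Makes up to maxRetries retries plus one final attempt.
suspend fun <T> withExponentialBackoff(
    maxRetries: Int = 3,
    initialDelayMs: Long = 2_000,
    block: suspend () -> T
): T {
    var delayMs = initialDelayMs
    repeat(maxRetries) {
        try {
            return block()
        } catch (e: Exception) {
            // In a real app, inspect the error and only retry on
            // rate-limit failures (HTTP 429), not on all exceptions.
            delay(delayMs)
            delayMs *= 2 // Double the wait before the next attempt
        }
    }
    return block() // Final attempt; let any exception propagate to the caller
}

Inside a ViewModel this could wrap the call from Section 5.4, e.g. withExponentialBackoff { generativeModel.generateContent(prompt) } launched in viewModelScope.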