Integrating Gemini Nano On-Device with Android's Google AI Edge SDK

Mar 29, 2026

On-device generative AI has moved from a futuristic concept to a practical option. By running models locally, developers can avoid network round trips, eliminate cloud inference costs, and keep user data on the device.

On Android, Google’s Gemini Nano is the foundation for on-device AI. Using the Google AI Edge SDK, which talks to the AICore system service, developers can run inference on the device’s own hardware accelerators.

In this guide, we will implement an on-device text summarization engine using Kotlin and Jetpack Compose.

Why Gemini Nano?

Gemini Nano is the smallest model in Google’s Gemini family, designed to run on-device. It offers three main benefits:

Privacy First: Sensitive text data never leaves the user’s phone.
Offline Capability: Once the model is on the device, inference works without internet connectivity (e.g., in subways or airplane mode).
Cost Efficient: No per-request server costs, regardless of user scale.

Implementing On-Device Inference

First, add the required dependency to your app’s build.gradle.kts:

dependencies {
    implementation("com.google.android.aicore:google-ai-edge-sdk:1.2.0")
}

Now, initialize the model session and run a text summarization task asynchronously:

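import android.content.Context
import com.google.android.edge.ai.GenerativeModel
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.map

class OnDeviceSummarizer(context: Context) {
    // Access Gemini Nano through the AI Edge SDK.
    // The application context avoids holding on to an Activity.
    private val model = GenerativeModel(
        modelName = "gemini-nano-text",
        context = context.applicationContext
    )

    // Suspends until the complete summary is available.
    suspend fun summarize(inputText: String): String {
        // Build the prompt line by line: interpolating multi-line input into an
        // indented raw string would defeat trimIndent() and leave stray spaces.
        val prompt = buildString {
            appendLine("Summarize the following text professionally. Use at most 3 bullet points:")
            appendLine()
            append(inputText)
        }

        val response = model.generateContent(prompt)
        return response.text ?: "Summarization failed."
    }

    // Streaming responses for long texts: emits partial output as it is generated.
    fun summarizeStream(inputText: String): Flow<String> {
        val prompt = "Summarize: $inputText"
        // Chunks without text become empty strings so collectors never see null.
        return model.generateContentStream(prompt).map { it.text ?: "" }
    }
}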
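
Streaming helps with long outputs, but a long input can still exceed what an on-device model accepts in a single request. One common workaround is to split the text into chunks, summarize each chunk, and then summarize the combined partial summaries. The sketch below covers only the splitting step in plain Kotlin. The helper name chunkBySentence and the 4,000-character default are illustrative assumptions, not values taken from the SDK, so check the input limit documented for the model version you target.

```kotlin
// Hypothetical helper: splits text into chunks of at most maxChars characters,
// breaking at sentence boundaries where possible.
fun chunkBySentence(text: String, maxChars: Int = 4_000): List<String> {
    require(maxChars > 0) { "maxChars must be positive" }

    // Split on whitespace that follows '.', '!' or '?'; punctuation stays with its sentence.
    val sentences = text.trim()
        .split(Regex("(?<=[.!?])\\s+"))
        .filter { it.isNotBlank() }

    val chunks = mutableListOf<String>()
    val current = StringBuilder()

    for (sentence in sentences) {
        // A single sentence longer than the budget is hard-split into fixed-size pieces.
        val pieces = if (sentence.length > maxChars) sentence.chunked(maxChars) else listOf(sentence)
        for (piece in pieces) {
            // The +1 accounts for the space that joins the piece to the current chunk.
            val needed = if (current.isEmpty()) piece.length else current.length + 1 + piece.length
            if (needed > maxChars && current.isNotEmpty()) {
                chunks += current.toString()
                current.clear()
            }
            if (current.isNotEmpty()) current.append(' ')
            current.append(piece)
        }
    }
    if (current.isNotEmpty()) chunks += current.toString()
    return chunks
}
```

Each chunk can then be passed to summarize(), and the partial summaries joined and summarized once more to produce the final result.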

Integrating into Jetpack Compose

To provide a smooth user experience, handle the inference states inside a Compose screen:

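import androidx.compose.foundation.layout.*
import androidx.compose.material3.* // Material 3 assumed; swap for androidx.compose.material.* on Material 2
import androidx.compose.runtime.*
import androidx.compose.ui.Modifier
import androidx.compose.ui.text.font.FontWeight
import androidx.compose.ui.unit.dp
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.launch

@Composable
fun SummaryScreen(summarizer: OnDeviceSummarizer) {
    var textInput by remember { mutableStateOf("") }
    var summaryResult by remember { mutableStateOf("") }
    var isProcessing by remember { mutableStateOf(false) }
    val scope = rememberCoroutineScope()

    Column(modifier = Modifier.padding(16.dp)) {
        OutlinedTextField(
            value = textInput,
            onValueChange = { textInput = it },
            label = { Text("Enter text to summarize") }
        )
        
        Spacer(modifier = Modifier.height(16.dp))
        
        Button(
            onClick = {
                isProcessing = true
                scope.launch {
                    try {
                        summaryResult = summarizer.summarize(textInput)
                    } catch (e: CancellationException) {
                        throw e // let coroutine cancellation propagate
                    } catch (e: Exception) {
                        summaryResult = "Summarization failed: ${e.message}"
                    } finally {
                        // Re-enable the button even if inference throws
                        isProcessing = false
                    }
                }
            },
            // Disable while a request is running or the input is only whitespace
            enabled = !isProcessing && textInput.isNotBlank()
        ) {
            Text(if (isProcessing) "Summarizing on-device..." else "Summarize")
        }

        Spacer(modifier = Modifier.height(24.dp))
        Text(text = "Summary:", fontWeight = FontWeight.Bold)
        Text(text = summaryResult)
    }
}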
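
Separate state variables for the result and the processing flag can drift out of sync, for example when an old summary is still displayed while a new request is running. One alternative is to model the screen as a single state value. The sketch below is plain Kotlin with no Compose dependency. SummaryUiState, toUiState, and buttonLabel are hypothetical names introduced for illustration, not part of any SDK.

```kotlin
// Hypothetical UI state for the summary screen: exactly one case is active at a time.
sealed interface SummaryUiState {
    object Idle : SummaryUiState
    object Loading : SummaryUiState
    data class Success(val summary: String) : SummaryUiState
    data class Error(val message: String) : SummaryUiState
}

// Maps the outcome of one inference call to a state the UI can render.
fun Result<String>.toUiState(): SummaryUiState = fold(
    onSuccess = { text ->
        if (text.isBlank()) SummaryUiState.Error("The model returned an empty summary.")
        else SummaryUiState.Success(text.trim())
    },
    onFailure = { error ->
        SummaryUiState.Error(error.message ?: "On-device inference failed.")
    }
)

// Label for the action button in each state.
fun SummaryUiState.buttonLabel(): String = when (this) {
    SummaryUiState.Loading -> "Summarizing on-device..."
    else -> "Summarize"
}
```

In the composable, this could replace the separate variables with one `mutableStateOf<SummaryUiState>(SummaryUiState.Idle)`: set it to Loading before launching the coroutine, then assign `runCatching { summarizer.summarize(textInput) }.toUiState()`. Note that runCatching also captures CancellationException, so rethrow it if cancellation needs to propagate.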

Important Considerations for Production

Model Download Handling: Gemini Nano is distributed via Google Play Services rather than bundled in your APK. Before running queries, your app should check whether the model is fully downloaded and request the download if it is not.
Thermal Throttling: Sustained on-device inference can heat up the device’s processor. Rate-limit or batch heavy workloads so the OS does not throttle performance.
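
The model download check described above can be sketched as a small gate placed in front of every inference call. The SDK's actual availability and download calls are not shown here. ModelStatus, ModelProvider, Readiness, and ensureReady are hypothetical names, and you would back ModelProvider with whatever the SDK version you use exposes.

```kotlin
// Hypothetical model states; map these from whatever your SDK version reports.
enum class ModelStatus { READY, DOWNLOADING, NOT_DOWNLOADED, UNSUPPORTED }

// Hypothetical abstraction over the SDK's availability and download calls.
interface ModelProvider {
    fun status(): ModelStatus
    fun requestDownload()
}

// Outcome the caller can act on before attempting inference.
sealed interface Readiness {
    object Ready : Readiness
    object Pending : Readiness      // download in progress; try again later
    object Unavailable : Readiness  // this device cannot run the model
}

// Checks the model state and requests a download only when none is in flight.
fun ensureReady(provider: ModelProvider): Readiness = when (provider.status()) {
    ModelStatus.READY -> Readiness.Ready
    ModelStatus.DOWNLOADING -> Readiness.Pending
    ModelStatus.NOT_DOWNLOADED -> {
        provider.requestDownload()
        Readiness.Pending
    }
    ModelStatus.UNSUPPORTED -> Readiness.Unavailable
}
```

A caller would run inference only on Readiness.Ready, show a progress message on Pending, and hide or disable the feature on Unavailable.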