AppsScriptPulse

Beyond external libraries: Google Workspace document and data processing with Gemini Code Execution

Why use a dedicated app when you can simply ask Gemini to write and run the Python code for you? A look at the power of Google Apps Script and GenAI

For many Google Workspace developers, handling complex file formats or performing advanced data analysis has traditionally meant navigating the limitations of Apps Script’s built-in services. We have previously featured solutions for these challenges on Pulse, such as merging PDFs and converting pages to images or using the PDFApp library to “cook” documents. While effective, these methods often rely on loading external JavaScript libraries like pdf-lib, which can be complex to manage and subject to the script’s memory and execution limits.

While users of Gemini for Google Workspace may already be familiar with its ability to summarise documents or analyse data in the side panel, those features are actually powered by the same “Code Execution” technology under the hood. The real opportunity for developers lies in using this same engine within Apps Script to build custom, programmatic workflows that go far beyond standard chat interactions.

A recent project by Stéphane Giron highlights this path. By leveraging the Code Execution capability of the Gemini API, it is possible to offload intricate document and data manipulation to a secure Python sandbox, returning the results directly to Google Drive.

Moving beyond static logic

The traditional approach to automation involves writing specific code for every anticipated action. The shift here is that Gemini Code Execution does not rely on a pre-defined set of functions. Instead, when provided with a file and a natural language instruction, the model generates the necessary Python logic on the fly. Because the execution environment includes robust libraries for binary file handling and data analysis, the model can perform varied tasks without the developer needing to hardcode each individual routine. Notably, the model works iteratively: if the generated code fails, it can refine and retry the script up to five times until it produces a successful output.
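To see this generate-run-retry loop from the Apps Script side, you can walk the parts of the API response. The sketch below assumes the response has already been parsed from JSON and that generated code and its results arrive as `executableCode` and `codeExecutionResult` parts; the function name `describeRun` is hypothetical and not taken from Stéphane's repository.

```javascript
// Sketch only: lists what the model did during a code execution run.
// Assumes `response` is the parsed JSON body of a generateContent call.
function describeRun(response) {
  const candidates = (response && response.candidates) || [];
  const first = candidates.length ? candidates[0] : null;
  const parts = first && first.content && first.content.parts
    ? first.content.parts
    : [];

  const steps = [];
  parts.forEach(function (part) {
    if (part.executableCode) {
      // Python source written by the model.
      steps.push({ kind: 'code', code: part.executableCode.code });
    } else if (part.codeExecutionResult) {
      // Outcome and captured output from the sandbox run.
      steps.push({
        kind: 'result',
        ok: part.codeExecutionResult.outcome === 'OUTCOME_OK',
        output: part.codeExecutionResult.output || ''
      });
    } else if (part.text) {
      // Natural language commentary or the final summary.
      steps.push({ kind: 'text', text: part.text });
    }
  });

  // Each code part counts as one attempt; failed results hint at retries.
  const attempts = steps.filter(function (s) { return s.kind === 'code'; }).length;
  const failures = steps.filter(function (s) {
    return s.kind === 'result' && !s.ok;
  }).length;

  return { steps: steps, attempts: attempts, failures: failures };
}
```

Logging `attempts` and `failures` alongside each run is a cheap way to spot prompts that regularly force the model to retry.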

While basic data analysis is now a standard part of Gemini for Workspace, having direct access to the library list in the Gemini sandbox opens up additional specialised, developer-focused avenues:

  • Dynamic Document Generation: Using python-docx and python-pptx, you can programmatically generate high-fidelity Office documents or presentations from Google Workspace data, bridging the gap between ecosystems without manual copy-pasting. [Here is a container-bound script, based on Stéphane’s code, for Google Docs that generates a summary PowerPoint file]
  • Programmatic Image Inspection: Using Gemini 3 Flash, you can build tools that inspect images at a granular level. For example, a script could process a batch of site inspection photos, using Python code to “zoom and inspect” specific equipment labels or gauges, and then log those values directly into a database.

The mechanics and constraints

The bridge between Google Drive and this dynamic execution environment follows a straightforward pattern:

  1. File Preparation: The script retrieves the target file from Drive and converts the blob into a format compatible with the Gemini API.
  2. Instruction & Execution: Along with the file, a prompt is sent describing the desired outcome.
  3. The Sandbox: Gemini writes and runs the Python code required to fulfil the request.
  4. Completion: Apps Script receives the modified file or data and saves it back to the user’s Drive.
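The four steps above can be sketched in Apps Script roughly as follows. This is a minimal illustration, not Stéphane's implementation: the function names, the `GEMINI_API_KEY` script property, the output file naming and the `model` argument are all assumptions, and it further assumes that any file produced in the sandbox comes back as an `inlineData` part.

```javascript
// Steps 1-2: build the request body from a base64 file and an instruction.
// Pure function, so it can be checked outside Apps Script.
function buildRequestBody(base64Data, mimeType, instruction) {
  return {
    contents: [{
      role: 'user',
      parts: [
        { inline_data: { mime_type: mimeType, data: base64Data } },
        { text: instruction }
      ]
    }],
    // Step 3 happens server-side once the code_execution tool is enabled.
    tools: [{ code_execution: {} }]
  };
}

// Step 4 (parsing): collect any files returned as inlineData parts.
function extractFiles(response) {
  const candidates = (response && response.candidates) || [];
  const first = candidates.length ? candidates[0] : null;
  const parts = first && first.content && first.content.parts
    ? first.content.parts
    : [];
  return parts
    .filter(function (p) { return Boolean(p.inlineData); })
    .map(function (p) { return p.inlineData; });
}

// Apps Script-only wrapper: Drive in, Gemini call, Drive out.
// `model` is whichever code-execution-capable model you choose (assumption).
function processDriveFile(fileId, instruction, model) {
  const blob = DriveApp.getFileById(fileId).getBlob();
  const body = buildRequestBody(
    Utilities.base64Encode(blob.getBytes()),
    blob.getContentType(),
    instruction
  );
  // Assumes the API key is stored as a script property named GEMINI_API_KEY.
  const apiKey = PropertiesService.getScriptProperties()
    .getProperty('GEMINI_API_KEY');
  const url = 'https://generativelanguage.googleapis.com/v1beta/models/' +
    model + ':generateContent';
  const res = UrlFetchApp.fetch(url, {
    method: 'post',
    contentType: 'application/json',
    headers: { 'x-goog-api-key': apiKey },
    payload: JSON.stringify(body),
    muteHttpExceptions: true
  });
  if (res.getResponseCode() !== 200) {
    throw new Error('Gemini request failed: ' + res.getContentText());
  }
  // Save each returned file to the user's Drive (hypothetical naming scheme).
  return extractFiles(JSON.parse(res.getContentText())).map(function (f, i) {
    const out = Utilities.newBlob(
      Utilities.base64Decode(f.data), f.mimeType, 'gemini-output-' + (i + 1));
    return DriveApp.createFile(out);
  });
}
```

Keeping the request builder and the response parser separate from the Drive and network calls means the payload shape can be verified without spending tokens.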

However, there are important boundaries to consider. The execution environment is strictly Python-based, and code can run for a maximum of 30 seconds before timing out. Developers should also be mindful of the billing model: you are billed for “intermediate tokens”, which cover the generated code and the results of its execution, in addition to the final summary.
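One way to keep an eye on this cost is to log the token counts returned with each response. The sketch below reads the `usageMetadata` object of a parsed response; the function name `summarizeUsage` is hypothetical, and treating `toolUsePromptTokenCount` as the place where intermediate tokens show up is an assumption worth checking against the API reference for your model.

```javascript
// Sketch only: pull token counts out of a parsed generateContent response.
// Missing fields default to 0 so the function is safe on partial responses.
function summarizeUsage(response) {
  const usage = (response && response.usageMetadata) || {};
  return {
    prompt: usage.promptTokenCount || 0,
    // Assumption: tool-related (intermediate) tokens are reported here.
    toolUse: usage.toolUsePromptTokenCount || 0,
    output: usage.candidatesTokenCount || 0,
    total: usage.totalTokenCount || 0
  };
}
```

Writing these numbers to a log sheet per run makes it easier to compare prompts that trigger several retries against those that succeed first time.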

Get started

For those interested in the implementation details, Stéphane has shared a repository containing the Apps Script logic and the specific configuration needed to enable the code_execution tool.

Source: Stéphane Giron on Medium | Code: GitHub Repository
