AppsScriptPulse

Automating Data Extraction with the Gemini API Controlled Generation and AppSheet’s New Gmail Integration

Tired of manually processing invoices? I recently built a demo that automates this tedious task using the power of Gemini, AppSheet’s new Gmail integration, and a custom Apps Script library. Let me show you how it works!

Here’s the setup:

  • Gmail Integration: AppSheet’s new Gmail integration allows you to trigger automations based on incoming emails. I’ve set it up to monitor a specific Gmail label called “Invoices”.
  • Apps Script Library: Using my “GeminiApp” (available on Github) which simplifies the interaction with Google’s Gemini AI models directly from Apps Script. This library handles the heavy lifting of making API calls and managing responses.
  • Controlled Generation: Gemini’s “Controlled Generation” feature lets me define a JSON schema that structures the AI’s output. This is key for extracting invoice data in a consistent, machine-readable format.

The Workflow:

  1. Invoice Received: When an invoice email arrives and is labelled “Invoices”, AppSheet’s Gmail integration kicks in.
  2. Automation Triggered: AppSheet triggers an automation that calls a custom Apps Script function called jsonControlledGeneration.
  3. Data Extraction: The jsonControlledGeneration function uses the GeminiApp library to send the email body to Gemini with a predefined JSON schema for invoice data.
  4. Structured Output: Gemini processes the email content and extracts the relevant invoice details (e.g., invoice number, supplier name, date, amount) in a JSON format that adheres to the schema.
  5. Downstream Processing: The structured JSON output can then be easily returned to the AppSheet automation for further actions, such as automatically populating your data table, updating a database, or triggering a payment process.

If you would like to try this yourself, instructions are provided below.

Want to try it yourself?

To set this demo up you will either need a Google AI Studio API key or a Google Cloud project with Vertex AI enabled. Information on both these setups is included in the GeminiApp Setup Instructions. Once you have these you can follow these steps:

  1. Open the GeminiApp Controlled Generation Google Apps Script project and click ‘Make a copy’ from the Overview page
  2. Open the Invoice Tracker template and click ‘Copy and Customize’, then click OK on the Error Creating App: Apps Script function is invalid error
  3. Navigate to appsheet.com and from your ‘recent’ apps  open the ‘Invoice Tracker’ app
  4. Open Automations and for the ‘New Invoices’ event under the Gmail Event Source, click Authorize, then
  5. In the ‘Add a new data source’ enter a name e.g. Invoices Trigger, click the Gmail button and follow the authentication steps
  6. Once complete in the AppSheet Automation settings select your Gmail account and a Label to watch
  7. In the Process section click on the GeminiApp task and click on the blue file icon, then select your copied version of the Apps Script project and click Authorize
  8. Once Authorize, from the Function Name select jsonControlledGeneration

To test this app, you can copy and send this example invoice.

Step 7: Click on the blue file icon, then select your copied version of the Apps Script project

The Power of Controlled Generation

Controlled Generation is a powerful way for extracting information from unstructured data like emails. By defining a JSON schema, I can specify exactly what information I want Gemini to extract and how it should be structured. This ensures that the output is consistent and reliable, eliminating the need for manual cleanup or post-processing.

Here’s an example of a simple JSON schema for invoice data:

 const schema = {
    "type": "object",
    "properties": {
      "Invoice Reference": {
        "type": "string",
        "description": "Unique identifier for the invoice"
      },
      "Supplier Name": {
        "type": "string",
        "description": "Name of the supplier"
      },
      "Invoice Date": {
        "type": "string",
        "description": "Date the invoice was issued",
        "format": "date"
      },
      "Due Date": {
        "type": "string",
        "description": "Date the invoice is due",
        "format": "date"
      },
      "Invoice Amount": {
        "type": "number",
        "description": "Total amount due on the invoice"
      },
      "Notes": {
        "type": "string",
        "description": "Additional notes related to the invoice",
        "nullable": true
      }
    },
    "required": ["Invoice Reference"]
  }

Creating JSON Schemas with Gemini

Creating JSON schemas can seem a bit daunting at first, but Gemini can actually help you with that too! If you have sample data in a Google Sheet, you can use the Gemini Side Panel to generate a schema automatically. Just highlight the data and ask Gemini to create a JSON schema for you. You can even provide a description for each property to make your schema more understandable. Below is a prompt you can use in the Gemini Sheet Side Panel to generate a schema for your own data:

I'm using Controlled Generation with the Gemini API as described in the document https://drive.google.com/file/d/1ETKHlEUDQzJ-f2fmAzsuDjcwdt1D7R2y/view?usp=drive_link

I need help creating a JSON schema to capture data from a screen.

Could you generate a JSON schema suitable for using Controlled Generation with the Gemini API? I want to extract specific information from what's displayed on my screen.

Here are my requirements:

* **Comprehensive Schema:** The schema should be designed to capture a variety of relevant data points from the screen.  
* **Detailed Descriptions:** Please include a clear and concise \`description\` for each \`property\` in the schema. This will help me understand the purpose of each field.  
* **Format Specification:** If any columns contain date or datetime data, please use the \`format\` field to specify the appropriate format (e.g., "date", "date-time"). This is crucial for accurate data parsing.  
* **Output Example:** Please provide the schema in the following format:

```
const schema = {
  description: "Description of the data",
  type: "array", // or "object", depending on the structure
  items: { // If type is array
    type: "object",
    properties: {
      propertyName: {
        type: "string", // or other appropriate type
        description: "Description of the property",
        format: "date", // or "date-time", if applicable
        nullable: false, // or true
      },
      // ... more properties
    },
    required: ["propertyName"], // If any properties are required
  },
  properties: { // If type is object
      propertyName: {
        type: "string", // or other appropriate type
        description: "Description of the property",
        format: "date", // or "date-time", if applicable
        nullable: false, // or true
      },
      // ... more properties
    },
    required: ["propertyName"], // If any properties are required
};
```

Limitations and future developments

While the beta Gmail integration in AppSheet marks a significant new feature, it’s important to note a current limitation is support for processing email attachments. Currently, the integration focuses on metadata such as sender name, subject, and message body, but the AppSheet team have acknowledged attachment support will be added in the near future.

Looking ahead, at Google Cloud Next 2024 the AppSheet team announced an upcoming “Gemini Extract” feature, currently in private preview. This feature intends to include a native Gemini ‘controlled generation’ feature which would let app creators select the data fields they would like populated from sources including images and text. This should be a more intuitive approach to data extraction, directly integrating Gemini capabilities into AppSheet. The Next video includes a Google URL to sign up to the Gemini Extract feature https://goo.gle/appsheet-previews?r=qr

Summary

The Invoice Tracker example hopefully highlights the opportunity for streamlined solutions with data extraction with AppSheet’s Gmail integration, Gemini, and Apps Script.The GeminiApp library also  simplifies the integration of Google’s Gemini AI models into Google Workspace, providing developers with tools to create sophisticated AI-powered applications.

Using the structured JSON output with Controlled Generation can help AppSheet creators by making it easier to ensure you get the data back in a suitable format including the type of data you need, such as dates. With the GeminiApp library, rapid prototyping is achievable, in the ‘Invoice Tracker’ example I was able to get a functional prototype up and running in under 30 minutes.

AppSheet’s Gmail integration, generally available to all AppSheet and Google Workspace Core users, can trigger automations directly from incoming emails without requiring app deployment. Combined with Apps Script functions this opens the door to some powerful opportunities for AppSheet creators. Integrating Gemini-powered AI extraction with AppSheet and Apps Script provides an innovative solution for automating data extraction from emails. By taking advantage of these capabilities, citizen developers can create efficient and user-friendly solutions.

GeminiApp Gets a Major Upgrade: Seamless Transition, Enhanced Functionality, and More!

The GeminiApp library, designed to bring Google’s Gemini AI models into your Google Apps Script projects, has just received a major update. This new release, version 2025.01, introduces significant enhancements that expand the library’s capabilities for building sophisticated AI-powered applications within Google Workspace.

Here’s a quick look at what’s new:

  • Seamless Transition: The updated library allows for a smooth transition from Google AI Studio to Vertex AI.
  • JSON-Controlled Generation: Generate content in JSON format, either by providing a schema or allowing the model to infer it from your prompt.
  • Code Execution: Generate and execute code directly within prompts using gemini-2.0 models.
  • System Instructions: Guide the model’s behavior by providing system instructions during initialization.
  • Caching: Improve efficiency and reduce token usage by caching file uploads.
  • Easier Copy/Paste: The library now supports initialization using both new GeminiApp() and new GoogleGenerativeAI(), making it easier to copy code from Google AI Studio.

The update includes over 1.5K new lines of code, primarily in src/GeminiApp.jssrc/GoogleAICacheManager.js, and tests/tests.js. Existing projects can be updated by replacing the existing GeminiApp.gs code with the updated library. This update provides a broader set of tools to create more personalized and efficient workflows.

For those new to the GeminiApp library, multiple setup options are available. Check out the updated examples and test cases in the README.md file to get started.

This update marks a step forward for developers integrating Gemini into Google Workspace. With new features and the ability to move between Google AI Studio and Vertex AI, the possibilities are vast.

Feel free to share your use cases, code improvements, and feature requests!

Source: GeminiApp Gets a Major Upgrade: Seamless Transition Between Google AI Studio and Vertex AI, Enhanced Functionality, Multimodal Input, and More!

Automate Google Drive test data creation with Google Apps Script

This post describes a Google Apps Script function to create a dummy folder structure in Google Drive, including nested folders and files (Documents, Sheets, Slides). It highlights some useful features beginners might find useful such as recursion, randomness, and the use of Google Apps Script services for Drive interactions. The post additionally discusses specific challenges encountered, like using Drive.Files.create for file creation and ensuring compatibility with Shared Drives.

Working with Google Workspace Enterprise customers we recommend using a DEV/Test Google Workspace domain. If you purchase your Workspace licences through a Google Partner you should be able to get a free test domain, which admins can use to test features like Data Loss Prevention, Context Aware-Access without fear of breaking things for your live domain.

Unfortunately there are no convenient settings or data duplication so configuring your test environment can be a challenge. To help create some dummy data with a little help from Gemini I was able to create the following  createDummyFoldersWithNesting() function:


To use the script

  1. Copy into the Apps Script Editor
  2. Enable the Google Drive Advanced Service
  3. Configure the settings inside the function for the root folder and the number of files/folders to generate

Gemini suggested I highlighted the following key points:

  • Recursion: The createNestedFolders function calls itself to create folders within folders.
  • Randomness: The script randomly determines the number of folders and files to create, and the types of files.
  • Google Apps Script Services: It uses DriveApp and Drive.Files to interact with Google Drive.
  • File Metadata: When creating files, it uses the mimeType property to specify the Google Apps Script file type.

Whilst Gemini wrote a lot of the code for me (with a little guidance), there were a couple of gotchas I’ll highlight.

Drive.Files.create instead of DriveApp.createFile(name, content, mimeType) – the current documentation would suggest that you can use the .createFile() method and include a MimeType like GOOGLE_SHEETS, but as explained in this Stackoverflow post Google have said “After extensive consideration, we have determined that DriveApp.createFile() should not be used to create MimeType.GOOGLE_* files.” 

I could have asked Gemini to rewrite this to use DocumentAppSpreadsheetApp or SlidesApp .create() methods e.g. SpreadsheetApp.create() but then I would have to move into a folder, use extra scopes, which all felt a bit messy so instead opted for Drive.Files.create.

Drive.Files.create supporting Shared Drives without a blob – when using Advanced Services there is a bit of cross referencing required between the auto-complete in the script editor and the associated API documentation. For my script I wanted to support creating files in Shared Drive. To do this requires adding the supportsAllDrives as optionalArgs in the Drive.Files.create(resource, mediaData, optionalArgs) method. As I only wanted blank Docs, Sheets and Slides I was scratching my head as to what to include for the mediaData blob. Fortunately this issue was discussed in the Google Apps Script Community – Google Group and it was clear I could use null or undefined.

Source: Automate Google Drive test data creation with Google Apps Script

The democratisation of app development with Duet AI for AppSheet

From one simple prompt with Duet AI for AppSheet you are able to create a well structured application.

I recently had the pleasure of the opportunity to speak at DevFest Scotland. My topic was how you can use code with Google’s ‘no-code’ solution AppSheet. You can see some of the ways this is achievable in my shared slides. Whilst there are coding opportunities with AppSheet and having some coding/data schema knowledge is useful, I would argue that this is increasingly becoming less important.

In this post on the CTS Medium I share how Duet AI for AppSheet has hugely reduced the entry point for app creation. As part of this I share how a simple request to the Duet AI for AppSheet assistant of ‘an easy way for volunteers to borrow a laptop’ becomes the starting point for a well structured application. Having introduced AppSheet to a number of customers I can’t emphasise enough how big a step this is in the democratisation of app creation. Exciting times!

[Thanks to Christian Schalk and the AppSheet team for early access to Duet AI for AppSheet and providing input on the source post].

Source: The democratisation of app development with Duet AI for AppSheet

Two ways to remove duplicate rows from Google Sheets using Google Apps Script

For many years users had to find a variety of workarounds if they wanted to remove duplicate rows from Google Sheets. This all changed in 2019 when Google announced new features, which included removing duplicates from Google Sheets. Recently I got tagged in a conversation with Andrea Guerri who shared some ‘remove duplicate’ example scripts. This sent me down a bit of a rabbit hole looking at various ‘remove duplicate’ Apps Script solutions and I’ll share two of my favourites.

Source: Two ways to remove duplicate rows from Google Sheet using Google Apps Script

Google Apps Script Patterns: Keeping a gam generated users report up-to-date with Google Apps Script

In Google Workspace gam is probably the ‘go to’ tool command line tool which allows administrators to easily manage domain and user settings. Recently I was asked about how you can keep gam generated reports up-to-date using Google Apps Script. It’s worth remember that gam uses public Google Workspace APIs when it performs actions and reports. In this post I show you a pattern for building script to keep gam reports fresh.

Source: Google Apps Script Patterns: Keeping a gam generated users report up-to-date with Google Apps Script

Creating a Google Drive report in Google Sheets: Making Google Workspace Enterprise solutions with Google Apps Script

Image credit: Martin Hawksey (with the help of DALL·E 2)

Think 10x — supercharging your Google Apps Script solutions by directly calling Google Workspace Enterprise APIs.

The Google Apps Script built-in services like SpreadsheetApp, Maps and GmailApp are a great onramp for users with limited coding experience, the flip side is you can find yourself easily getting results but not in the most efficient way. DriveApp is a great example where Google have made it easy to iterate across folders and files, but when you have lots of folders and files it becomes a time consuming process and you hit execution limits.

An alternative approach is using Google Apps Script to make direct calls to the Drive API. The benefits of this approach is you can be more specific in the data you want back and it gives more flexibility with how you call the API, in some cases with the ability to make batch or asynchronous processes.

In this post I highlight a method ideal for scenarios when you want to index larger volumes of My Drive files and folders to a Google Sheet with calls directly to the Drive API. The post includes some sample code you can use which instead is able to reduce a 4 minute runtime to index 10,000 files and folders to one that can complete in under 40 seconds!

Source: Creating a Google Drive report in Google Sheets: Making Google Workspace Enterprise solutions with Google Apps Script

AppSheet/PayPal integrated payment solutions and tips on handling data in Google Sheets

In this post I’ll provide an overview of how PayPal was integrated into AppSheet with the help of Google Apps Script. Even if you are not interested in payment integrations this post also hopefully pulls together useful tips, best practices and code patterns for reading/writing data to Google Sheets.

Hopefully this post illustrates not only just a method for integrating a PayPal payment processor into an AppSheet app, but also a method which can generally be used to extend AppSheet functionality with Google Apps Script powered Web Apps.

As a bonus you also get some of my top tips for interacting with data in Google Sheets including efficiently reading/writing data for multiple users without concurrent overwrites.

Source: AppSheet/PayPal integrated payment solutions and tips on handling data in Google Sheets

Working with the Google Drive API Revisions history: Tips for handling revision merges with Google Apps Script

Having spent quite a bit of time working with the Google Drive API Revisions resource in this post I thought it would be useful to share some of the lessons and solutions I’ve picked up along the way. For this I’ll be sharing code snippets for interacting with the Revisions resource with Google Apps Script, but the solutions discussed could easily be applied to your programming language of choice.

Source: Working with the Google Drive API Revisions history: Tips for handling revision merges

Automatically running a Google Apps Script function every quarter or another monthly period greater than one month 

One of the great things about Google Apps Script is the way you can automate tasks. I’ve previously written about how I automate reporting and other examples like running backup processes. These usually run daily or once a month which is very straight forward to setup in Google Apps Script. If you want to run a script automatically every x number of months such as quarterly it gets a little harder. If you only want you script to run every three months … another option is to manage triggers programmatically which allows you to specify the date a function should be run again.

I’ve recently been revisiting some of my old Google Apps Script posts to do a bit of housekeeping. I thought this was a nice little snippet should you want to schedule a function to run on a time-driven trigger greater than one month. The trick used is to recursively create a time-driven trigger when the function is called. The solution comes with some caveats :)

Source: Running a Google Apps Script function every quarter or x months