How to Calculate Cosine Similarity of Vector Embeddings in Google Sheets (w/ Free Sheet Template & Apps Script).

James Brockbank | Last updated: June 13, 2025

This guide was created to sit alongside my talk at SMX Advanced 2025: Measuring Link Relevance Using Vector Embeddings and Cosine Similarity to show you how to calculate cosine similarity of vector embeddings in Google Sheets.


At SMX Advanced 2025, I spoke about how to measure link relevance using vector embeddings and cosine similarity, in a bid to encourage our industry to move away from increasingly meaningless link metrics like link count and towards relevance-based analysis.

Lots of guidance exists on how to generate vector embeddings for web pages using Screaming Frog and OpenAI, but there’s little out there on the next step: calculating cosine similarity between these embeddings, or generating embeddings for text (queries, let’s say) rather than URLs.

That’s what this guide aims to do.

I’ll show you exactly how you can use Google Sheets to:

  • Generate vector embeddings for keywords and queries (and roll these up into ‘keyword cluster’ embeddings)
  • Import and average embeddings for your content or for your link profile (from Screaming Frog, generated with OpenAI’s API)
  • Build ‘topic cluster’ or ‘site’ embeddings from related queries and groups of pages
  • Calculate cosine similarity between vectors to measure how relevant your content is to a query, how aligned a link is to a page, or how well a page fits a topic.

Let’s move beyond subjectively measuring relevance, and start measuring it the way Google does.

Understanding Vector Embeddings, Cosine Similarity & Their Use Cases in SEO

If you attended my talk at SMX Advanced, you’ll know exactly why we need to move away from outdated link metrics towards a measurement framework that’s rooted in relevance, but if you didn’t (or are looking to run cosine similarity calculations in Sheets for other SEO purposes), here’s an overview.

To understand why this setup matters, we first need to break down what vector embeddings and cosine similarity actually are, and why they’re becoming essential tools for modern SEO analysis.

What Are Vector Embeddings?

Vector embeddings are high-dimensional numerical representations of content, links, or search queries. Instead of working with raw strings of text, embeddings capture the semantic meaning of content; allowing us to compare intent, context, and relevance far more effectively.

In SEO, we can generate embeddings for things like:

  • Queries (e.g. “best running shoes for beginners”)
  • Web pages (by extracting content and converting it into an embedding)
  • A link profile (or individual linking pages) (by embedding the linking page)

Each embedding is a vector: a long list of floating-point numbers, like [0.031, -0.012, 0.157, ...], and those numbers encode meaning based on the context the model was trained on.

What Is Cosine Similarity?

Cosine similarity is a simple way to measure how similar two vectors are: it’s the cosine of the angle between them, so it captures direction (meaning) rather than magnitude.

The result ranges from:

  • 1.0 → Perfect semantic match
  • 0.0 → No similarity
  • -1.0 → Perfect semantic opposition (rarely seen in SEO)
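The calculation behind these scores is short. Here’s a minimal sketch in plain JavaScript (the same language as the Apps Script we’ll add later), using tiny 2-dimensional vectors to illustrate each end of the range:

```javascript
// Cosine similarity = (A·B) / (||A|| * ||B||) — compares direction, not magnitude.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosine([1, 0], [1, 0]));  // 1.0  — same direction: perfect match
console.log(cosine([1, 0], [0, 1]));  // 0.0  — orthogonal: no similarity
console.log(cosine([1, 0], [-1, 0])); // -1.0 — opposite direction
```

Real embeddings have hundreds or thousands of dimensions rather than two, but the maths is identical.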

Why This Matters in SEO (and specifically, link analysis)

For years, SEOs have relied on metrics like DR, link count, and follow vs nofollow to evaluate links. But these metrics tell you almost nothing about how relevant a link is to the topic you’re trying to rank for.

And this, in 2025, is an issue.

Google has made it clear that relevance matters more than raw authority:

“We try to understand what is relevant for a website.”
John Mueller, Google Search Office Hours

But it’s traditionally been difficult to actually measure the relevance of links in comparison to our sites and/or the queries searchers are searching for.

SEO Use Cases for Embeddings + Cosine Similarity

Using vector embeddings and cosine similarity, we can answer powerful questions with data:

  • How relevant are the links we’ve earned to our site and to the topics we’re trying to rank for?
  • Which of our competitors’ links are most relevant to their site?
  • To increase our rankings for [keyword], do we need to focus on content, links or both?

With the right setup, all of this can be done in Google Sheets, using embeddings you generate or extract and cosine similarity to measure alignment.

That’s exactly what I’m going to show you how to do in this guide.

Why Use Google Sheets to Calculate Cosine Similarity?

You don’t need Python, a local Jupyter environment, or a full-scale data pipeline to calculate cosine similarity. You can build a surprisingly powerful way to generate and compare embeddings in Google Sheets (paired with Screaming Frog and OpenAI).

Here’s why Sheets is the perfect tool for this kind of analysis:

It’s Fast to Get Set Up & Easy to Share

Google Sheets lets you get this up and running in minutes, even if you don’t know how to use Python.

You can:

  • Import embeddings from Screaming Frog exports
  • Generate query embeddings with a custom formula
  • Run comparisons across hundreds of rows
  • Share the workbook with your team, clients, or stakeholders instantly

You Can Use Apps Script to Add Custom Logic

This guide uses a small Google Apps Script to:

  • Generate OpenAI embeddings inside your sheet
  • Average multiple embeddings into cluster centroids
  • Compute cosine similarity between vectors

Once the script is added, you can use these functions like native formulas:

=generateEmbedding("best trail shoes")
=cosineSimilarityFromJson(B2, C2)
=averageJsonEmbeddings(D2:D100)

You’re not locked into complex processes; this becomes a reusable, scalable workflow, all run in an environment you’re likely familiar with.

It Makes Semantic SEO More Accessible

Embedding-based SEO can feel intimidating. Sheets strips away that complexity:

  • You can see what’s happening at each step
  • You can inspect individual rows and tweak inputs
  • You don’t need to write or debug Python just to test a theory

It’s a perfect balance of flexibility, scale, and simplicity; making semantic SEO workflows more collaborative and less siloed in technical teams.

Step-by-Step: How to Calculate Cosine Similarity in Google Sheets

Now that you understand what vector embeddings are — and why cosine similarity is such a powerful way to measure relevance — let’s walk through exactly how to set this up in Google Sheets.

We’ll break this down into a series of practical steps, showing you how to:

  1. Set up your Google Sheet to work with embeddings
  2. Add a custom Apps Script to generate and compare vectors
  3. Create separate sheets to manage:
    • Page embeddings (e.g. from Screaming Frog)
    • Query and topic cluster embeddings (generated via OpenAI)
    • Cosine similarity analysis (comparing pages, queries, links, and clusters)

By the end, you’ll have a working Google Sheet that lets you score the relevance of backlinks to a query, topic, page or site.

Let’s start by setting up the Apps Script that powers everything.

Setting Up Google Apps Script (Required Step)

Before you can use any of the functions in this guide, you’ll need to create a custom Google Apps Script. This will allow Google Sheets to:

  • Generate query embeddings using the OpenAI API (of text, not URLs … these need to come from Screaming Frog)
  • Parse and average JSON-formatted vectors
  • Calculate cosine similarity between embeddings

Follow these steps to get started:

Step 1: Open the Script Editor

  1. In your Google Sheet, click Extensions → Apps Script
  2. Delete any code that appears by default
  3. Give the Apps Script a name (Embeddings Generation & Cosine Similarity Calculation)

Your blank Apps Script workspace should now look like this:

Step 2: Paste in the Full Script

Copy and paste the full code below into the script editor window.

This includes all functions for embedding generation, vector parsing, averaging, and cosine similarity.

/**
 * Converts a bracketed or comma-separated list of numbers into
 * a JSON array string. For example:
 * "[1,2,3]" or "1,2,3" becomes "[1,2,3]" as text.
 *
 * @param {string} str The cell text containing comma-separated numbers.
 * @return {string} A JSON string, e.g. "[1.23,4.56,7.89]"
 */
function convertStringToFloatArrayAsJson(str) {
  // Remove any [ or ] characters
  str = str.replace(/[\[\]]/g, '');
  
  // Split on commas and convert each substring to a float
  const arr = str.split(',').map(parseFloat);
  
  // Return the array as a JSON string
  return JSON.stringify(arr);
}

/**
 * Calculates the dimension-wise average of multiple JSON embeddings.
 * Each row in 'range' should contain a JSON array string, e.g. "[-0.0482, 0.01846, 0.01273]".
 * 
 * @param {range} range A vertical range with one JSON string per row
 * @return {string} A JSON array of floats representing the average embedding
 */
function averageJsonEmbeddings(range) {
  // Filter out any row that doesn't have a string
  const filteredRows = range.filter(row => typeof row[0] === 'string' && row[0].trim() !== '');

  let arrays = [];
  for (let i = 0; i < filteredRows.length; i++) {
    const jsonStr = filteredRows[i][0];
    try {
      const arr = JSON.parse(jsonStr);
      arrays.push(arr);
    } catch (e) {
      // Return an error message as text so you can see which row is bad
      return `Error in row ${i + 1} with value "${jsonStr}" → ${e}`;
    }
  }

  if (arrays.length === 0) {
    return 'No valid JSON found.';
  }

  const dim = arrays[0].length;
  const sums = new Array(dim).fill(0);

  for (let arr of arrays) {
    if (arr.length !== dim) {
      return `Dimension mismatch. Expected length ${dim}, got ${arr.length}`;
    }
    for (let i = 0; i < dim; i++) {
      sums[i] += arr[i];
    }
  }

  const count = arrays.length;
  const averageArray = sums.map(total => total / count);

  return JSON.stringify(averageArray);
}

/**
 * Computes cosine similarities for a set of "page embeddings" (JSON) against
 * a single "site embedding" (JSON). Returns [[URL, similarity], [URL, similarity], ...].
 *
 * @param {string} siteEmbeddingJson  A cell containing a JSON array, e.g. "[-0.048,0.018,...]"
 * @param {A2:A range} urlsRange      A vertical range of URLs (one URL per row)
 * @param {F2:F range} embeddingsRange A vertical range of corresponding JSON embeddings
 * @return 2D array of [url, similarity]
 */
function computeCosineSimilarities(siteEmbeddingJson, urlsRange, embeddingsRange) {
  // 1) If the user only selects a single cell for 'urlsRange' or 'embeddingsRange',
  //    Sheets may pass a string instead of a 2D array. Normalize them:
  const siteEmbedding = parseJsonSafe(siteEmbeddingJson);
  if (!Array.isArray(urlsRange[0])) {
    // single cell -> wrap in 2D
    urlsRange = [[urlsRange]];
  }
  if (!Array.isArray(embeddingsRange[0])) {
    embeddingsRange = [[embeddingsRange]];
  }

  // 2) Precompute norm of the site embedding
  const siteNorm = vectorNorm(siteEmbedding);

  // 3) We'll build a results array: each row [ URL, similarityScore ]
  const results = [];

  // 4) Loop over each row in 'urlsRange' and 'embeddingsRange'
  //    (They should be the same number of rows.)
  for (let i = 0; i < urlsRange.length; i++) {
    // The URL in this row (column A in "links")
    const url = urlsRange[i][0];

    // The JSON embedding string in column F
    const jsonStr = embeddingsRange[i][0];
    if (typeof jsonStr !== 'string') {
      // If blank or invalid, skip
      continue;
    }

    // Parse the page embedding from JSON
    const pageEmbedding = parseJsonSafe(jsonStr);
    if (!pageEmbedding || pageEmbedding.length !== siteEmbedding.length) {
      // If invalid JSON or dimension mismatch, skip
      continue;
    }

    // 5) Compute the cosine similarity
    const sim = cosineSimilarity(siteEmbedding, pageEmbedding, siteNorm);

    // 6) Push [url, similarity] into results
    results.push([url, sim]);
  }

  // Return as a 2D array. Each row => one row in the sheet.
  return results;
}

/** Safely parse a JSON string into an array, or return null on error */
function parseJsonSafe(str) {
  try {
    return JSON.parse(str);
  } catch (e) {
    return null;
  }
}

/** Euclidean norm: sqrt of sum of squares */
function vectorNorm(vec) {
  let sum = 0;
  for (let i = 0; i < vec.length; i++) {
    sum += vec[i] * vec[i];
  }
  return Math.sqrt(sum);
}

/**
 * Cosine similarity = (A·B) / (||A|| * ||B||).
 * We already know A's norm, so we only compute B's norm here.
 */
function cosineSimilarity(siteEmbedding, pageEmbedding, siteNorm) {
  // Dot product
  let dot = 0;
  let pageSumSq = 0;
  for (let i = 0; i < siteEmbedding.length; i++) {
    dot += siteEmbedding[i] * pageEmbedding[i];
    pageSumSq += pageEmbedding[i] * pageEmbedding[i];
  }
  const pageNorm = Math.sqrt(pageSumSq);

  // Avoid division by zero
  if (siteNorm === 0 || pageNorm === 0) {
    return 0;
  }

  return dot / (siteNorm * pageNorm);
}

/**
 * Calculates cosine similarity between two JSON arrays in two columns.
 * @param {string} embedding1 JSON string like "[0.1, 0.2, 0.3]"
 * @param {string} embedding2 JSON string like "[0.2, 0.1, 0.4]"
 * @return {number} Cosine similarity between the two vectors
 */
function cosineSimilarityFromJson(embedding1, embedding2) {
  let a, b;
  try {
    a = JSON.parse(embedding1);
    b = JSON.parse(embedding2);
  } catch (e) {
    return "Invalid JSON";
  }

  if (!Array.isArray(a) || !Array.isArray(b)) return "Invalid input";
  if (a.length !== b.length) return "Length mismatch";

  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  normA = Math.sqrt(normA);
  normB = Math.sqrt(normB);
  if (normA === 0 || normB === 0) return 0;

  return dot / (normA * normB);
}

/**
 * Combines two pre-averaged embeddings using a weighted average.
 * @param {string} avg1 JSON embedding string (e.g., "[0.1, 0.2, 0.3]")
 * @param {number} count1 Number of vectors in avg1
 * @param {string} avg2 JSON embedding string
 * @param {number} count2 Number of vectors in avg2
 * @return {string} Weighted average embedding as a JSON array
 */
function weightedAverageEmbeddings(avg1, count1, avg2, count2) {
  try {
    const a = JSON.parse(avg1);
    const b = JSON.parse(avg2);

    if (!Array.isArray(a) || !Array.isArray(b)) {
      return "Invalid embeddings";
    }

    if (a.length !== b.length) {
      return `Dimension mismatch: ${a.length} vs ${b.length}`;
    }

    const total = count1 + count2;
    const result = [];

    for (let i = 0; i < a.length; i++) {
      result.push(((a[i] * count1) + (b[i] * count2)) / total);
    }

    return JSON.stringify(result);
  } catch (e) {
    return `Error: ${e.message}`;
  }
}

/**
 * Generates an OpenAI embedding for the given text using the OpenAI API.
 *
 * @param {string} inputText The text you want to generate an embedding for.
 * @return {string} A JSON string of the embedding vector or an error message.
 * @customfunction
 */
function generateEmbedding(inputText) {
  const OPENAI_API_KEY = 'YOUR_OPENAI_API_KEY'; // Replace with your real key
  const model = 'text-embedding-3-small'; // Or use 'text-embedding-3-large'
  const dimensions = 1536; // Set this to match your other embeddings

  const payload = {
    input: inputText,
    model: model,
    dimensions: dimensions,
    encoding_format: 'float'
  };

  const options = {
    method: 'post',
    contentType: 'application/json',
    headers: {
      Authorization: 'Bearer ' + OPENAI_API_KEY
    },
    payload: JSON.stringify(payload),
    muteHttpExceptions: true
  };

  const url = 'https://api.openai.com/v1/embeddings';

  try {
    const response = UrlFetchApp.fetch(url, options);
    const json = JSON.parse(response.getContentText());

    if (json.error) {
      return `Error: ${json.error.message}`;
    }

    const embedding = json.data[0].embedding;
    return JSON.stringify(embedding); // Returns as float array in JSON
  } catch (e) {
    return `Request failed: ${e.message}`;
  }
}

⚠️ Important: Replace the placeholder OPENAI_API_KEY with your own key from OpenAI. You can generate one at: https://platform.openai.com/api-keys


You’ll end up with a script that looks like this:

Step 3: Save and Authorise the Script

  1. Click File → Save
  2. Then click the Run ▶️ button next to any function (e.g. generateEmbedding)
  3. Google will prompt you to review permissions — accept and authorise the script

You only need to do this once per sheet.

Now that your Apps Script is set up, you can use the custom functions directly in your Google Sheet… just like built-in formulas.

The easiest thing to do is to set up 4 different sheets.

Want a copy of the sheet and Apps Script to save time? Make a copy here:

Sheets 1 & 2: Page Embeddings & Link Embeddings

Call these sheets Page Embeddings and Link Embeddings:

Purpose: These sheets help you store and process page and linking-page embeddings (e.g. exported from Screaming Frog) and create average embeddings, such as for a category, content type, subfolder or link profile.

The process for creating these two sheets is exactly the same; the only difference is what you import: one takes web pages from your site (or any site you want to analyse), the other takes that site’s linking pages (or full link profile).

What You’ll Do in These Sheets

  • Paste in URLs and their associated embeddings (from Screaming Frog or elsewhere)
  • Convert any raw, comma-separated vectors to valid JSON arrays (if needed)
  • Calculate the average embedding for any subset of rows, giving you a single sitewide, subfolder-level or link profile embedding

Sheet Layout

| A (Page URL) | B (Embedding, Raw) | C (Embedding, JSON) |
| --- | --- | --- |
| /running-shoes | -0.0312, 0.0142, ... | =convertStringToFloatArrayAsJson(B2) |
| /trail-shoes | -0.0289, 0.0098, ... | =convertStringToFloatArrayAsJson(B3) |

You only need this raw → JSON step if you exported from Screaming Frog and your vectors aren’t already valid JSON arrays. If they are, you can paste them directly into Column C.
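If you’re unsure what that conversion step does, here’s the same logic as the convertStringToFloatArrayAsJson() function from the script, runnable on a sample raw export value:

```javascript
// Strip any [ ] brackets, split on commas, parse floats, re-serialise as JSON —
// the same transformation convertStringToFloatArrayAsJson() performs in the sheet.
function toJsonArray(str) {
  return JSON.stringify(str.replace(/[\[\]]/g, '').split(',').map(parseFloat));
}

console.log(toJsonArray('-0.0312, 0.0142, 0.157'));
// → "[-0.0312,0.0142,0.157]"
```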

How to Calculate an Average Embedding for a Group

Let’s say you want to calculate the average embedding for all pages in a category. Just:

  1. Highlight the rows you want to include (e.g. C2:C10)
  2. In an empty cell (e.g. F1), type: =averageJsonEmbeddings(C2:C10)

This outputs a JSON array; the average vector for that set of pages.

You can copy this result into other sheets to use as your site embedding when comparing against queries, links, or other pages.
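Under the hood, averageJsonEmbeddings() simply averages each dimension across the selected rows. A sketch with toy 2-dimensional vectors:

```javascript
// Dimension-wise average: position i of the result is the mean of
// position i across all input vectors.
function averageVectors(vectors) {
  const dim = vectors[0].length;
  const sums = new Array(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) sums[i] += v[i];
  }
  return sums.map(s => s / vectors.length);
}

console.log(averageVectors([[1, 2], [3, 4]])); // [2, 3]
```

The averaged vector (often called a centroid) sits “between” its inputs in embedding space, which is what makes it a useful stand-in for a whole site section or link profile.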

Sheet 3: Query Embeddings

Call this sheet Query Embeddings:

Purpose: This sheet allows you to generate embeddings for search queries using OpenAI, then group and average them into topic cluster vectors. These cluster vectors act as “semantic targets” to compare content or links against.

What You’ll Do in This Sheet

  • Enter search queries (manually or from keyword research)
  • Generate embeddings directly in the sheet using the generateEmbedding() function
  • Organise queries into topic clusters
  • Average embeddings per cluster to create a topic centroid

Sheet Layout

| A (Search Query) | B (Embedding, JSON) | C (Topic Cluster) | D (Cluster Embedding) |
| --- | --- | --- | --- |
| digital PR agency | =generateEmbedding(A2) | Digital PR | (computed below) |
| digital PR | =generateEmbedding(A3) | Digital PR | |
| SEO agency | =generateEmbedding(A4) | SEO | |

1. Enter Queries in Column A

These can be manually curated or pulled from your keyword research and pasted in.

2. Generate Embeddings Using OpenAI

Use this formula in Column B:

=generateEmbedding(A2)

This sends a request to OpenAI’s API and returns a vector (as a JSON string). Ensure your API key is added in the Apps Script first.

You can drag this formula down the column to embed multiple queries at once.

3. Assign Topic Clusters in Column C

Group each query into a cluster — this can be broad topics, intent groups, or content categories.

4. Create the Average Vector per Topic

To get a topic cluster embedding, use this formula in any empty cell (or in Column D next to the first matching row):

=averageJsonEmbeddings(FILTER(B2:B, C2:C="Digital PR"))

This calculates an averaged embedding of all queries in the “Digital PR” cluster.

You can repeat this for other topics by updating the cluster name.
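To make the filter-then-average step concrete, here’s what that FILTER + averageJsonEmbeddings() combination does, sketched with hypothetical queries and toy 2-dimensional vectors:

```javascript
// Hypothetical query rows with toy 2-dimensional embeddings.
const rows = [
  { query: 'digital PR agency', cluster: 'Digital PR', vec: [1, 3] },
  { query: 'digital PR',        cluster: 'Digital PR', vec: [3, 5] },
  { query: 'SEO agency',        cluster: 'SEO',        vec: [9, 1] },
];

// Keep only rows in the target cluster (FILTER), then average
// dimension-wise (averageJsonEmbeddings) to get the topic centroid.
const vecs = rows.filter(r => r.cluster === 'Digital PR').map(r => r.vec);
const centroid = vecs[0].map((_, i) =>
  vecs.reduce((sum, v) => sum + v[i], 0) / vecs.length
);

console.log(centroid); // [2, 4]
```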

Optional Layout: One Row per Cluster

To make your clusters easier to reuse in other sheets, you can reformat like this:

| F (Cluster) | G (Cluster Embedding) |
| --- | --- |
| Digital PR | =averageJsonEmbeddings(FILTER(B2:B10, C2:C10=F2)) |
| SEO | =averageJsonEmbeddings(FILTER(B2:B10, C2:C10=F3)) |

This makes it easy to reference a given cluster embedding by cell (e.g. G2) when comparing against links or pages later.

Sheet 4: Similarity Analysis

Call this sheet Similarity Analysis:

Purpose: This is where everything comes together. Use this sheet to calculate cosine similarity between two sets of embeddings from across your workbook, for example:

  • Links (from Link Embeddings) ↔ Topic Clusters (from Query Embeddings)
  • Pages (from Page Embeddings) ↔ Individual Queries (from Query Embeddings)
  • Any two vectors to score semantic similarity (and thus, relevance)

Cosine similarity gives you a measurable way to understand how well aligned a link is to your target topic, or how closely a page matches a keyword (or keyword cluster), using the same kind of semantic signals Google likely uses.

What You’ll Do in This Sheet

  • Reference a list of items (pages, links, queries) and their vector embeddings
  • Select a reference embedding from another sheet (e.g. a keyword cluster or topic average)
  • Run cosine similarity functions to score semantic alignment

Sheet Layout: One-to-Many Comparison

Let’s say your reference embedding (e.g. a topic cluster vector) is in: ‘Query Embeddings’!F2

And you want to compare that to a list of link embeddings in: ‘Link Embeddings’!C2:C

with URLs in: ‘Link Embeddings’!A2:A

Option A: Compare Row-by-Row to a Single Reference Vector

In Similarity Analysis!A2, you might use:

=cosineSimilarityFromJson('Link Embeddings'!C2, 'Query Embeddings'!F2)

Drag this down alongside your links to generate relevance scores for each one.

Option B: Batch Compare All Links to a Topic Cluster

Want to do it all at once? Use:

=computeCosineSimilarities(
'Query Embeddings'!F2,
'Link Embeddings'!A2:A12,
'Link Embeddings'!C2:C12
)

This returns a 2-column table showing:

  • The link (from column A)
  • Its cosine similarity score relative to the cluster embedding we referenced (here, ‘Query Embeddings’!F2)

Here, we see the relevance of each URL compared to the keyword cluster we referenced against. With conditional formatting applied, we can see the most and least relevant links at a glance.
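If you want to work with that output outside the sheet, the rows are just [url, similarity] pairs. A sketch (with hypothetical URLs and scores) of ranking them most relevant first — the same ordering conditional formatting surfaces visually:

```javascript
// Hypothetical output rows from computeCosineSimilarities(): [url, similarity].
const results = [
  ['/pr-coverage-page', 0.32],
  ['/running-blog-post', 0.81],
  ['/trail-shoes-review', 0.74],
];

// Sort descending by similarity score without mutating the original rows.
const ranked = [...results].sort((a, b) => b[1] - a[1]);
console.log(ranked.map(r => r[0])); // URLs from most to least relevant
```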

Instead of calculating the relevance of backlinks to the keyword cluster, you could calculate the relevance of your site’s pages to the cluster, for example:

=computeCosineSimilarities(
'Query Embeddings'!F2,
'Page Embeddings'!A2:A12,
'Page Embeddings'!C2:C12
)

These sheets are here to be used as a base, depending on what you’re wanting to compare.

Optional: Compare Rows Side-by-Side

You can also compare two embeddings row by row — e.g. a query and the page that ranks for it:

=cosineSimilarityFromJson('Query Embeddings'!B2, 'Page Embeddings'!C2)

This is useful for tracking alignment between specific keywords and target content.

This structure keeps your workbook clean and modular; one sheet for links, one for pages, one for queries, and a central sheet to bring everything together with real relevance scores.

Recap: What You’ve Built

You now have a fully working, multi-sheet setup that lets you:

| Sheet | Purpose |
| --- | --- |
| Page Embeddings | Paste page data and calculate average embeddings for site sections (or the whole site) |
| Link Embeddings | Paste link data and calculate average embeddings for the link profile as a whole |
| Query Embeddings | Generate and average embeddings by topic cluster or individual queries |
| Similarity Analysis | Compare vectors and calculate relevance using cosine similarity |

Together, this setup enables semantic relevance analysis at scale, powered by embeddings and done entirely in Google Sheets.


Vector embeddings and cosine similarity aren’t just technical buzzwords; they give us a practical, measurable way to understand what really matters in SEO: semantic relevance.

Instead of relying on proxy metrics like link count or Domain Rating for link analysis, this workflow lets you directly assess how well your content and links align with the queries you’re trying to rank for.

And the best part?

You don’t need Python, notebooks, or a machine learning stack. With Google Sheets, Screaming Frog, OpenAI and a few lines of Apps Script, you can bring relevance-led SEO into your everyday workflow.

This setup gives you a fast, flexible, and scalable way to measure relevance, all in Sheets.

So open up a new Sheet, paste in some embeddings, and start measuring relevance the way it should be measured.
