Similarity

Overview

The Text Similarity function plug-in allows users to compute a similarity score between pairs of texts. By computing a similarity score, one can find related items, group similar items, detect duplicates, and more.


Key Features & Functionality

Computes a similarity score between pairs of texts, returning a value between 0.0 and 1.0, where a higher score implies a higher similarity.

  • v1.0.2 Release Notes
    • Fixed deployment issue

  • v1.0.1 Release Notes
    • Addressed a CVE vulnerability

  • Hi,

    When I try to deploy this plug-in, I get the error below:

    [Appian Plugin Hot Deploy] ERROR com.atlassian.plugin.manager.DefaultPluginManager - There was an error loading the descriptor 'Similarity' of plugin 'com.appian.solutions.psc'. Disabling.
    com.atlassian.plugin.PluginParseException: java.lang.UnsatisfiedLinkError: D:\appian\tomcat\apache-tomcat\temp\onnxruntime-java6746264334642517109\onnxruntime.dll: Can't find dependent libraries

    2023-05-17 05:25:43,105 [Appian Plugin Hot Deploy] ERROR com.appiancorp.plugins.LoggingPluginEventListener - Failed to enable Plug-in 'GAM PSC Recommender' (com.appian.solutions.psc) version 1.0.0: 'There was a problem loading the descriptor for module 'Similarity' in plugin 'GAM PSC Recommender'.
    java.lang.UnsatisfiedLinkError: D:\appian\tomcat\apache-tomcat\temp\onnxruntime-java6746264334642517109\onnxruntime.dll: Can't find dependent libraries'
    2023-05-17 05:26:26,537 [Appian Timer - 3] ERROR com.appiancorp.rdbms.datasource.DataSourceFactory - Could not find driver for Oracle

  • Hi Stefan,

    Thank you for taking a look! As you point out, it's CPU-intensive, so the plug-in accepts at most 300 strings to prevent problems. I'll provide documentation shortly covering its usage and warnings.

    Please let me know if you have any comments or feedback,

    Thank you!

  • I had a look at the source code. It says:

    // This plug-in is resource intensive. Through performance tests, we found out that sending
    // over 300 sentences, can potentially crash a dev site.

    // The model is large (~330MB) and thus consumes a decent amount of memory.

    It uses a Transformer AI model to calculate the cosine similarity between two vectors, based on: stackoverflow.com/.../246508
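
For readers unfamiliar with the term, cosine similarity between two vectors can be sketched as below. This is a minimal illustration in Python, not the plug-in's actual code: the function name `cosine_similarity` and the choice to return 0.0 for a zero-length vector are assumptions made for the example. Raw cosine similarity ranges from -1.0 to 1.0; how the plug-in arrives at its 0.0 to 1.0 score is not stated in the comments above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b.

    Hypothetical helper for illustration only; it is not taken from the
    plug-in's source code.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    # Dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))

    # Euclidean norm (length) of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Assumption: a zero vector has no direction, so report no similarity
    # instead of dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0

    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # In a setup like the one described, the vectors would be text
    # embeddings produced by the model; short toy vectors are used here.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Vectors pointing in the same direction score close to 1.0 and orthogonal vectors score 0.0, which matches the "higher score implies higher similarity" description in the overview.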