August 2024

Guru Connector

  • The Guru connector allows users to sync collections, folders, and cards from their Guru account.

  • CCv3 support for Guru is coming soon; the enabledIntegration value will be GURU.

  • See more details here.

Sync Filter for Email Attachments

  • Customers can now choose to sync only emails that contain attachments.

  • You still need to set sync_attachments to true and also apply the following filter:

{ "key": "has", "value": "attachment" }
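As a sketch, the request body below combines sync_attachments with the attachment filter. It assumes both fields go in the same Gmail sync request body (the endpoint itself is not named in this entry), and the helper buildGmailAttachmentSync is hypothetical.

```typescript
// Hypothetical Gmail sync body; only `sync_attachments` and the
// { key, value } filter come from the changelog.
interface GmailFilter {
  key: string;
  value: string;
}

interface GmailSyncBody {
  filters: GmailFilter;
  sync_attachments: boolean;
}

// Build a body that syncs only emails containing attachments.
function buildGmailAttachmentSync(): GmailSyncBody {
  return {
    filters: { key: "has", value: "attachment" },
    // Still required alongside the filter, per the changelog.
    sync_attachments: true,
  };
}

console.log(JSON.stringify(buildGmailAttachmentSync()));
```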

Auto-Refresh Synced Files List in CCv3

  • We now automatically refresh the synced file list whenever users select additional files using our in-house or third-party file picker view. This eliminates the need for users to manually refresh the view.

Updated Children Prop

  • The children prop of the CCv3 component now accepts any valid React node as the children of the modal, from a simple <div> to an entire component.

  • Here’s an example of how the children prop can be used:

children={ <button onClick={() => setOpen((prev) => !prev)}> Toggle Connect </button> }

Custom Styling for Carbon Connect

  • Users can now control styling of CCv3 by targeting the specific class names we’ve provided. This allows for complete customization to match the desired look and feel of the application.

  • For example, class names include:

    • cc-modal: Applies to the entire modal component

    • cc-modal-header: Targets the header section of the modal

    • cc-modal-footer: Targets the footer section of the modal

    • cc-modal-close: Applies to the close button of the modal

    • cc-modal-overlay: Targets the overlay background of the modal

  • By utilizing these class names, users can easily override the default styles and apply their own CSS rules to achieve the desired appearance.
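The class names can be targeted from any stylesheet. As a hedged illustration, the hypothetical helper buildCcOverrides below renders a map of declarations into CSS rules for the documented cc-* classes; how the resulting text is injected into the page (for example through a style element) is left to the host application, and specificity against the default styles is not guaranteed.

```typescript
// The CCv3 class names listed in the changelog.
type CcClass =
  | "cc-modal"
  | "cc-modal-header"
  | "cc-modal-footer"
  | "cc-modal-close"
  | "cc-modal-overlay";

// CSS declarations keyed by property name, e.g. { "border-radius": "12px" }.
type Declarations = Record<string, string>;

// Hypothetical helper: render overrides for the cc-* classes as CSS text.
function buildCcOverrides(overrides: Partial<Record<CcClass, Declarations>>): string {
  return Object.entries(overrides)
    .map(([cls, decls]) => {
      const body = Object.entries(decls ?? {})
        .map(([prop, value]) => `  ${prop}: ${value};`)
        .join("\n");
      return `.${cls} {\n${body}\n}`;
    })
    .join("\n\n");
}

const css = buildCcOverrides({
  "cc-modal": { "border-radius": "12px", "font-family": "Inter, sans-serif" },
  "cc-modal-overlay": { "background-color": "rgba(0, 0, 0, 0.6)" },
});
console.log(css);
```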

OCR Support for JPG and PNG

  • We now support jpg, jpeg and png file formats for OCR.

  • In addition to the normal steps for enabling OCR, please set media_type to TEXT (on file upload or via /integrations/oauth_url) so Carbon processes the image via OCR instead of generating image embeddings with our image embedding model.
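A minimal sketch of preparing an OCR upload URL for an image. The base URL and the use_ocr flag name are assumptions standing in for "the normal steps for enabling OCR"; only media_type=TEXT and the supported extensions come from this entry.

```typescript
const BASE_URL = "https://api.carbon.ai"; // assumed base URL

// Extensions the changelog lists as OCR-capable.
const OCR_EXTENSIONS = new Set(["jpg", "jpeg", "png"]);

// Hypothetical helper: build an /uploadfile URL that routes an image to OCR.
function buildOcrUploadUrl(filename: string): string {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  if (!OCR_EXTENSIONS.has(ext)) {
    throw new Error(`OCR supports jpg, jpeg and png, got: ${filename}`);
  }
  const params = new URLSearchParams({
    use_ocr: "true", // assumed name of the usual OCR switch
    media_type: "TEXT", // OCR the image instead of creating image embeddings
  });
  return `${BASE_URL}/uploadfile?${params.toString()}`;
}

console.log(buildOcrUploadUrl("receipt.PNG"));
// → https://api.carbon.ai/uploadfile?use_ocr=true&media_type=TEXT
```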

HTML for Confluence Articles

  • We now return the raw HTML output for each Confluence article via the file_metadata.saved_filename object under user_files_v2.

Cancel Source Items Sync

  • We added an endpoint /integrations/items/sync/cancel to cancel data source syncs that are initiated via /integrations/items/sync.

  • This allows customers to manually stop syncing for user data sources where sync_status = SYNCING.
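The sketch below prepares a cancel request only for data sources that are currently syncing. The base URL, the POST method, the bearer auth header and the data_source_id body field are assumptions; the path and the SYNCING check come from this entry.

```typescript
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface DataSource {
  id: number;
  sync_status: string;
}

const BASE_URL = "https://api.carbon.ai"; // assumed

// Returns null when there is nothing to cancel (source is not SYNCING).
function buildCancelSyncRequest(source: DataSource, apiKey: string): HttpRequest | null {
  if (source.sync_status !== "SYNCING") return null;
  return {
    url: `${BASE_URL}/integrations/items/sync/cancel`,
    method: "POST",
    headers: {
      authorization: `Bearer ${apiKey}`, // assumed auth scheme
      "content-type": "application/json",
    },
    body: JSON.stringify({ data_source_id: source.id }), // assumed field name
  };
}
```

In Node 18+ or a browser, the result could be sent with fetch(req.url, req).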

New Gmail Filter

  • We added a new Gmail filter to sync all emails sent from a given account. Example:

{ "filters": { "key": "in", "value": "sent" } }

Return Raw Notion Blocks

  • We now return the raw output (blocks) for each Notion page via saved_filename under user_files_v2 when include_raw_file: true.

Shared Google Drive Source Items

  • We now return shared Google Drive files and folders via /integrations/items/list.

Clearer Error Message for SYNC_ERROR Status

  • When a file goes into SYNC_ERROR after a re-sync via /resync_file because it has been deleted in the source, sync_error_message will now say "File not found in data source".

  • The webhook sent for that error will also contain sync_error_message in additional_information.
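A webhook consumer can use the new message to tell deletions at the source apart from other sync failures. The payload shape below is a hypothetical subset; only additional_information, sync_error_message and the message text come from this entry, and the match is deliberately loose in case the text is embedded in a longer message.

```typescript
// Hypothetical subset of a webhook payload.
interface WebhookPayload {
  webhook_type: string;
  additional_information?: Record<string, unknown>;
}

const DELETED_AT_SOURCE = "File not found in data source";

// True when the error message says the file no longer exists at the source.
function isDeletedAtSource(payload: WebhookPayload): boolean {
  const message = payload.additional_information?.["sync_error_message"];
  return typeof message === "string" && message.includes(DELETED_AT_SOURCE);
}
```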

Slack UI in Carbon Connect v3 (3.0.0-beta32)

  • Select Conversations to Sync

    • After authenticating, users have full control over which conversations they want to sync via CCv3, including:

      • Public channels

      • Private channels

      • Direct messages (DMs)

      • Group DMs

  • Manage Synced Conversations

    • Users can manage their list of synced conversations at any time via CCv3.

    • Easily add or remove channels and DMs to adjust what gets synced between Slack and Carbon.

Carbon Connect Enhancements

  • Synced URLs for Web Scrapes (CCv3 beta30)

    • We now display synced URLs in a dedicated list view under the WEB_SCRAPE integration.

    • The default columns displayed in the list view are name, status, created_at.

    • Parent URLs will be displayed as “folders” and children URLs will be displayed as “files” within the folder.

  • When showFilesTab is set to false, we surface a Select files button in the account drop-down so users can sync new files.

  • Data Source Polling Interval

    • Added a new configuration property at the component level called dataSourcePollingInterval.

    • This property controls how frequently data sources are polled for any updates and events.

    • The value is specified in milliseconds (ms) and the minimum allowed value for this property is 3000 ms. The default is 8000 ms.

  • Speaker Diarization

    • Added includeSpeakerLabels for LOCAL_FILES integration and file extensions.

    • Added include_speaker_labels to fileSyncConfig for third-party connectors.

  • openFilesTabTo Param

    • The openFilesTabTo prop is set on the component level and determines which tab (FILE_PICKER or FILES_LIST) the user is taken to by default when they select an integration.

    • The prop takes a string value of either "FILE_PICKER" or "FILES_LIST".

    • This prop only applies when the customer has enabled Carbon’s in-house file picker.

  • We now display a banner when data source items are being synced. The user will still be able to select previously synced items for upload in the meantime.

  • Guru support in CCv3 has been added. The enabledIntegration value is GURU.

  • We improved the file list view to be better optimized for mobile devices and ensured that the column headers and values align properly.
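The component-level props described above can be typed and collected as a plain object. This is a sketch: the prop names are from this changelog, while the type names and the shape of each enabledIntegrations entry are assumptions. The clamp mirrors the documented 3000 ms minimum and 8000 ms default on the client side; Carbon Connect may enforce these itself.

```typescript
type FilesTab = "FILE_PICKER" | "FILES_LIST";

// Assumed shape of an enabledIntegrations entry.
interface IntegrationEntry {
  id: string; // e.g. "LOCAL_FILES" or "GOOGLE_DRIVE" (assumed naming)
  includeSpeakerLabels?: boolean; // LOCAL_FILES
  fileSyncConfig?: { include_speaker_labels?: boolean }; // third-party connectors
}

interface CcProps {
  dataSourcePollingInterval?: number; // milliseconds
  openFilesTabTo?: FilesTab; // only used with Carbon's in-house file picker
  showFilesTab?: boolean;
  enabledIntegrations?: IntegrationEntry[];
}

const MIN_POLL_MS = 3000;
const DEFAULT_POLL_MS = 8000;

// Client-side guard: default when unset, never below the minimum.
function pollingInterval(props: CcProps): number {
  return Math.max(MIN_POLL_MS, props.dataSourcePollingInterval ?? DEFAULT_POLL_MS);
}

const props: CcProps = {
  dataSourcePollingInterval: 1000,
  openFilesTabTo: "FILES_LIST",
  enabledIntegrations: [
    { id: "LOCAL_FILES", includeSpeakerLabels: true },
    { id: "GOOGLE_DRIVE", fileSyncConfig: { include_speaker_labels: true } },
  ],
};

console.log(pollingInterval(props)); // 3000
console.log(pollingInterval({})); // 8000
```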


Pongo Reranking Model

  • We’ve added Pongo as a supported reranker model alongside Jina and Cohere.

  • Similar to Cohere and Jina reranking, users can now use PONGO_RERANKER on the embeddings endpoint in the following manner:

{ "query": "how is anime made?", "k": 5, "rerank": {"model": "PONGO_RERANKER"} }
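A typed version of the same request body, as a small sketch; the interface and helper names are hypothetical, and the fields are exactly those in the JSON example.

```typescript
interface EmbeddingsRequest {
  query: string;
  k: number;
  rerank?: { model: string };
}

// Build an /embeddings body whose results are reranked by Pongo.
function pongoSearch(query: string, k = 5): EmbeddingsRequest {
  return { query, k, rerank: { model: "PONGO_RERANKER" } };
}

console.log(JSON.stringify(pongoSearch("how is anime made?")));
// → {"query":"how is anime made?","k":5,"rerank":{"model":"PONGO_RERANKER"}}
```

Switching between rerankers only changes rerank.model; the rest of the body stays the same.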

Third-Party File Picker Behavior

  • We added a new parameter, automatically_open_file_picker, to the external file sync URLs /integrations/oauth_url and /integrations/connect. When it is true, the file picker for Google Drive, Box, OneDrive, SharePoint, or Dropbox automatically opens when the user lands on the successful connection page.

  • Note that some users’ browsers may have popup blockers that prevent this parameter from working. In such cases, the browser may prompt the user to allow popups from the platform. If the user grants permission, the feature will work as intended for future syncs.

  • OneDrive and SharePoint behave differently because Microsoft treats the file picker as a separate app. Instead of opening the file picker directly, the flow triggers another OAuth prompt. If the user consents to the file picker OAuth, the file picker then opens automatically.
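A sketch of an /integrations/oauth_url body with the new flag. The service field and its enum values are assumptions; automatically_open_file_picker and the OneDrive/SharePoint consent behavior come from this entry.

```typescript
// Assumed service identifiers for the providers with a third-party picker.
type PickerService = "GOOGLE_DRIVE" | "BOX" | "ONEDRIVE" | "SHAREPOINT" | "DROPBOX";

interface OAuthUrlBody {
  service: PickerService;
  automatically_open_file_picker: boolean;
}

function buildOAuthUrlBody(service: PickerService): OAuthUrlBody {
  return { service, automatically_open_file_picker: true };
}

// OneDrive and SharePoint show an extra OAuth consent before the picker opens,
// which is useful to know when writing user-facing instructions.
function expectsPickerConsent(service: PickerService): boolean {
  return service === "ONEDRIVE" || service === "SHAREPOINT";
}
```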

Speaker Diarization

  • Speaker diarization has been added for audio transcription models. Chunks are now formatted so that the text is organized by utterance, with each utterance labeled with its speaker, in this format:

[Speaker A] speaker A's utterance

[Speaker B] speaker B's utterance

  • For local file uploads, there is a new parameter, include_speaker_labels. For external file uploads, the file_sync_config object accepts a new property, include_speaker_labels. When either is set to true, speaker diarization is enabled for the audio transcription services.

  • Minor note: Speaker labels may appear differently depending on the transcription service. Deepgram uses numbers to label speakers, while AssemblyAI uses letters.
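Downstream code that needs per-speaker text can split a diarized chunk on the label prefix. This sketch assumes each utterance starts on its own line in the [Speaker X] form shown above; because Deepgram labels with numbers and AssemblyAI with letters, the label is kept as a string. The function name parseUtterances is hypothetical.

```typescript
interface Utterance {
  speaker: string; // e.g. "A" (AssemblyAI) or "0" (Deepgram, assumed form)
  text: string;
}

const LABEL = /^\[Speaker ([^\]]+)\]\s*(.*)$/;

// Split a chunk into utterances; unlabeled lines continue the previous one.
function parseUtterances(chunk: string): Utterance[] {
  const result: Utterance[] = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const match = LABEL.exec(trimmed);
    if (match) {
      result.push({ speaker: match[1], text: match[2] });
    } else if (result.length > 0) {
      result[result.length - 1].text += " " + trimmed;
    }
  }
  return result;
}

const chunk = "[Speaker A] speaker A's utterance\n\n[Speaker B] speaker B's utterance";
console.log(parseUtterances(chunk));
```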

request_id on Additional Webhooks

  • request_id is now included under the additional_information object in the following webhook events for external files: UPDATE, FILES_CREATED, FILE_READY, FILE_ERROR, FILES_SKIPPED, FILE_SYNC_LIMIT_REACHED.
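With request_id on these events, a consumer can group webhook events by the request that produced them. The payload type is a hypothetical subset; the event names and the additional_information.request_id location come from this entry.

```typescript
interface WebhookEvent {
  webhook_type: string;
  additional_information?: { request_id?: string; [key: string]: unknown };
}

const EVENTS_WITH_REQUEST_ID = new Set([
  "UPDATE", "FILES_CREATED", "FILE_READY", "FILE_ERROR", "FILES_SKIPPED", "FILE_SYNC_LIMIT_REACHED",
]);

// Group events by request_id, ignoring events that do not carry one.
function groupByRequestId(events: WebhookEvent[]): Map<string, WebhookEvent[]> {
  const groups = new Map<string, WebhookEvent[]>();
  for (const event of events) {
    if (!EVENTS_WITH_REQUEST_ID.has(event.webhook_type)) continue;
    const id = event.additional_information?.request_id;
    if (!id) continue;
    const list = groups.get(id) ?? [];
    list.push(event);
    groups.set(id, list);
  }
  return groups;
}
```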

Cold Storage for Files (Beta)

  • Overview

    • Carbon supports moving file embeddings between hot and cold storage. This feature allows you to optimize storage costs and improve performance by keeping embeddings for frequently accessed files in hot storage (vector storage) while moving less frequently used files to cold storage (object storage).

  • Enabling Cold Storage

    • By default, the cold storage feature is not enabled. Once enabled, files will automatically be moved to cold storage after a set period of inactivity. To enable cold storage, you must set a flag at file upload time. Currently cold storage is only available for local file uploads via /uploadfile, /upload_text and /upload_file_from_url.

      • Moving Files from Hot to Cold Storage

        • Once enabled, files will be automatically moved from hot to cold storage after a specified period of inactivity. This period is determined by the time_to_move_to_cold_storage parameter, which represents the number of seconds a file must be inactive before it’s moved to cold storage. There is no manual way to move files to cold storage.

          • To update existing files to use cold storage, make an API request to the /modify_cold_storage_parameters endpoint.

      • Moving Files from Cold to Hot Storage

        • To move files from cold to hot storage, you must make an API request to /move_to_hot_storage. The request will take filters similar to /user_files_v2, and all files matching the provided filters will be moved to hot storage.

        • To avoid a single request hogging resources, there is a limit of 200 files that can be moved in one request. If more than 200 files match the filters, they will be processed in batches of 200 over a longer period of time.

    • /embeddings Endpoint Behavior

      • If a request to /embeddings involves files in cold storage, an error will be returned that includes all file_ids for the affected files. This allows the client to know which files need to be moved to hot storage before the request can be processed.

      • However, if exclude_cold_storage_embeddings is set to true, any files in cold storage will be ignored and no error will be thrown for requests involving them. The search will simply exclude those files.

      • In the future, we may enable a way to allow /embeddings to work with files that are in both cold and hot storage.

  • File Object Information

    • Activity is defined as when a file was last used, which currently includes file re-syncs, queries involving that file, and updates to file tags.

    • The following fields under the file object (under user_files_v2) are related to cold storage:

      • last_use: A timestamp indicating when a file was last used (i.e., when it last had activity).

      • supports_cold_storage: A flag indicating whether or not a file can be moved to cold storage.

      • time_to_move_to_cold_storage: An integer representing the number of seconds a file must be inactive before it’s moved to cold storage.

      • embedding_storage_status: The storage status of the embeddings for a file, indicating whether they are in cold or hot storage.

  • New Cold Storage Webhooks

    • MOVED_TO_COLD_STORAGE: This event is fired when a file is moved to cold storage.

    • MOVED_TO_HOT_STORAGE: This event is fired when a file is moved to hot storage.
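The file object fields above are enough to estimate when a file becomes eligible for cold storage (last_use plus time_to_move_to_cold_storage seconds) and to decide how to query around cold files. This sketch assumes last_use is an ISO 8601 timestamp and that embedding_storage_status uses a value such as "COLD"; both are assumptions, as are the helper names. The actual move may happen some time after the eligibility point.

```typescript
// Subset of the cold-storage fields on a user_files_v2 file object.
interface FileObject {
  id: number;
  last_use: string; // assumed ISO 8601 timestamp
  supports_cold_storage: boolean;
  time_to_move_to_cold_storage: number | null; // seconds of inactivity
  embedding_storage_status: string; // assumed values such as "HOT" / "COLD"
}

// Earliest moment the file can be moved: last_use + inactivity window.
function coldStorageEligibleAt(file: FileObject): Date | null {
  if (!file.supports_cold_storage || file.time_to_move_to_cold_storage == null) {
    return null;
  }
  const lastUseMs = Date.parse(file.last_use);
  return new Date(lastUseMs + file.time_to_move_to_cold_storage * 1000);
}

// IDs that could be passed as a filter to /move_to_hot_storage before querying.
function filesNeedingHotStorage(files: FileObject[]): number[] {
  return files.filter((f) => f.embedding_storage_status === "COLD").map((f) => f.id);
}

// /embeddings body that skips cold files instead of returning an error.
function searchHotOnly(query: string, k: number) {
  return { query, k, exclude_cold_storage_embeddings: true };
}

const file: FileObject = {
  id: 123,
  last_use: "2024-08-01T00:00:00Z",
  supports_cold_storage: true,
  time_to_move_to_cold_storage: 7 * 24 * 3600, // one week
  embedding_storage_status: "HOT",
};
console.log(coldStorageEligibleAt(file)?.toISOString()); // 2024-08-08T00:00:00.000Z
```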

You can find our documentation on cold storage here.

Warnings Object to API Responses

  • In the next two weeks, we plan to add a warnings object to our API responses to display warning messages.

  • Here’s an example of how it looks:

{ "documents": [], "warnings": [ { "warning_type": "FILES_IN_COLD_STORAGE", "object_type": "FILE_LIST", "object_id": [ 47058 ], "message": "These files won't be queried because they are not in hot storage." } ] }
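Since the warnings object is still planned, its final shape may change; the types below are inferred only from the example above. The sketch collects the file IDs that a query skipped because they are not in hot storage.

```typescript
interface ApiWarning {
  warning_type: string;
  object_type: string;
  object_id: number[];
  message: string;
}

interface WithWarnings {
  warnings?: ApiWarning[];
}

// Collect file IDs reported by FILES_IN_COLD_STORAGE warnings.
function coldStorageFileIds(response: WithWarnings): number[] {
  const ids: number[] = [];
  for (const w of response.warnings ?? []) {
    if (w.warning_type === "FILES_IN_COLD_STORAGE") ids.push(...w.object_id);
  }
  return ids;
}

const example = JSON.parse(
  '{ "documents": [], "warnings": [ { "warning_type": "FILES_IN_COLD_STORAGE", "object_type": "FILE_LIST", "object_id": [ 47058 ], "message": "These files won\'t be queried because they are not in hot storage." } ] }',
) as WithWarnings;

console.log(coldStorageFileIds(example)); // [ 47058 ]
```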

Carbon Connect 3.0 (CCv3) Enhancements

  • We’ve added 3 new props to CCv3:

    • The showFilesTab (boolean) prop has been reintroduced to CCv3 with a default value of true. As a quick reminder, this prop allows customers to hide the file selector and file list view from the CCv3 component. It can be enabled or disabled at both the component and integration levels. If specified for a specific integration, it will override the component-level configuration.

    • The filesTabColumns (array) prop has been added on both the component and integration levels. This prop controls which columns are displayed and hidden in the file list view and accepts an array of strings with values “name”, “status”, “created_at”, and “external_url”.

    • The transcription_service (enum) prop has been added under fileSyncConfig, and the transcriptionService prop has been added for the LOCAL_FILES integration; both specify which speech-to-text model to use for transcriptions. You can set the enum to ASSEMBLYAI or DEEPGRAM; it defaults to DEEPGRAM.
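The precedence rules above can be written down as two small resolvers. This is a sketch: the prop names, column values, enum values and defaults come from this entry, while the integration entry shape (an id field) and the helper names are assumptions.

```typescript
type FilesTabColumn = "name" | "status" | "created_at" | "external_url";
type TranscriptionService = "ASSEMBLYAI" | "DEEPGRAM";

interface TabSettings {
  showFilesTab?: boolean;
  filesTabColumns?: FilesTabColumn[];
}

// Assumed integration entry shape.
interface IntegrationSettings extends TabSettings {
  id: string;
  transcriptionService?: TranscriptionService; // LOCAL_FILES only
  fileSyncConfig?: { transcription_service?: TranscriptionService };
}

// Integration-level value overrides the component level; default is true.
function effectiveShowFilesTab(component: TabSettings, integration: IntegrationSettings): boolean {
  return integration.showFilesTab ?? component.showFilesTab ?? true;
}

// LOCAL_FILES reads transcriptionService; other integrations read fileSyncConfig.
function effectiveTranscriptionService(integration: IntegrationSettings): TranscriptionService {
  const chosen =
    integration.id === "LOCAL_FILES"
      ? integration.transcriptionService
      : integration.fileSyncConfig?.transcription_service;
  return chosen ?? "DEEPGRAM";
}
```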

Google Cloud Storage Connector 

  • We launched our GCS connector that enables syncing files from buckets.

  • The Carbon Connect enabledIntegrations value for GCS is GCS.

  • See more specifics about our GCS connector here.

DigitalOcean Storage Connector

  • We launched our DigitalOcean Storage connector, which enables syncing files from buckets.

  • The Carbon Connect enabledIntegrations value for DigitalOcean Spaces is S3 (CC support will be launched tomorrow).

  • The Spaces API is interoperable with the AWS S3 API, so DigitalOcean Spaces uses the existing S3 endpoints.

  • This means that the source of DigitalOcean files is S3. To distinguish data sources and files from Spaces Object Storage, additional metadata has been added:

    • Data Source Metadata

      • data_source_metadata: Indicates the type of data source. Possible values include:

        • S3: Represents an Amazon S3 data source.

        • DigitalOcean Space: Represents a DigitalOcean Spaces data source.

    • File Metadata

      • file_metadata: Specifies the type of file. Possible values include:

        • S3 File: Represents a file stored in Amazon S3.

        • DigitalOcean Space File: Represents a file stored in DigitalOcean Spaces.

        • S3 Bucket: The file representation of an S3 bucket.

        • DigitalOcean Space Bucket: The file representation of a DigitalOcean Space bucket.

  • See more specifics about our DigitalOcean Spaces connector here.
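Because both providers report S3 as the source, the metadata labels are what tell them apart. This sketch maps the labels listed above to a provider and kind; how the label is read out of data_source_metadata or file_metadata is an assumption (the functions take the label string directly), and the AWS_S3 / DIGITALOCEAN_SPACES identifiers are hypothetical.

```typescript
type Provider = "AWS_S3" | "DIGITALOCEAN_SPACES";

interface FileClass {
  provider: Provider;
  kind: "file" | "bucket";
}

// file_metadata labels from the changelog.
const FILE_LABELS: Record<string, FileClass> = {
  "S3 File": { provider: "AWS_S3", kind: "file" },
  "DigitalOcean Space File": { provider: "DIGITALOCEAN_SPACES", kind: "file" },
  "S3 Bucket": { provider: "AWS_S3", kind: "bucket" },
  "DigitalOcean Space Bucket": { provider: "DIGITALOCEAN_SPACES", kind: "bucket" },
};

// data_source_metadata labels from the changelog.
const DATA_SOURCE_LABELS: Record<string, Provider> = {
  S3: "AWS_S3",
  "DigitalOcean Space": "DIGITALOCEAN_SPACES",
};

function classifyFile(label: string): FileClass | null {
  return FILE_LABELS[label] ?? null;
}

function classifyDataSource(label: string): Provider | null {
  return DATA_SOURCE_LABELS[label] ?? null;
}
```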

New file_types_at_source Filter for /user_files_v2 and /embeddings

  • Introduced a new optional field file_types_at_source for /user_files_v2 and /embeddings.

  • The file_types_at_source field is an array type that currently accepts the following values:

    • TICKET

    • ARTICLE

  • This new field allows users to specify whether we return tickets, articles or both when retrieving content (files and embeddings) from Zendesk, Intercom and Freshdesk.

    • If file_types_at_source contains TICKET, ticket content from Zendesk, Intercom and Freshdesk is returned.

    • If file_types_at_source contains ARTICLE, article content from Zendesk, Intercom and Freshdesk is returned.
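A sketch of a request body using the new field. Placing file_types_at_source under a filters object is an assumption; the field name and its TICKET and ARTICLE values come from this entry. The same field is documented for /embeddings, so a similar body could apply there. Duplicates are dropped before sending.

```typescript
type FileTypeAtSource = "TICKET" | "ARTICLE";

// Hypothetical /user_files_v2 body shape.
interface UserFilesQuery {
  filters: { file_types_at_source: FileTypeAtSource[] };
}

// Build a query limited to the given help-desk content types.
function helpdeskFilesQuery(types: FileTypeAtSource[]): UserFilesQuery {
  const unique = types.filter((t, i) => types.indexOf(t) === i);
  return { filters: { file_types_at_source: unique } };
}

console.log(JSON.stringify(helpdeskFilesQuery(["TICKET", "ARTICLE"])));
```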

CARBON

Data Connectors for LLMs

COPYRIGHT @ 2024 JCDT DBA CARBON