Video Embeddings Support

  • We now support embedding generation for videos, allowing you to run semantic search on the video content based on the similarity of a video snippet to the search query or the text within the video frames, similar to OCR.

    • /uploadfile now takes a new optional parameter called media_type, whose value comes from the FileContentTypes enum. If media_type isn’t provided, all video file formats default to audio processing.

    • Videos are currently supported via the uploadfile and upload_file_from_url endpoints; support for third-party connectors and Carbon Connect is coming soon.

  • We support the following video file formats:

    • AVI

    • FLV

    • MKV

    • MOV

    • MP4

    • MPEG

    • MPG

    • WEBM

    • WMV

  • The maximum file size is 1 GB, but it can be increased upon request.

  • See more details here.

  • Please note that video embedding generation takes much longer than text and image embeddings. For example, it took 60-90s to embed a 3-minute video.
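
As a rough illustration of the limits above, here is a minimal pre-flight check you could run before uploading. The extension list comes from the supported formats in this entry; the function name and the reading of "1 GB" as 1024**3 bytes are assumptions made for this sketch, not part of the Carbon API.

```python
from pathlib import PurePath

# Extensions taken from the supported-format list in this entry.
SUPPORTED_VIDEO_EXTENSIONS = {
    "avi", "flv", "mkv", "mov", "mp4", "mpeg", "mpg", "webm", "wmv",
}

# Assumption: "1 GB" is read as 1024**3 bytes; the notes do not specify.
DEFAULT_MAX_VIDEO_BYTES = 1024 ** 3


def can_upload_as_video(
    filename: str,
    size_bytes: int,
    max_bytes: int = DEFAULT_MAX_VIDEO_BYTES,
) -> bool:
    """Return True if the file has a supported video extension and fits the size limit."""
    extension = PurePath(filename).suffix.lstrip(".").lower()
    return extension in SUPPORTED_VIDEO_EXTENSIONS and 0 < size_bytes <= max_bytes
```

If your limit has been raised on request, pass the new value as max_bytes.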

Intercom Tickets Integration

  • We’re thrilled to announce that our Intercom connector now has support for tickets.

  • The /integrations/oauth_url and /integrations/connect endpoints sync articles by default. To customize the sync behavior, use the file_sync_config parameter.

  • You can now also view and sync tickets via the global endpoints /integrations/items/list and /integrations/files/sync.

  • The Intercom scopes you need depend on the content you want to sync:

    • To sync user articles only, add these scopes:

      • Read one admin

      • Read and List Articles

    • To sync both user articles and tickets, also add:

      • Read and list users and companies

      • Read tickets

  • The following ticket information is available as tags for filtering:

{ "ticket_type": "Support Request", "ticket_status": "resolved", "ticket_category": "Customer", "ticket_submitter": "example.user@projectmap.com", "ticket_assigned_team": "Technical", "ticket_assigned_admin": "swapnil@carbon.ai" }

  • Text chunks will include the conversation history (comments on the ticket).

  • You can find more details here.

New Webhook Statuses

  • Each webhook you create now has a status of either ACTIVE or FLAGGED, which is returned in the webhooks endpoint response.

  • ACTIVE: The webhook is operating normally and successfully receiving events.

  • FLAGGED: The webhook URL failed to return a response more than 20 times within a 60-second window. This indicates a potential issue with your webhook URL that you should check. If a webhook is moved to the FLAGGED status, please contact us to update it.
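
To make the flagging rule concrete, here is a small sketch of a sliding-window failure counter. It is not Carbon's implementation; the class name and the exact window boundary handling are assumptions. You might use something like it on your side to alert before your endpoint reaches the threshold.

```python
from collections import deque


class FailureWindow:
    """Counts delivery failures inside a rolling time window."""

    def __init__(self, max_failures: int = 20, window_seconds: float = 60.0) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()  # timestamps (seconds) of recent failures

    def record_failure(self, timestamp: float) -> str:
        """Record a failure at `timestamp` and return the resulting status."""
        self._failures.append(timestamp)
        # Drop failures that have aged out of the window ending at `timestamp`.
        while timestamp - self._failures[0] >= self.window_seconds:
            self._failures.popleft()
        # "More than 20 times" means the 21st failure in the window flags.
        if len(self._failures) > self.max_failures:
            return "FLAGGED"
        return "ACTIVE"
```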

Incremental Syncs for Gmail and Outlook

  • We have introduced incremental syncs for the following endpoints for Gmail and Outlook:

    • /integrations/items/sync

    • /integrations/connect

    • /integrations/oauth_url

  • How It Works

    • By setting incremental_sync to true, only new or updated files since the last sync will be re-synced. This means that if a file has already been synced and hasn’t been modified, it will be skipped during the next sync.

    • If the embedding properties or tags of a file change between sync requests, those specific files will be re-synced.

    • Carbon sends a FILE_SKIPPED webhook event for files skipped during the incremental sync. The body of the webhook will contain a list of file_ids for the skipped files and a reason in additional_information.

  • This update addresses a common problem where files would be re-synced if a user went through the 3rd-party file selector to select files that had already been synced. With incremental syncs, this issue is resolved, ensuring that only truly new or updated files are synchronized.

  • Note: Incremental syncs are already enabled for Box, Dropbox, OneDrive and Google Drive.
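
The skip rule described under How It Works can be sketched as a small decision function. This is only an illustration of the behavior as described in this entry: the SyncedRecord shape and its field names are hypothetical, and the actual comparison happens inside Carbon.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncedRecord:
    """State of a file at sync time (hypothetical shape for this sketch)."""

    modified_at: float
    embedding_properties: tuple
    tags: tuple


def should_resync(
    previous: Optional[SyncedRecord],
    current: SyncedRecord,
    incremental_sync: bool,
) -> bool:
    """Decide whether a file is re-synced or skipped."""
    # Without incremental sync, or for a file never seen before, always sync.
    if not incremental_sync or previous is None:
        return True
    # The file was modified since the last sync.
    if current.modified_at > previous.modified_at:
        return True
    # Changed embedding properties or tags also trigger a re-sync.
    return (
        current.embedding_properties != previous.embedding_properties
        or current.tags != previous.tags
    )
```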

Aggregated Usage Metrics Update

  • We’re excited to announce several improvements to how we aggregate and expose file statistics across the API.

  • The following metrics will now be returned via the /organization and /user endpoints:

    • aggregate_file_size

    • aggregate_num_characters

    • aggregate_num_tokens

    • aggregate_num_embeddings

    • aggregate_num_files_by_source

    • aggregate_num_files_by_file_format

  • To fetch the most up-to-date metrics via the organization endpoint, take the following steps:

    1. Call the /organization/statistics endpoint. It takes no parameters and submits a request to asynchronously re-aggregate organization file statistics.

    2. When the re-aggregation is complete, a webhook of the event type FILE_STATISTICS_AGGREGATED will be sent.

    3. After receiving that event, making a request to /organization will return the updated file statistics in the response body.

    4. Additionally, a timestamp of when the file statistics were last updated can be found in file_statistics_aggregated_at.
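
The steps above can be sketched as a webhook handler that refreshes statistics only after the aggregation event arrives. The "webhook_type" key, the assumption that the metrics sit at the top level of the /organization response, and the injected fetch_organization callable are all hypothetical; adapt them to the payloads you actually receive.

```python
from typing import Callable, Optional

METRIC_KEYS = (
    "aggregate_file_size",
    "aggregate_num_characters",
    "aggregate_num_tokens",
    "aggregate_num_embeddings",
    "file_statistics_aggregated_at",
)


def handle_statistics_webhook(
    event: dict,
    fetch_organization: Callable[[], dict],
) -> Optional[dict]:
    """Return refreshed metrics once aggregation completes, else None."""
    # Assumption: the event type is exposed under a "webhook_type" key.
    if event.get("webhook_type") != "FILE_STATISTICS_AGGREGATED":
        return None
    # fetch_organization stands in for a request to /organization.
    organization = fetch_organization()
    return {key: organization.get(key) for key in METRIC_KEYS}
```

Passing the fetch function in keeps the handler testable without network access.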

fileSyncConfig Property for Carbon Connect

  • We have added a new fileSyncConfig prop for Carbon Connect that is set at the component or integration level and accepts the following properties:

  • auto_synced_source_types (AutoSyncedSourceTypes array): An array specifying the types of sources to automatically sync files from.

  • sync_attachments (boolean): Set to true to enable synchronization of attachments, or false to disable attachment syncing. Currently applies to helpdesk tickets only.

  • detect_audio_language (boolean): Set to true to enable automatic detection of audio language during file upload, or false to disable audio language detection.

Deepgram Audio Language Detection

  • This feature enables automatic language detection for audio file uploads.

    • Added a new optional query parameter, detect_audio_language.

    • When set to true, Deepgram will automatically detect the language of the uploaded audio file.

    • Defaults to false if not specified.

    • Applies to the upload_file_from_url and uploadfile endpoints.
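
As a small sketch of how the query parameter might be attached to an upload request URL: the base URL below is a placeholder and the helper name is hypothetical; only detect_audio_language and its default of false come from this entry.

```python
from urllib.parse import urlencode

# Assumption: placeholder host; substitute the API base URL you actually use.
BASE_URL = "https://api.example.com"


def build_upload_url(endpoint: str, detect_audio_language: bool = False) -> str:
    """Build an upload URL carrying the detect_audio_language query parameter."""
    query = urlencode({"detect_audio_language": str(detect_audio_language).lower()})
    return f"{BASE_URL}/{endpoint.lstrip('/')}?{query}"
```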

Updated Webhook Event: FILE_SYNC_LIMIT_REACHED

  • We have improved the functionality of the FILE_SYNC_LIMIT_REACHED webhook event to provide more granular information when users exceed file upload limits. This event will now be triggered in the following scenarios:

    • When a user attempts to upload files that would cause them to exceed the maximum number of allowed files (max_files).

    • When a user tries to upload more files than the maximum allowed per upload (max_files_per_upload).

    • When a user exceeds the daily 2.5 GB file sync limit (existing functionality).

  • To differentiate between the three different limit scenarios, we have introduced a new reason property in the event’s additional information. The reason property will have one of the following values:

    • Max files per upload limit exceeded.

    • Max files limit exceeded.

    • Organization daily limit for file sync has been reached.
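
One way to branch on the new reason property is to map the three documented strings onto an enum. The enum and function names are made up for this sketch; the reason strings are copied from the list above.

```python
from enum import Enum
from typing import Optional


class SyncLimitReason(Enum):
    MAX_FILES_PER_UPLOAD = "Max files per upload limit exceeded."
    MAX_FILES = "Max files limit exceeded."
    DAILY_LIMIT = "Organization daily limit for file sync has been reached."


def parse_limit_reason(additional_information: dict) -> Optional[SyncLimitReason]:
    """Map the event's reason string to an enum member, or None if unrecognized."""
    reason = additional_information.get("reason")
    try:
        return SyncLimitReason(reason)
    except ValueError:
        # Missing or unknown reason: let the caller decide how to handle it.
        return None
```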

HTML File Support

  • We now support uploading .html files from local and third-party data sources.

  • Similar to other file formats, we provide the original .html file as well as a plain text version of the file as pre-signed URLs via the user_files_v2 endpoint.

Freshdesk Tickets Integration

  • We’re thrilled to announce that our Freshdesk connector now has support for tickets.

  • The /integrations/freshdesk and /integrations/connect endpoints sync articles by default. To customize the sync behavior, use the file_sync_config parameter.

  • You can now also view and sync tickets via the global endpoints /integrations/items/list and /integrations/files/sync.

  • To start syncing ticket content, the Freshdesk API key should belong to a user with access to agents and tickets permissions.

  • The following ticket information is available as tags for filtering:

{ "ticket_type": "incident", "ticket_status": "open", "ticket_assignee": "swapnil+zen1@carbon.ai", "ticket_priority": "normal", "ticket_requester": "customer@example.com" }

  • Text chunks will include the conversation history (comments on the ticket).

  • You can find more details here.

New Webhook Type: SPARSE_VECTOR_GENERATION

  • We have introduced a new webhook event type, SPARSE_VECTOR_GENERATION, that is triggered when the queue status of sparse vector generation for a file changes. It is called SPARSE_VECTOR_QUEUE_STATUS and has object type CHUNK_LIST.

  • This new webhook includes an object in the additional_information with the key-name sparse_vector_queue_status. The object has two fields:

    • sparse_vector_queue_status, which is one of queued, aborted, or failed

    • sparse_vector_queue_error, which is null unless sparse_vector_queue_status is aborted or failed

  • See more details here.
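
A minimal sketch of reading the two fields from the webhook's additional_information, following the nesting described above. The function name is hypothetical, and the payload shape is inferred from this entry only; verify it against a real event before relying on it.

```python
from typing import Optional, Tuple


def read_sparse_vector_status(
    additional_information: dict,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (status, error) from the nested sparse_vector_queue_status object."""
    details = additional_information.get("sparse_vector_queue_status") or {}
    status = details.get("sparse_vector_queue_status")
    error = details.get("sparse_vector_queue_error")
    # The error is only meaningful for aborted or failed generation.
    if status not in ("aborted", "failed"):
        error = None
    return status, error
```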

parent_file_id for Embeddings

  • The embeddings response now includes a parent_file_id field for each chunk returned.

  • This field can contain an integer value representing the ID of the parent file, or null if there is no parent file associated with the embedding.

SharePoint and OneDrive Folder Selection and Syncing

  • You can now select an entire folder for upload, and Carbon will automatically include all nested subfolders and files. This brings our SharePoint and OneDrive functionality in line with popular services like Google Drive, Dropbox and Notion.

  • We have also introduced auto-sync for SharePoint and OneDrive folders. Any new folders and files added to your selected parent folder will be automatically detected and synced by Carbon. To enable auto-sync on folders, the user will need to re-upload the folders through the 3rd-party file picker.

Dropbox Folder Selection and Syncing

  • You can now select an entire folder for upload, and Carbon will automatically include all nested subfolders and files.

  • We have also introduced auto-sync for Dropbox folders. Any new folders and files added to your selected parent folder will be automatically detected and synced by Carbon, which brings our Dropbox functionality in line with popular services like Google Drive and Notion.

Webhook for Files Skipped

  • To improve visibility into your file processing pipeline, we’ve added a new webhook event: FILES_SKIPPED.

  • This event is triggered whenever Carbon skips processing for one or more files, such as when a file exceeds the size limits imposed by a third-party integration. The webhook payload will include a list of external_file_ids for the affected files, as well as an additional_information field with details on why processing was skipped. This allows you to easily identify and handle files that couldn’t be processed.

Zendesk Tickets Integration

  • We’re thrilled to announce that our Zendesk connector now has support for tickets.

  • The /integrations/oauth_url and /integrations/connect endpoints now sync articles by default. To sync only tickets or both articles and tickets, use the file_sync_config parameter. The file_sync_config parameter can also enable syncing attachments from ticket comments.

  • You can now also view and sync tickets via the global endpoints /integrations/items/list and /integrations/files/sync.

  • To start syncing ticket content, users must disconnect and reconnect their accounts with the new scopes. Don’t worry, disconnecting won’t affect your files.

  • The following ticket information is available as tags for filtering:

{ "ticket_type": "incident", "ticket_status": "open", "ticket_assignee": "swapnil+zen1@carbon.ai", "ticket_priority": "normal", "ticket_requester": "customer@example.com", "ticket_submitter": "swapnil+zen1@carbon.ai" }

  • Text chunks will include the conversation history (comments on the ticket).

  • You can find more details here.

Carbon Connect 2.0 Exits Beta

  • Carbon Connect 2.0 has officially exited beta as version 2.0.0.

Incremental Syncs for Data Sources

  • We have introduced incremental syncs for the following endpoints:

    • /integrations/items/sync

    • /integrations/connect

    • /integrations/oauth_url

  • How It Works

    • By setting incremental_sync to true, only new or updated files since the last sync will be re-synced. This means that if a file has already been synced and hasn’t been modified, it will be skipped during the next sync.

    • If the embedding properties or tags of a file change between sync requests, those specific files will be re-synced.

    • Carbon sends a FILE_SKIPPED webhook event for files skipped during the incremental sync. The body of the webhook will contain a list of file_ids for the skipped files and a reason in additional_information.

  • This update addresses a common problem where files would be re-synced if a user went through the 3rd-party file selector to select files that had already been synced. With incremental syncs, this issue is resolved, ensuring that only truly new or updated files are synchronized.

  • Note: Incremental syncs are only enabled on certain sources to start, including Box, Dropbox, OneDrive and Google Drive.

Re-Sync Child Files Via Resync_File Endpoint

  • When the file ID of a parent file (i.e., a folder) is submitted for re-sync via the resync_file endpoint, the associated child files will now also be re-synced.

  • This enhancement ensures that all related files within a folder hierarchy are properly synced when the parent file is re-synced.

Post Messages for Third-Party File Pickers

  • External data sources that utilize third-party file pickers will now post messages containing data of the selected file to the parent window when they are used in an iframe.

  • The message will be structured in the following format:

{ "event": "SELECTED", "data": list[{ "external_id": str, "parent_external_id": str | null, "name": str, "url": str | null, "is_folder": bool, "file_format": str | null, }], }

  • Note: Not all of the properties in the data list are available for every data source. For example, GDrive will have parent_external_id, but parent_external_id will always be null for Microsoft because its file picker does not return that data.
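
If the posted message is forwarded to a Python backend as a JSON string (an assumption; in the browser you would read it from the message event instead), its shape can be modeled roughly as below. The field names mirror the format above; the type and function names are hypothetical.

```python
import json
from typing import List, Optional, TypedDict


class SelectedItem(TypedDict):
    external_id: str
    parent_external_id: Optional[str]
    name: str
    url: Optional[str]
    is_folder: bool
    file_format: Optional[str]


def selected_files(raw_message: str) -> List[SelectedItem]:
    """Return the non-folder items from a SELECTED message, or [] for other events."""
    message = json.loads(raw_message)
    if message.get("event") != "SELECTED":
        return []
    # Use .get() because not every data source populates every property.
    return [item for item in message.get("data", []) if not item.get("is_folder")]
```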

New Parameter include_containers

  • A new optional boolean parameter filters.include_containers has been added to the user_files_v2 API endpoint. This parameter allows you to control whether containers (folders) should be included in the API response.

    • When include_containers is set to false, the API will exclude folders from the response. This means that only files with actual content will be returned.

    • In addition to folders, the following types of files will also be excluded when include_containers is false:

      • RSS feed URLs

      • Email queries

      • GitBook spaces

      • GitHub directories

  • These excluded files typically group other files together but do not have any content themselves.

  • The default behavior of user_files_v2 remains unchanged. If the include_containers parameter is not provided or is set to true, folders will be included in the API response as before.

File Statistics Now Include MIME Type

  • file_statistics under the user_files_v2 endpoint now includes the MIME type of each file.

Organization-Level User Settings

  • Introduced the ability to configure user settings at the organization level.

  • Use the /organization/update endpoint with the global_user_config parameter to set the following organization-wide user settings:

    • auto_sync_enabled_sources

    • max_files

    • max_files_per_upload

  • Find more details here.

Customizable Sync Page Copy

  • Organizations now have the ability to customize the copy on the sync page after a user has connected to an external source.

  • Customizable elements include:

    • Header text

    • Subheader text

    • Button text

  • To update the sync page copy, DM us with the requested changes. This is a white-label-specific feature.

  • Please note that success and error messages are not customizable at this time.

File List for Local File Uploads

  • Added a new screen in Carbon Connect 2.0 (2.0.0-beta25) that displays a list of files uploaded locally by the user.

  • Use the showFilesTab configuration option to control whether this view is visible.

Limit File Uploads by Type

  • Organizations can now restrict the types of files that can be uploaded to Carbon.

    • File extension restrictions can be set per data source or globally for a given organization.

    • Users can still select disallowed file formats from the file picker, but these files will be ignored during the upload process.

  • To enable this feature, provide Carbon with a list of allowed file extensions, which must be a subset of Carbon’s supported file formats. A dedicated API endpoint will be coming soon!

New GitHub Endpoints

  • We’ve added two new endpoints to enhance the usability of the GitHub connector:

    • /integrations/github/repos: This endpoint allows users to retrieve a list of their GitHub repositories.

    • /integrations/github/sync_repos: This endpoint accepts a list of GitHub repository IDs, enabling users to list items from the specified repositories.

  • These new endpoints provide a more streamlined and efficient way to interact with GitHub repositories within Carbon.

GitHub Repository Selection Screen

  • We’ve introduced a dedicated screen in Carbon Connect 2.0 (2.0.0-beta24) for selecting GitHub repositories.

  • This new feature allows users to easily choose the repositories they want to sync and list items from. The repository selection screen is automatically displayed whenever a user connects their GitHub account.

  • This enhancement simplifies the process of managing GitHub repositories within Carbon Connect, providing a more intuitive and user-friendly experience.

Enhancements to Item Listing

  • We’ve added a new parameter called sync_source_items (or syncSourceItems in Carbon Connect) to give users more control over item syncing. By setting this parameter to false, users can prevent listing items from the corresponding connector.

  • By default, sync_source_items is set to true for all connectors, except for GitHub, where it is set to false. This default behavior for GitHub helps prevent rate limit-related sync issues with GitHub.

  • This enhancement provides users with greater flexibility in managing item syncing across different connectors.

Sorting Options for Source Items

  • We’ve introduced new sorting parameters, order_by and order_dir, for source items (/integrations/items/list). Users can now choose to sort items by the following criteria:

    • id: Sort items by their unique identifier.

    • name: Sort items alphabetically by their name.

    • directories_first: Sort folders first, followed by the remaining items. Both folders and files are sorted by name.

  • By default, items are sorted by name in ascending order (asc), maintaining the existing behavior. Please note that when directories_first is selected, the order_dir parameter is ignored.
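
To illustrate the ordering semantics, here is a local sketch of the three order_by modes. It does not call the API; the item keys ("id", "name", "is_folder") are assumptions for this example, and server-side ordering may differ in details such as case sensitivity.

```python
from typing import Dict, List


def sort_items(
    items: List[Dict],
    order_by: str = "name",
    order_dir: str = "asc",
) -> List[Dict]:
    """Sort items the way the order_by and order_dir parameters are described."""
    if order_by == "directories_first":
        # Folders first, then files, each group by name; order_dir is ignored.
        return sorted(items, key=lambda item: (not item["is_folder"], item["name"]))
    if order_by not in ("id", "name"):
        raise ValueError(f"unsupported order_by: {order_by}")
    return sorted(items, key=lambda item: item[order_by], reverse=(order_dir == "desc"))
```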

External URLs in Salesforce

  • We now return the external URL for Salesforce Knowledge articles for Lightning users.

CARBON

Data Connectors for LLMs

COPYRIGHT @ 2024 JCDT DBA CARBON