May 2024
Webhook Health Monitoring
We added more robust health-check logic for webhook URLs.
If a URL is flagged as unhealthy (and marked with the status FLAGGED), the system will automatically poll the URL every 10 seconds to check its status, firing a new webhook event called CHECKUP with each poll request.
For CHECKUP events, there is no requirement to verify the signature, although you can still do so if desired. When receiving a CHECKUP event, it is safe to simply return a 200 response without any additional processing.
If a successful response is received during the health check, the URL will be re-activated.
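A minimal sketch of a receiver that acknowledges CHECKUP polls right away. It assumes the event type arrives in a top-level webhook_type field of the JSON body; that field name and the handle_webhook, verify_signature, and process_event names are illustrative assumptions, not part of Carbon's API.

```python
import json
from typing import Callable


def handle_webhook(
    raw_body: bytes,
    verify_signature: Callable[[bytes], bool],
    process_event: Callable[[dict], None],
) -> int:
    """Return the HTTP status code to send back for one webhook delivery."""
    event = json.loads(raw_body)
    # Assumption: the event type is carried in a top-level "webhook_type" field.
    if event.get("webhook_type") == "CHECKUP":
        # Health-check poll: signature verification is optional, so just acknowledge it.
        return 200
    # Any other event: verify the signature before doing any work.
    if not verify_signature(raw_body):
        return 401
    process_event(event)
    return 200
```

Answering CHECKUP quickly is what lets the health check see a successful response and re-activate the URL.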
Notifications via Email
We are excited to announce the launch of email notifications to keep our customers informed about important events and actions occurring on our platform. In this initial release, we have implemented the following email notifications:
Webhook Events Paused
Trigger: This notification is sent when a webhook has been temporarily paused due to failing to return a response 20 times within a 60-second window.
Purpose: To alert customers about any interruptions in webhook functionality and provide them with timely information to investigate and resolve the issue.
Webhook Events Unpaused
Trigger: This notification is sent when a previously paused webhook has been unpaused after our system’s polling mechanism (which runs every 10 seconds) determines that the webhook is healthy and responsive again.
Purpose: To inform customers that the webhook has resumed normal operation and that data flow has been restored.
Video Embeddings Support
We now support embedding generation for videos, allowing you to run semantic search over video content based on how similar a video snippet is to the search query, or based on the text within the video frames (similar to OCR).
/uploadfile now takes a new optional parameter called media_type, whose value comes from the FileContentTypes enum. If media_type isn’t provided, all video file formats default to audio processing.
Currently, videos are supported via the uploadfile and upload_file_from_url endpoints, but we’ll be adding support for third-party connectors and Carbon Connect soon.
We support the following video file formats:
AVI
FLV
MKV
MOV
MP4
MPEG
MPG
WEBM
WMV
The maximum file size is 1 GB, but it can be increased upon request.
See more details here.
Please note that generating video embeddings takes much longer than generating text and image embeddings. For example, embedding a 3-minute video took 60 to 90 seconds.
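A sketch of choosing media_type before calling /uploadfile. It assumes media_type is sent as a query parameter and that the video member of FileContentTypes is spelled VIDEO; both are assumptions, as are the upload_query helper and reading 1 GB as 1024**3 bytes.

```python
from pathlib import PurePath
from urllib.parse import urlencode

# Video formats listed in this entry.
VIDEO_EXTENSIONS = {"avi", "flv", "mkv", "mov", "mp4", "mpeg", "mpg", "webm", "wmv"}
# Default size cap; assumes 1 GB = 1024**3 bytes.
MAX_VIDEO_BYTES = 1024 ** 3


def upload_query(filename: str, size_bytes: int, embed_as_video: bool = True) -> str:
    """Build the query string for a hypothetical /uploadfile request."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    params: dict[str, str] = {}
    if ext in VIDEO_EXTENSIONS:
        if size_bytes > MAX_VIDEO_BYTES:
            raise ValueError(f"{filename} exceeds the default 1 GB limit")
        if embed_as_video:
            # Hypothetical enum value; without media_type, videos fall back to audio processing.
            params["media_type"] = "VIDEO"
    return urlencode(params)
```

Leaving embed_as_video off keeps the default audio processing, which may be enough when only the spoken content matters.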
Intercom Tickets Integration
We’re thrilled to announce that our Intercom connector now has support for tickets.
The /integrations/oauth_url and /integrations/connect endpoints sync articles by default. To customize the sync behavior, use the file_sync_config parameter.
You can now also view and sync tickets via the global endpoints /integrations/items/list and /integrations/files/sync.
To start syncing ticket content, the Intercom scopes should include the following:
To sync user articles only, add these scopes:
Read one admin
Read and List Articles
To sync both user articles and tickets, also add:
Read and list users and companies
Read tickets
The following ticket information is available as tags for filtering:
{ "ticket_type": "Support Request", "ticket_status": "resolved", "ticket_category": "Customer", "ticket_submitter": "example.user@projectmap.com", "ticket_assigned_team": "Technical", "ticket_assigned_admin": "swapnil@carbon.ai" }
Text chunks will include the conversation history (comments on the ticket).
You can find more details here.
New Webhook Statuses
Each created webhook will now have a status of either ACTIVE or FLAGGED, which is returned in the webhooks endpoint response.
ACTIVE: The webhook is operating normally and successfully receiving events.
FLAGGED: The webhook URL failed to return a response more than 20 times within a 60-second window. This indicates a potential issue with your webhook URL that you should check. If a webhook is moved to the FLAGGED status, please contact us to update it.
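A sketch of picking out FLAGGED webhooks from the webhooks endpoint response. It assumes the webhooks are listed under a results key, each with url and status fields; those names are assumptions about the response shape, and flagged_webhooks is a hypothetical helper.

```python
from typing import Any


def flagged_webhooks(response: dict[str, Any]) -> list[str]:
    """Return the URLs of webhooks whose status is FLAGGED."""
    # Assumption: webhooks are listed under "results", each with "url" and "status".
    return [
        hook["url"]
        for hook in response.get("results", [])
        if hook.get("status") == "FLAGGED"
    ]
```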
Incremental Syncs for Gmail and Outlook
We have introduced incremental syncs for the following endpoints for Gmail and Outlook:
/integrations/items/sync
/integrations/connect
/integrations/oauth_url
How It Works
By setting incremental_sync to true, only files that are new or updated since the last sync will be synced. This means that if a file has already been synced and hasn’t been modified, it will be skipped during the next sync. If the embedding properties or tags of a file change between sync requests, those specific files will be re-synced.
Carbon sends a FILE_SKIPPED webhook event for files skipped during the incremental sync. The body of the webhook will contain a list of file_ids for the skipped files and a reason in additional_information.
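A sketch of a request body that turns on incremental syncing, assuming /integrations/items/sync takes a JSON body. Only incremental_sync comes from this entry; data_source_id and ids are placeholder field names for illustration, and items_sync_body is a hypothetical helper.

```python
import json


def items_sync_body(data_source_id: int, item_ids: list[str], incremental: bool = True) -> str:
    """Serialize a hypothetical /integrations/items/sync request body."""
    body = {
        "data_source_id": data_source_id,  # hypothetical field name
        "ids": item_ids,                   # hypothetical field name
        # Unchanged files are skipped and reported via FILE_SKIPPED.
        "incremental_sync": incremental,
    }
    return json.dumps(body)
```

Note that json.dumps writes Python's True as the JSON literal true, which is the value this entry describes.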
This update addresses a common problem where files would be re-synced if a user went through the 3rd-party file selector to select files that had already been synced. With incremental syncs, this issue is resolved, ensuring that only truly new or updated files are synchronized.
Note: Incremental syncs are already enabled for Box, Dropbox, OneDrive and Google Drive.
Aggregated Usage Metrics Update
We’re excited to announce several improvements to how we aggregate and expose file statistics across the API.
The following metrics will now be returned via the /organization and /user endpoints:
aggregate_file_size
aggregate_num_characters
aggregate_num_tokens
aggregate_num_embeddings
aggregate_num_files_by_source
aggregate_num_files_by_file_format
To fetch the most up-to-date metrics via the /organization endpoint going forward, take the following steps:
The /organization/statistics endpoint takes no parameters and submits a request to asynchronously re-aggregate organization file statistics.
When the re-aggregation is complete, a webhook with the event type FILE_STATISTICS_AGGREGATED will be sent.
After receiving that event, a request to /organization will return the updated file statistics in the response body.
Additionally, the timestamp of when the file statistics were last updated can be found in file_statistics_aggregated_at.
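The three steps above can be sketched as a small coordinator, assuming you already have authenticated HTTP helpers and a webhook receiver that forwards events to it. The post and get callables, the StatisticsRefresher class, and the webhook_type field name are hypothetical; only the endpoint paths and the FILE_STATISTICS_AGGREGATED event come from this entry.

```python
import threading
from typing import Any, Callable, Optional


class StatisticsRefresher:
    """Re-aggregate, wait for FILE_STATISTICS_AGGREGATED, then read /organization."""

    def __init__(self, post: Callable[[str], Any], get: Callable[[str], dict]) -> None:
        self._post = post  # hypothetical authenticated POST helper
        self._get = get    # hypothetical authenticated GET helper returning parsed JSON
        self._aggregated = threading.Event()

    def request_refresh(self) -> None:
        """Step 1: ask Carbon to re-aggregate (the endpoint takes no parameters)."""
        self._aggregated.clear()
        self._post("/organization/statistics")

    def on_webhook(self, event: dict) -> None:
        """Step 2: call this from your webhook receiver for every delivery."""
        # Assumption: the event type is in a top-level "webhook_type" field.
        if event.get("webhook_type") == "FILE_STATISTICS_AGGREGATED":
            self._aggregated.set()

    def fetch_when_ready(self, timeout: float) -> Optional[dict]:
        """Step 3: once aggregated, /organization returns the updated statistics."""
        if not self._aggregated.wait(timeout):
            return None  # webhook not received within the timeout
        return self._get("/organization")
```

The file_statistics_aggregated_at timestamp in the returned body tells you when the figures were last refreshed.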
fileSyncConfig Property for Carbon Connect
We have added a new fileSyncConfig prop for Carbon Connect that is set at the component or integration level and accepts the following properties:
auto_synced_source_types (AutoSyncedSourceTypes array): An array specifying the types of sources to automatically sync files from.
sync_attachments (boolean): Set to true to enable synchronization of attachments, or false to disable attachment syncing. Currently applies to helpdesk tickets.
detect_audio_language (boolean): Set to true to enable automatic detection of audio language during file upload, or false to disable audio language detection.
Deepgram Audio Language Detection
This feature enables automatic language detection for audio file uploads.
Added a new optional query parameter, detect_audio_language.
When set to true, Deepgram will automatically detect the language of the uploaded audio file.
Defaults to false if not specified.
Applies to the upload_file_from_url and uploadfile endpoints.
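A sketch of building an upload URL with the new flag. The base URL is a placeholder and upload_url is a hypothetical helper; the sketch lowercases the boolean so the query reads detect_audio_language=true, matching the documented value, since Python’s str(True) is "True".

```python
from urllib.parse import urlencode


def upload_url(base_url: str, endpoint: str, detect_audio_language: bool = False) -> str:
    """Join a placeholder base URL, an upload endpoint, and the language-detection flag."""
    query = urlencode({"detect_audio_language": str(detect_audio_language).lower()})
    return f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}?{query}"
```

For example, upload_url("https://api.example.com/", "/uploadfile", True) gives https://api.example.com/uploadfile?detect_audio_language=true.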
Updated Webhook Event: FILE_SYNC_LIMIT_REACHED
We have improved the FILE_SYNC_LIMIT_REACHED webhook event to provide more granular information when users exceed file upload limits. This event will now be triggered in the following scenarios:
When a user attempts to upload files that would cause them to exceed the maximum number of allowed files (max_files).
When a user tries to upload more files than the maximum allowed per upload (max_files_per_upload).
When a user exceeds the daily 2.5 GB file sync limit (existing functionality).
To differentiate between the three limit scenarios, we have introduced a new reason property in the event’s additional information. The reason property will have one of the following values:
Max files per upload limit exceeded.
Max files limit exceeded.
Organization daily limit for file sync has been reached.
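A small classifier for the reason values, assuming additional_information arrives as a JSON object containing reason. The SyncLimit enum and classify_limit are hypothetical helpers; the three strings are copied from the list above, and a missing trailing period is tolerated in case the exact punctuation differs.

```python
from enum import Enum
from typing import Optional


class SyncLimit(Enum):
    MAX_FILES_PER_UPLOAD = "max_files_per_upload"
    MAX_FILES = "max_files"
    DAILY_ORG_LIMIT = "daily_org_limit"


# The three documented reason values.
_REASONS = {
    "Max files per upload limit exceeded.": SyncLimit.MAX_FILES_PER_UPLOAD,
    "Max files limit exceeded.": SyncLimit.MAX_FILES,
    "Organization daily limit for file sync has been reached.": SyncLimit.DAILY_ORG_LIMIT,
}


def classify_limit(additional_information: dict) -> Optional[SyncLimit]:
    """Map a FILE_SYNC_LIMIT_REACHED reason to a limit type, or None if unrecognized."""
    reason = str(additional_information.get("reason", "")).strip()
    return _REASONS.get(reason) or _REASONS.get(reason + ".")
```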
HTML File Support
We now support uploading .html files from local and third-party data sources.
As with other file formats, we provide the original .html file as well as a plain-text version of the file as pre-signed URLs via the user_files_v2 endpoint.
Freshdesk Tickets Integration
We’re thrilled to announce that our Freshdesk connector now has support for tickets.
The /integrations/freshdesk and /integrations/connect endpoints sync articles by default. To customize the sync behavior, use the file_sync_config parameter.
You can now also view and sync tickets via the global endpoints /integrations/items/list and /integrations/files/sync.
To start syncing ticket content, the Freshdesk API key should belong to a user with the agents and tickets permissions.
The following ticket information is available as tags for filtering:
{ "ticket_type": "incident", "ticket_status": "open", "ticket_assignee": "swapnil+zen1@carbon.ai", "ticket_priority": "normal", "ticket_requester": "customer@example.com" }
Text chunks will include the conversation history (comments on the ticket).
You can find more details here.
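If you filter on the client side, matching against these tags is a dictionary comparison. This sketch assumes each file record exposes its tags as a JSON object under a tags key (an assumption about the response shape); filter_by_tags is a hypothetical helper.

```python
from typing import Any, Iterable


def filter_by_tags(files: Iterable[dict[str, Any]], **wanted: str) -> list[dict[str, Any]]:
    """Keep records whose tags match every requested key/value pair."""
    return [
        f for f in files
        # Assumption: tags live under "tags"; a missing or null value counts as no tags.
        if all((f.get("tags") or {}).get(key) == value for key, value in wanted.items())
    ]
```

For example, filter_by_tags(files, ticket_status="open", ticket_priority="normal") keeps only open, normal-priority tickets.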
New Webhook Type: SPARSE_VECTOR_GENERATION
We have introduced a new webhook event type, SPARSE_VECTOR_GENERATION, that is triggered when the queue status of sparse vector generation for a file changes. It is called SPARSE_VECTOR_QUEUE_STATUS and has the object type CHUNK_LIST.
This new webhook includes an object in additional_information under the key sparse_vector_queue_status. The object has two fields:
sparse_vector_queue_status, which is one of queued, aborted, or failed
sparse_vector_queue_error, which is null unless sparse_vector_queue_status is aborted or failed
See more details here.
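A sketch of reading the status object out of additional_information, following the key names described above; SparseVectorStatus and parse_sparse_vector_status are hypothetical helpers, and JSON null is assumed to arrive as Python None after parsing.

```python
from typing import NamedTuple, Optional


class SparseVectorStatus(NamedTuple):
    status: str           # "queued", "aborted", or "failed"
    error: Optional[str]  # None unless status is "aborted" or "failed"


def parse_sparse_vector_status(additional_information: dict) -> SparseVectorStatus:
    """Extract the nested sparse_vector_queue_status object."""
    obj = additional_information["sparse_vector_queue_status"]
    status = obj["sparse_vector_queue_status"]
    if status not in {"queued", "aborted", "failed"}:
        raise ValueError(f"unexpected sparse_vector_queue_status: {status!r}")
    return SparseVectorStatus(status, obj.get("sparse_vector_queue_error"))
```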
parent_file_id for Embeddings
The embeddings response now includes a parent_file_id field for each chunk returned. This field contains either an integer representing the ID of the parent file, or null if there is no parent file associated with the embedding.
SharePoint and OneDrive Folder Selection and Syncing
You can now select an entire folder for upload, and Carbon will automatically include all nested subfolders and files. This brings our SharePoint and OneDrive functionality in line with popular services like Google Drive, Dropbox and Notion.
We have also introduced auto-sync for SharePoint and OneDrive folders. Any new folders and files added to your selected parent folder will be automatically detected and synced by Carbon. To enable auto-sync on folders, the user will need to re-upload the folders through the 3rd-party file picker.
Dropbox Folder Selection and Syncing
You can now select an entire folder for upload, and Carbon will automatically include all nested subfolders and files.
We have also introduced auto-sync for Dropbox folders. Any new folders and files added to your selected parent folder will be automatically detected and synced by Carbon, which brings our Dropbox functionality in line with popular services like Google Drive and Notion.
Webhook for Files Skipped
To improve visibility into your file processing pipeline, we’ve added a new webhook event: FILES_SKIPPED.
This event is triggered whenever Carbon skips processing for one or more files, such as when a file exceeds the size limits imposed by a third-party integration. The webhook payload will include a list of external_file_ids for the affected files, as well as an additional_information field with details on why processing was skipped. This allows you to easily identify and handle files that couldn’t be processed.
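A sketch of turning a FILES_SKIPPED payload into a per-file report. It assumes external_file_ids and additional_information both sit at the top level of the payload, which is an assumption about placement; skipped_file_report is a hypothetical helper.

```python
from typing import Any


def skipped_file_report(payload: dict[str, Any]) -> dict[str, Any]:
    """Map each skipped external file ID to the skip details from the payload."""
    # Assumption: both fields sit at the top level of the webhook payload.
    details = payload.get("additional_information")
    return {file_id: details for file_id in payload.get("external_file_ids") or []}
```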
Zendesk Tickets Integration
We’re thrilled to announce that our Zendesk connector now has support for tickets.
The /integrations/oauth_url and /integrations/connect endpoints now sync articles by default. To sync only tickets, or both articles and tickets, use the file_sync_config parameter. The file_sync_config parameter can also enable syncing attachments from ticket comments.
You can now also view and sync tickets via the global endpoints /integrations/items/list and /integrations/files/sync.
To start syncing ticket content, users must disconnect and reconnect their accounts with the new scopes. Don’t worry, disconnecting won’t affect your files.
The following ticket information is available as tags for filtering:
{ "ticket_type": "incident", "ticket_status": "open", "ticket_assignee": "swapnil+zen1@carbon.ai", "ticket_priority": "normal", "ticket_requester": "customer@example.com", "ticket_submitter": "swapnil+zen1@carbon.ai" }
Text chunks will include the conversation history (comments on the ticket).
You can find more details here.
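A sketch of a file_sync_config that syncs tickets and their comment attachments. It reuses the auto_synced_source_types and sync_attachments keys described for the fileSyncConfig prop, assuming the API parameter accepts the same keys; the ARTICLE and TICKET values, the service field, and zendesk_oauth_body are hypothetical.

```python
import json


def zendesk_oauth_body(sync_articles: bool, sync_tickets: bool, sync_attachments: bool = False) -> str:
    """Serialize a hypothetical /integrations/oauth_url body for Zendesk."""
    source_types = []
    if sync_articles:
        source_types.append("ARTICLE")  # hypothetical enum value
    if sync_tickets:
        source_types.append("TICKET")   # hypothetical enum value
    if not source_types:
        raise ValueError("select at least one of articles or tickets")
    body = {
        "service": "ZENDESK",  # hypothetical field and value
        "file_sync_config": {
            "auto_synced_source_types": source_types,
            # Attachments come from ticket comments, so they only matter when tickets sync.
            "sync_attachments": sync_attachments and sync_tickets,
        },
    }
    return json.dumps(body)
```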
Carbon Connect 2.0 Exits Beta
Carbon Connect 2.0 has officially exited beta as version 2.0.0.
Incremental Syncs for Data Sources
We have introduced incremental syncs for the following endpoints:
/integrations/items/sync
/integrations/connect
/integrations/oauth_url
How It Works
By setting incremental_sync to true, only files that are new or updated since the last sync will be synced. This means that if a file has already been synced and hasn’t been modified, it will be skipped during the next sync. If the embedding properties or tags of a file change between sync requests, those specific files will be re-synced.
Carbon sends a FILE_SKIPPED webhook event for files skipped during the incremental sync. The body of the webhook will contain a list of file_ids for the skipped files and a reason in additional_information.
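The skip-or-resync rule described above can be modeled as a small predicate. This is a conceptual model with a hypothetical SyncSnapshot type, not Carbon’s implementation; the real service decides what counts as modified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncSnapshot:
    """What a sync request knows about one file (hypothetical model)."""
    modified_at: str           # last-modified marker reported by the source
    embedding_properties: str  # e.g. a serialized chunking/embedding config
    tags: frozenset            # tag key/value pairs


def should_resync(last: Optional[SyncSnapshot], now: SyncSnapshot) -> bool:
    """Decide whether an incremental sync re-syncs the file or skips it."""
    if last is None:
        return True  # never synced before: a new file
    if now.modified_at != last.modified_at:
        return True  # modified at the source
    # Unmodified, but embedding properties or tags changed between requests.
    return now.embedding_properties != last.embedding_properties or now.tags != last.tags
```

Files for which this returns False correspond to the ones reported in FILE_SKIPPED.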
This update addresses a common problem where files would be re-synced if a user went through the 3rd-party file selector to select files that had already been synced. With incremental syncs, this issue is resolved, ensuring that only truly new or updated files are synchronized.
Note: Incremental syncs are only enabled on certain sources to start, including Box, Dropbox, OneDrive and Google Drive.
Re-Sync Child Files via resync_file Endpoint
When the file ID of a parent file (i.e., a folder) is submitted for re-sync via the resync_file endpoint, the associated child files will now also be re-synced. This ensures that all related files within a folder hierarchy are properly synced when the parent file is re-synced.
Post Messages for Third-Party File Pickers
External data sources that use third-party file pickers will now post messages containing data about the selected files to the parent window when they are used in an iframe.
The message will be structured in the following format:
{
  "event": "SELECTED",
  "data": list[{
    "external_id": str,
    "parent_external_id": str | null,
    "name": str,
    "url": str | null,
    "is_folder": bool,
    "file_format": str | null
  }]
}
Note: Not all of the properties in the data list are available for every data source. For example, GDrive will have parent_external_id, but parent_external_id will always be null for Microsoft because its file picker does not return that data.
New Parameter include_containers
A new optional boolean parameter, filters.include_containers, has been added to the user_files_v2 API endpoint. This parameter allows you to control whether containers (folders) should be included in the API response.
When include_containers is set to false, the API will exclude folders from the response, so only files with actual content will be returned. In addition to folders, the following types of files will also be excluded when include_containers is false:
RSS feed URLs
Email queries
GitBook spaces
GitHub directories
These excluded files typically group other files together but do not have any content themselves.
The default behavior of user_files_v2 remains unchanged. If the include_containers parameter is not provided or is set to true, folders will be included in the API response as before.
File Statistics Now Include MIME Type
file_statistics under the user_files_v2 endpoint now includes the MIME type of the file, providing more detailed information about each file.
Organization-Level User Settings
Introduced the ability to configure user settings at the organization level.
Use the /organization/update endpoint with the global_user_config parameter to set the following organization-wide user settings:
auto_sync_enabled_sources
max_files
max_files_per_upload
Find more details here.
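A sketch of the JSON body for /organization/update. The source names and limit values are placeholders, the per-user file cap is assumed to be spelled max_files (the name used for that limit in the FILE_SYNC_LIMIT_REACHED entry), and the max_files_per_upload check is a sanity check added for illustration, not a documented API rule. org_update_body is a hypothetical helper.

```python
import json


def org_update_body(auto_sync_sources: list[str], max_files: int, max_files_per_upload: int) -> str:
    """Serialize organization-wide user settings under global_user_config."""
    # Illustrative sanity check only; the API may not enforce this.
    if max_files_per_upload > max_files:
        raise ValueError("max_files_per_upload should not exceed max_files")
    return json.dumps({
        "global_user_config": {
            "auto_sync_enabled_sources": auto_sync_sources,
            "max_files": max_files,  # assumed key spelling
            "max_files_per_upload": max_files_per_upload,
        }
    })
```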
Customizable Sync Page Copy
Organizations now have the ability to customize the copy on the sync page after a user has connected to an external source.
Customizable elements include:
Header text
Subheader text
Button text
To update the sync page copy, DM us with the requested changes. This is a white-label-specific feature.
Please note that success and error messages are not customizable at this time.
File List for Local File Uploads
Added a new screen in Carbon Connect 2.0 (2.0.0-beta25) that displays a list of files uploaded locally by the user.
Use the showFilesTab configuration option to control whether this view is visible.
Limit File Uploads by Type
Organizations can now restrict the types of files that can be uploaded to Carbon.
File extension restrictions can be set per data source or globally for a given organization.
Users can still select disallowed file formats from the file picker, but these files will be ignored during the upload process.
To enable this feature, provide Carbon with a list of allowed file extensions, which must be a subset of Carbon’s supported file formats. A dedicated API endpoint will be coming soon!
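Since there is no endpoint for this yet, the sketch below only models the documented behavior: the allow-list must be a subset of the supported formats, and files with other extensions are ignored rather than rejected. split_by_allowed and its arguments are hypothetical, and this is not Carbon’s implementation.

```python
from pathlib import PurePath


def split_by_allowed(
    filenames: list[str], allowed: set[str], supported: set[str]
) -> tuple[list[str], list[str]]:
    """Return (kept, ignored) filenames under an extension allow-list."""
    if not allowed <= supported:
        raise ValueError(f"not supported by Carbon: {sorted(allowed - supported)}")
    kept: list[str] = []
    ignored: list[str] = []
    for name in filenames:
        ext = PurePath(name).suffix.lower().lstrip(".")
        # Disallowed files can still be picked, but they are skipped during upload.
        (kept if ext in allowed else ignored).append(name)
    return kept, ignored
```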
New GitHub Endpoints
We’ve added two new endpoints to enhance the usability of the GitHub connector:
/integrations/github/repos: This endpoint allows users to retrieve a list of their GitHub repositories.
/integrations/github/sync_repos: This endpoint accepts a list of GitHub repository IDs, enabling users to list items from the specified repositories.
These new endpoints provide a more streamlined and efficient way to interact with GitHub repositories within Carbon.
GitHub Repository Selection Screen
We’ve introduced a dedicated screen in Carbon Connect 2.0 (2.0.0-beta24) for selecting GitHub repositories. This new feature allows users to easily choose the repositories they want to sync and list items from. The repository selection screen is automatically displayed whenever a user connects their GitHub account.
This enhancement simplifies the process of managing GitHub repositories within Carbon Connect, providing a more intuitive and user-friendly experience.
Enhancements to Item Listing
We’ve added a new parameter called sync_source_items (or syncSourceItems in Carbon Connect) to give users more control over item syncing. By setting this parameter to false, users can prevent listing items from the corresponding connector.
By default, sync_source_items is set to true for all connectors except GitHub, where it is set to false. This default for GitHub helps prevent rate-limit-related sync issues.
This enhancement provides users with greater flexibility in managing item syncing across different connectors.
Sorting Options for Source Items
We’ve introduced new sorting parameters, order_by and order_dir, for source items (/integrations/items/list). Users can now sort items by the following criteria:
id: Sort items by their unique identifier.
name: Sort items alphabetically by their name.
directories_first: Sort folders first, followed by the remaining items. Both folders and files are sorted by name.
By default, items are sorted by name in ascending order (asc), maintaining the existing behavior. Please note that when directories_first is selected, the order_dir parameter is ignored.
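A local model of these ordering rules can help when reasoning about results. It assumes each item has id, name, and is_folder fields (hypothetical field names here) and uses Python’s default string ordering for names, which may differ from the server’s collation; sort_items is a hypothetical helper.

```python
from typing import Any


def sort_items(items: list[dict[str, Any]], order_by: str = "name", order_dir: str = "asc") -> list[dict[str, Any]]:
    """Model the documented order_by/order_dir behavior for source items."""
    if order_by == "directories_first":
        # order_dir is ignored: folders first, then files, each group by name.
        return sorted(items, key=lambda i: (not i["is_folder"], i["name"]))
    if order_by not in {"id", "name"}:
        raise ValueError(f"unsupported order_by: {order_by}")
    return sorted(items, key=lambda i: i[order_by], reverse=(order_dir == "desc"))
```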
External URLs in Salesforce
We now return the external URL for Salesforce Knowledge articles for Lightning users.
CARBON
Data Connectors for LLMs
COPYRIGHT @ 2024 JCDT DBA CARBON