August 2024
Sync Filter for Email Attachments
Customers can now choose to sync only emails that contain attachments.
You still need to set sync_attachments to true and also set the following filter:
{ "key": "has", "value": "attachment" }
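As a rough sketch, the filter can be combined with sync_attachments in a single request body. The helper name build_gmail_attachment_sync_body and the placement of sync_attachments as a top-level field next to filters are assumptions made for illustration; the exact body shape should be checked against the Gmail sync endpoint reference.

```python
import json


def build_gmail_attachment_sync_body() -> dict:
    """Hypothetical helper: body for a Gmail sync limited to emails with attachments."""
    return {
        # Assumed placement: attachment syncing still has to be enabled explicitly.
        "sync_attachments": True,
        # Filter copied from the changelog entry above.
        "filters": {"key": "has", "value": "attachment"},
    }


if __name__ == "__main__":
    print(json.dumps(build_gmail_attachment_sync_body(), indent=2))
```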
Auto-Refresh Synced Files List in CCv3
We now automatically refresh the synced file list whenever users select additional files using our in-house or third-party file picker view. This eliminates the need for users to manually refresh the view.
Updated Children Prop
The children prop of the CCv3 component now accepts any valid React node as the children of the modal, from a simple <div> to an entire component.
Here’s an example of how the children prop can be used:
children={ <button onClick={() => setOpen((prev) => !prev)}> Toggle Connect </button> }
Custom Styling for Carbon Connect
Users can now control styling of CCv3 by targeting the specific class names we’ve provided. This allows for complete customization to match the desired look and feel of the application.
For example, class names include:
cc-modal: Applies to the entire modal component
cc-modal-header: Targets the header section of the modal
cc-modal-footer: Targets the footer section of the modal
cc-modal-close: Applies to the close button of the modal
cc-modal-overlay: Targets the overlay background of the modal
By utilizing these class names, users can easily override the default styles and apply their own CSS rules to achieve the desired appearance.
OCR Support for JPG and PNG
We now support jpg, jpeg and png file formats for OCR.
In addition to the normal steps for enabling OCR, please set media_type to TEXT (via file upload and /integrations/oauth_url) so Carbon knows to process the image via OCR (versus generating image embeddings via our image embedding model).
HTML for Confluence Articles
We now return the raw HTML output for each Confluence article via the file_metadata.saved_filename object under user_files_v2.
Cancel Source Items Sync
We added an endpoint /integrations/items/sync/cancel to cancel data source syncs that are initiated via /integrations/items/sync.
This allows customers to manually stop syncing for user data sources where sync_status = SYNCING.
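A minimal sketch of how a client might pick the data sources that are worth cancelling. It assumes each data source is available as a dictionary with an id field and a sync_status field; the id field name and the list-of-dicts shape are assumptions, and the request body expected by /integrations/items/sync/cancel is not shown because the entry above does not describe it.

```python
from typing import Dict, List


def sources_to_cancel(data_sources: List[Dict]) -> List[int]:
    """Return the ids of data sources that are currently syncing.

    Only sources with sync_status == "SYNCING" are candidates for cancellation.
    """
    return [ds["id"] for ds in data_sources if ds.get("sync_status") == "SYNCING"]


if __name__ == "__main__":
    sample = [
        {"id": 1, "sync_status": "SYNCING"},
        {"id": 2, "sync_status": "READY"},
    ]
    print(sources_to_cancel(sample))
```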
New Gmail Filter
We added a new Gmail filter to sync all emails sent from a given account. Example:
{ "filters": { "key": "in", "value": "sent" } }
Return Raw Notion Blocks
We now return the raw output (blocks) for each Notion page via saved_filename under user_files_v2 when include_raw_file: true.
Shared Google Drive Source Items
We now return shared Google Drive files and folders via integration/items/list.
Clearer Error Message for SYNC_ERROR Status
When a file goes into SYNC_ERROR from re-syncing via /resync_file because it has been deleted in source, sync_error_message will now say File not found in data source.
The webhook sent for that error will also contain sync_error_message in additional_information.
Slack UI in Carbon Connect v3 (3.0.0-beta32)
Select Conversations to Sync
After authenticating, users have full control over which conversations they want to sync via CCv3, including:
Public channels
Private channels
Direct messages (DMs)
Group DMs
Manage Synced Conversations
Users can manage their list of synced conversations at any time via CCv3.
Easily add or remove channels and DMs to adjust what gets synced between Slack and Carbon.
Carbon Connect Enhancements
Synced URLs for Web Scrapes (CCv3 beta30)
We now display synced URLs in a dedicated list view under the WEB_SCRAPE integration.
The default columns displayed in the list view are name, status, created_at.
Parent URLs will be displayed as “folders” and child URLs will be displayed as “files” within the folder.
When showFilesTab is set to false we surface a Select files button in the account drop-down for users to sync new files.
Data Source Polling Interval
Added a new configuration property at the component level called dataSourcePollingInterval.
This property controls how frequently data sources are polled for any updates and events.
The value is specified in milliseconds (ms) and the minimum allowed value for this property is 3000 ms. The default is 8000 ms.
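The default and minimum can be illustrated with a small helper. This is a sketch only: the function name is hypothetical, and clamping values below the minimum up to 3000 ms is an assumption of this example, since the entry above does not say how the component treats out-of-range values.

```python
from typing import Optional

DEFAULT_POLLING_INTERVAL_MS = 8000  # default stated in the changelog
MIN_POLLING_INTERVAL_MS = 3000      # minimum stated in the changelog


def resolve_polling_interval(value_ms: Optional[int] = None) -> int:
    """Return the polling interval to use, in milliseconds.

    Falls back to the default when no value is given. Values below the
    minimum are clamped up (assumed behavior, for illustration only).
    """
    if value_ms is None:
        return DEFAULT_POLLING_INTERVAL_MS
    return max(value_ms, MIN_POLLING_INTERVAL_MS)


if __name__ == "__main__":
    print(resolve_polling_interval())
    print(resolve_polling_interval(1000))
    print(resolve_polling_interval(15000))
```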
Speaker Diarization
Added includeSpeakerLabels for LOCAL_FILES integration and file extensions.
Added include_speaker_labels to fileSyncConfig for third-party connectors.
openFilesTabTo Param
The openFilesTabTo prop is set on the component level and determines which tab (FILE_PICKER or FILES_LIST) the user is taken to by default when they select an integration.
The prop takes a string value of either "FILE_PICKER" | "FILES_LIST".
This prop only applies when the customer has enabled Carbon’s in-house file picker.
We now display a banner when data source items are being synced. The user will still be able to select previously synced items for upload in the meantime.
Guru support in CCv3 has been added. The enabledIntegration is GURU.
We improved the file list view to be better optimized for mobile devices and ensured that the column headers and values align properly.
Pongo Reranking Model
We’ve added Pongo as a supported reranker model alongside Jina and Cohere.
Similar to Cohere and Jina reranking, users can now use PONGO_RERANKER in the following manner on the embeddings endpoint:
{ "query": "how is anime made?", "k": 5, "rerank": {"model": "PONGO_RERANKER"} }
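For clients that assemble this body in code, a minimal sketch follows. The helper name build_rerank_query is hypothetical; the field names query, k and rerank.model are taken from the example above, and no other request fields are assumed.

```python
import json


def build_rerank_query(query: str, k: int, model: str = "PONGO_RERANKER") -> dict:
    """Hypothetical helper: build an embeddings request body with a reranker."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return {"query": query, "k": k, "rerank": {"model": model}}


if __name__ == "__main__":
    print(json.dumps(build_rerank_query("how is anime made?", 5)))
```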
Third-Party File Picker Behavior
We added a new parameter automatically_open_file_picker to the external file sync URLs: /integrations/oauth_url and /integrations/connect. When true, the file picker for Google Drive, Box, OneDrive, SharePoint and Dropbox will automatically open when the user lands on the successful connection page.
It’s important to note that some users’ browsers may have popup blockers that could prevent this parameter from functioning. In such cases, the user may receive a prompt from their browser asking for permission to allow popups from the platform. If the user grants permission, the feature will work as intended for future syncs.
It’s worth mentioning that OneDrive and SharePoint behave differently due to Microsoft treating the file picker as a separate app. Instead of directly opening the file picker, it will trigger another OAuth prompt. If the user consents to the file picker OAuth, the file picker will then automatically open afterwards.
Speaker Diarization
Speaker diarization has been added for audio transcription models. This allows us to format chunks so that the text is organized by utterances and each utterance will be labeled with the speaker. It’ll take this format:
[Speaker A] speaker A's utterance
[Speaker B] speaker B's utterance
For local file uploads, there is a new parameter include_speaker_labels. For external file uploads, the file_sync_config object can take a new property include_speaker_labels. When either is set to true, speaker diarization will be enabled for the audio transcription services.
Minor note: Speaker labels may appear differently depending on the transcription service. Deepgram uses numbers to label speakers while AssemblyAI uses letters.
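The chunk format described above can be sketched as follows. This is an illustration of the output format only, not Carbon's chunking code; the function name and the (speaker, text) tuple input are assumptions.

```python
from typing import Iterable, Tuple


def format_utterances(utterances: Iterable[Tuple[str, str]]) -> str:
    """Render (speaker, text) pairs as one labeled utterance per line."""
    return "\n".join(f"[Speaker {speaker}] {text}" for speaker, text in utterances)


if __name__ == "__main__":
    print(format_utterances([
        ("A", "speaker A's utterance"),
        ("B", "speaker B's utterance"),
    ]))
```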
request_id on Additional Webhooks
request_id is now included in the following webhook events under the additional_information object for external files: UPDATE, FILES_CREATED, FILE_READY, FILE_ERROR, FILES_SKIPPED, FILE_SYNC_LIMIT_REACHED
Cold Storage for Files (Beta)
Overview
Carbon supports moving file embeddings between hot and cold storage. This feature allows you to optimize storage costs and improve performance by keeping embeddings for frequently accessed files in hot storage (vector storage) while moving less frequently used files to cold storage (object storage).
Enabling Cold Storage
By default, the cold storage feature is not enabled. Once enabled, files will automatically be moved to cold storage after a set period of inactivity. To enable cold storage, you must set a flag at file upload time. Currently cold storage is only available for local file uploads via /uploadfile, /upload_text and /upload_file_from_url.
Moving Files from Hot to Cold Storage
Once enabled, files will be automatically moved from hot to cold storage after a specified period of inactivity. This period is determined by the time_to_move_to_cold_storage parameter, which represents the number of seconds a file must be inactive before it’s moved to cold storage. There is no manual way to move files to cold storage.
To update existing files to use cold storage, make an API request to the /modify_cold_storage_parameters endpoint.
Moving Files from Cold to Hot Storage
To move files from cold to hot storage, you must make an API request to /move_to_hot_storage. The request will take filters similar to /user_files_v2, and all files matching the provided filters will be moved to hot storage.
To avoid a single request hogging resources, there is a limit of 200 files that can be moved in one request. If the number of files matching the filters exceeds 200, the files will be processed in batches of 200 over a longer period of time.
/embeddings Endpoint Behavior
If a request is made to /embeddings that involves files in cold storage, an error will be returned that includes a list of file_ids for the affected files. This allows the client to know which files need to be moved to hot storage before the request can be processed.
However, if exclude_cold_storage_embeddings is set to true, any files in cold storage will be ignored and no error will be thrown for requests involving them; the search will simply exclude those files.
In the future, we may enable a way to allow /embeddings to work with files that are in both cold and hot storage.
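The two behaviors can be modeled on the client side with a small sketch. This is not Carbon's implementation: the ColdStorageError class, the searchable_files function and the "HOT"/"COLD" values for embedding_storage_status are all assumptions chosen for illustration.

```python
from typing import Dict, List


class ColdStorageError(Exception):
    """Raised when a query touches files whose embeddings are in cold storage."""

    def __init__(self, file_ids: List[int]):
        super().__init__(f"Files in cold storage: {file_ids}")
        self.file_ids = file_ids


def searchable_files(files: List[Dict], exclude_cold_storage_embeddings: bool = False) -> List[Dict]:
    """Return the files a query may search, mirroring the described behavior."""
    # "COLD" is an assumed status value; the changelog does not list the values.
    cold_ids = [f["id"] for f in files if f["embedding_storage_status"] == "COLD"]
    if cold_ids and not exclude_cold_storage_embeddings:
        # Default: report which files must be moved to hot storage first.
        raise ColdStorageError(cold_ids)
    # With the flag set, cold files are silently left out of the search.
    return [f for f in files if f["embedding_storage_status"] != "COLD"]
```

A caller could catch ColdStorageError, request a move to hot storage for the reported file_ids, and retry once the files are available.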
File Object Information
Activity is defined as when a file was last used, which currently includes file re-syncs, queries involving that file, and updates to file tags.
The following fields under the file object (under user_files_v2) are related to cold storage:
last_use: A timestamp indicating when a file was last used (i.e., when it last had activity).
supports_cold_storage: A flag indicating whether or not a file can be moved to cold storage.
time_to_move_to_cold_storage: An integer representing the number of seconds a file must be inactive before it’s moved to cold storage.
embedding_storage_status: The storage status of the embeddings for a file, indicating whether they are in cold or hot storage.
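These fields are enough to estimate when a file becomes due for cold storage. The sketch below assumes last_use is an ISO 8601 string with a UTC offset and that the file object is a plain dictionary; both are assumptions, and the actual move is decided by Carbon, not by the client.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def is_due_for_cold_storage(file: Dict, now: datetime) -> bool:
    """Estimate whether a file has been inactive long enough to be moved."""
    if not file.get("supports_cold_storage"):
        return False
    # Assumed format: ISO 8601 with an explicit offset, e.g. "+00:00".
    last_use = datetime.fromisoformat(file["last_use"])
    inactivity_limit = timedelta(seconds=file["time_to_move_to_cold_storage"])
    return now - last_use >= inactivity_limit


if __name__ == "__main__":
    sample = {
        "supports_cold_storage": True,
        "last_use": "2024-08-01T00:00:00+00:00",
        "time_to_move_to_cold_storage": 3600,
    }
    print(is_due_for_cold_storage(sample, datetime(2024, 8, 1, 2, 0, tzinfo=timezone.utc)))
```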
New Cold Storage Webhooks
MOVED_TO_COLD_STORAGE - This event is fired when a file is moved to cold storage.
MOVED_TO_HOT_STORAGE - This event is fired when a file is moved to hot storage.
You can find our documentation on cold storage here.
Warnings Object to API Responses
In the next two weeks, we plan to add a warnings object to our API responses to display warning messages.
Here’s an example of how it looks:
{ "documents": [], "warnings": [ { "warning_type": "FILES_IN_COLD_STORAGE", "object_type": "FILE_LIST", "object_id": [ 47058 ], "message": "These files won't be queried because they are not in hot storage." } ] }
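A client could read this object as sketched below. The field names come from the example above; because the warnings object is announced as upcoming, its final shape may differ, and the helper name is hypothetical.

```python
from typing import Dict, List


def cold_storage_file_ids(response: Dict) -> List[int]:
    """Collect file ids reported by FILES_IN_COLD_STORAGE warnings."""
    ids: List[int] = []
    for warning in response.get("warnings", []):
        if warning.get("warning_type") == "FILES_IN_COLD_STORAGE":
            ids.extend(warning.get("object_id", []))
    return ids


if __name__ == "__main__":
    example = {
        "documents": [],
        "warnings": [
            {
                "warning_type": "FILES_IN_COLD_STORAGE",
                "object_type": "FILE_LIST",
                "object_id": [47058],
                "message": "These files won't be queried because they are not in hot storage.",
            }
        ],
    }
    print(cold_storage_file_ids(example))
```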
Carbon Connect 3.0 (CCv3) Enhancements
We’ve added 3 new props to CCv3:
The showFilesTab (boolean) prop has been reintroduced to CCv3 with a default value of true. As a quick reminder, this prop allows customers to hide the file selector and file list view from the CCv3 component. It can be enabled or disabled at both the component and integration levels. If specified for a specific integration, it will override the component-level configuration.
The filesTabColumns (array) prop has been added on both the component and integration levels. This prop controls which columns are displayed and hidden in the file list view and accepts an array of strings with values “name”, “status”, “created_at”, and “external_url”.
The transcription_service (enum) prop has been added under fileSyncConfig, and transcriptionService for the LOCAL_FILES integration, to specify which speech-to-text model to use for transcriptions. You can specify the enum as ASSEMBLYAI or DEEPGRAM but the prop defaults to DEEPGRAM.
Google Cloud Storage Connector
We launched our GCS connector that enables syncing files from buckets.
The Carbon Connect enabledIntegrations value for GCS is GCS.
See more specifics about our GCS connector here.
DigitalOcean Storage Connector
We launched our DigitalOcean Spaces connector that enables syncing files from buckets.
The Carbon Connect enabledIntegrations value for DigitalOcean Spaces is S3 (CC support will be launched tomorrow).
The Spaces API is interoperable with the AWS S3 API, so DigitalOcean Spaces makes use of the existing S3 endpoints.
This means that the source of Digital Ocean files is S3. To differentiate between data sources and files from Spaces Object Storage, additional metadata has been added:
Data Source Metadata
data_source_metadata: Indicates the type of data source. Possible values include:
S3: Represents an Amazon S3 data source.
DigitalOcean Space: Represents a DigitalOcean Spaces data source.
File Metadata
file_metadata: Specifies the type of file. Possible values include:
S3 File: Represents a file stored in Amazon S3.
DigitalOcean Space File: Represents a file stored in DigitalOcean Spaces.
S3 Bucket: Represents a file representation for an S3 Bucket.
DigitalOcean Space Bucket: Represents a file representation for a DigitalOcean Space Bucket.
See more specifics about our DigitalOcean Spaces connector here.
New file_types_at_source Filter for /user_files_v2 and /embeddings
Introduced a new optional field file_types_at_source for /user_files_v2 and /embeddings.
The file_types_at_source field is an array type that currently accepts the following values:
TICKET
ARTICLE
This new field allows users to specify whether we return tickets, articles or both when retrieving content (files and embeddings) from Zendesk, Intercom and Freshdesk.
If file_types_at_source contains TICKET, ticket content from Zendesk, Intercom and Freshdesk is returned.
If file_types_at_source contains ARTICLE, article content from Zendesk, Intercom and Freshdesk is returned.
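A minimal sketch of validating the field before sending a request. The helper name is hypothetical, and placing file_types_at_source at the top level of the request body is an assumption; only the two accepted values come from the entry above.

```python
from typing import Dict, List

ALLOWED_FILE_TYPES_AT_SOURCE = {"TICKET", "ARTICLE"}  # values listed in the changelog


def build_source_type_filter(file_types: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: validate and wrap a file_types_at_source filter."""
    unknown = sorted(set(file_types) - ALLOWED_FILE_TYPES_AT_SOURCE)
    if unknown:
        raise ValueError(f"Unsupported file_types_at_source values: {unknown}")
    return {"file_types_at_source": list(file_types)}


if __name__ == "__main__":
    # Request both tickets and articles.
    print(build_source_type_filter(["TICKET", "ARTICLE"]))
```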
CARBON
Data Connectors for LLMs
COPYRIGHT @ 2024 JCDT DBA CARBON