April 2024
Support for Solar Embeddings
Exciting news! We’ve integrated Upstage’s Solar Embeddings into our platform, offering you a powerful new embedding model on Carbon.
To utilize this embedding model, specify the slug SOLAR for embedding_model.
You can find more details here.
FILE_CREATED for Web Scrape
We have expanded the FILE_CREATED webhook event to also fire when files are generated from web scraping requests.
IS_RESYNC for FILE_READY Webhook
We’ve added a new boolean property, additional_information.is_resync, to the FILE_READY webhook event.
When it is false, the file was synced for the first time.
When it is true, the file was already synced previously, so the current sync is a re-sync.
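A webhook consumer might branch on this flag roughly as follows. This is a minimal sketch, not our reference handler: the function name is hypothetical, and it assumes the decoded webhook body is a dict that carries an additional_information object at the top level. Only the is_resync flag and its true/false meaning come from the note above.

```python
def describe_file_ready(event: dict) -> str:
    """Classify a FILE_READY event as a first sync or a re-sync.

    Assumption: `event` is the decoded webhook body and holds an
    "additional_information" object; the exact payload shape may differ.
    """
    info = event.get("additional_information") or {}
    is_resync = info.get("is_resync")
    if is_resync is True:
        return "re-sync"
    if is_resync is False:
        return "first sync"
    # Older events may not carry the flag at all, so do not guess.
    return "unknown"
```

Returning "unknown" for a missing flag keeps the handler safe for events that were emitted before the property existed.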
Carbon Connect 2.0 Is Exiting Beta
Carbon Connect 2.0 is exiting beta by this Friday!
This means that if you run npm install carbon-connect going forward without specifying a version, 2.0 will be installed by default.
If you need help or have any questions about moving over to Carbon Connect 2.0, DM me.
Loading Screen for Carbon Connect 2.0 (carbon-connect@2.0.0-beta22)
We added a new component-level prop, loadingIconColor, which defines the color of the loader icon. It can be specified as a standard CSS color name, a hexadecimal (hex) code, or RGB color values.
Support for Google Drive Shortcuts
Users can now seamlessly sync Google Drive shortcuts to reference the files and folders they point to.
How It Works:
For shortcuts within folders, a file object will be generated. When this shortcut file is synced, it will also synchronize its targeted file separately, though not as a child. Please note, there is no hierarchical relationship between a shortcut and its target.
If the shortcut is directly selected from Google’s file picker, a shortcut file object will not be created. Instead, the target will be synced directly.
Importantly, the shortcut file itself will not contain any parsed text or chunks. Instead, it acts as a pointer, with the file_metadata.target_external_file_id attribute identifying the file the shortcut targets.
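Because a shortcut and its target are not linked hierarchically, a client that wants to follow the pointer has to match them itself. The sketch below shows one way to do that over a list of file records that has already been fetched. The external_file_id key and the function name are assumptions for illustration; only file_metadata.target_external_file_id comes from the description above.

```python
def resolve_shortcut_targets(files: list[dict]) -> dict:
    """Map each shortcut's external id to its target record (or None).

    Assumption: every record exposes a hypothetical "external_file_id" key
    alongside the documented file_metadata.target_external_file_id.
    """
    by_external_id = {f["external_file_id"]: f for f in files}
    resolved = {}
    for f in files:
        metadata = f.get("file_metadata") or {}
        target_id = metadata.get("target_external_file_id")
        if target_id is None:
            continue  # a regular file, not a shortcut
        # The target is None when it is not part of the fetched list.
        resolved[f["external_file_id"]] = by_external_id.get(target_id)
    return resolved
```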
New Webhook Events
We’ve introduced 2 additional webhook events to help track file sync statuses:
FILE_CREATED: This event is fired when a user queues up a file to be synced for the first time. The body of the webhook will contain a list of file_ids for files that were created in the same upload, and multiple events could fire for the same upload if a lot of files were queued.
ALL_UPLOADED_FILES_QUEUED: This event is fired when every single item in an upload has been queued for sync, including all children of folders in an upload. The body will contain the upload’s request_id.
A couple of notes:
Both file_ids and request_ids can be used to filter for the files in /user_files_v2.
A request_id is now always generated for an upload to support the ALL_UPLOADED_FILES_QUEUED webhook. Previously, it was only generated by the user (unless you’re using Carbon Connect) and passed to us as a parameter. You may still do that and we’ll use your generated request_id, but if you don’t, we’ll generate a request_id for you on behalf of the user’s upload.
These two webhooks are currently supported for 3rd-party data sources only. Support for web scrapes and local file uploads will be coming soon.
You can find more details here.
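To illustrate how the two events relate to /user_files_v2, the sketch below turns a webhook body into a filter object. It is a hedged example only: the function name is hypothetical, the body keys (file_ids, request_id) follow the wording above, and the exact filter schema that /user_files_v2 accepts is an assumption, so check the API reference before relying on it.

```python
def user_files_filter(event_type: str, body: dict) -> dict:
    """Build a filter for /user_files_v2 from a webhook event.

    Assumption: the filter accepts "file_ids" or "request_ids" keys,
    mirroring the names used in the changelog text.
    """
    if event_type == "FILE_CREATED":
        # One upload can produce several FILE_CREATED events, each with
        # its own batch of file ids.
        return {"file_ids": list(body["file_ids"])}
    if event_type == "ALL_UPLOADED_FILES_QUEUED":
        # Fired once per upload, identified by its request_id.
        return {"request_ids": [body["request_id"]]}
    raise ValueError(f"unsupported event type: {event_type}")
```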
GitHub Connector
We launched our GitHub integration today, which syncs pages from both public and private repositories.
The Carbon Connect enabledIntegration slug for GitHub is GITHUB. You’ll need to update to 2.0.0-beta19 to access the new screen.
Users should first submit their GitHub username and access token to our integration endpoint at /integrations/github. You can then use our global endpoints for listing and syncing specific files in different repositories:
List files from repositories with the global endpoint /integrations/items/list
Sync files from repositories with the global endpoint /integrations/files/sync
See more specifics about our GitHub integration here.
Set Max Files Per Upload
A new user-level parameter, max_files_per_upload, has been introduced that can be modified via the /update_users endpoint. It determines the maximum number of files a user can upload in a single request.
Files that exceed the maximum number of files will be moved into the SYNC_ERROR status, with webhooks being fired to alert you.
You can check the file_single_upload_limit set for a particular user via the user endpoint.
Find more details here.
Important Update: The parameter max_files now serves to establish the overall file upload limit for a user across all uploads.
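The effect of the per-upload limit can be pictured with a small client-side sketch. It assumes, purely for illustration, that the first max_files_per_upload files in a request are accepted and the remainder end up in SYNC_ERROR; the changelog does not state which files are chosen, and the function name is hypothetical.

```python
def split_by_upload_limit(file_names: list[str], max_files_per_upload: int):
    """Split one upload into files within the limit and files over it.

    Assumption: files are counted in request order, so the overflow is
    everything after the first `max_files_per_upload` entries.
    """
    if max_files_per_upload < 0:
        raise ValueError("max_files_per_upload must be non-negative")
    within_limit = file_names[:max_files_per_upload]
    over_limit = file_names[max_files_per_upload:]  # would become SYNC_ERROR
    return within_limit, over_limit
```

A check like this before uploading lets you batch large uploads yourself instead of handling SYNC_ERROR webhooks afterwards.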
Add include_all_children to Embeddings Endpoint
Added the param include_all_children to the embeddings endpoint. When this param is set to true, the search is run over all filtered files as well as their children.
Filters applied to the endpoint extend to the returned child files.
In-House File Picker for Confluence and Salesforce
We’re excited to introduce our in-house file picker, starting with Confluence and Salesforce. Our in-house file picker is still in beta, but you can test it out by manually running
npm install carbon-connect@2.0.0-beta13
With this update, end users gain the ability to directly select and upload specific files from Confluence and Salesforce. Previously, this functionality was unavailable as neither platform offers its own dedicated file picker.
When syncFilesOnConnection is set to false, our file picker will be enabled.
Hiding 3rd-Party File Picker
The endpoints /integrations/oauth_url and /integrations/connect now support a new boolean parameter named enable_file_picker.
When enable_file_picker is set to true (the default), a button will be displayed on the success page. Clicking this button will open the file picker associated with the respective source.
Conversely, setting enable_file_picker to false will hide the file picker button on the success page. In such cases, end users will be directed to use custom or in-house file pickers for file selection.
Sync Outlook and Gmail Attachments
We’ve introduced a new property called sync_attachments, which can be specified when syncing via the /integrations/gmail/sync and /integrations/outlook/sync endpoints. By default, this property is set to false.
Setting sync_attachments to true enables Carbon to automatically sync file attachments from corresponding emails. This includes not only traditional file attachments but also files (such as images) that are added in-line within emails.
Each file attachment will be assigned a unique file_id, with the parent_id corresponding to the email the file was attached to.
Please note that the same rules that apply to our file uploads also apply to attachments in terms of file size and supported extensions.
Set User File Limits
You have the flexibility to set the maximum number of files that a unique customer ID can upload using the file_upload_limit field on the update_users endpoint.
This value can be adjusted as needed, allowing you to tailor it according to your own plan limits.
Then you can check the upload limit set for a specific user via the custom_limits object on the user endpoint.
See details here.
Flags for OCR
Added ocr_job_started_at to the user_files_v2 response to denote whether OCR was enabled for a particular file.
Added additional OCR properties to be returned via ocr_properties, including whether table parsing was enabled.
See details here.
Role Management in Customer Portal
You now have the ability to manage who in your organization can create, delete, and view API keys.
Here’s a breakdown of the current roles available:
Admin: This role is empowered to both create and delete API keys.
User: Users with this role can view API keys.
Moving forward, these roles will determine user permissions and access across different sections of the Carbon Customer Portal.
You can access the customer portal via portal.carbon.ai
Expanded OCR Support in Carbon Connect
The prop useOCR can now be enabled on the integration level for the following connectors (in addition to local files):
OneDrive
Dropbox
Box
Google Drive
Zotero
SharePoint
The prop parsePdfTablesWithOcr can now be enabled on the integration level to parse tables with OCR when useOCR is set to true.
Please note OCR support is only applicable for PDFs at the moment.
You can find more details here.
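The way the two props interact can be summarized in a small sketch. This models the rules stated above (table parsing only applies when useOCR is true, and OCR only applies to PDFs); it is not Carbon Connect code. The function name is hypothetical, and detecting PDFs by file extension is an assumption made for illustration.

```python
def effective_ocr_settings(file_name: str, use_ocr: bool,
                           parse_pdf_tables_with_ocr: bool) -> dict:
    """Return which OCR features would apply to a file under the stated rules."""
    # Assumption: a ".pdf" extension is enough to identify a PDF here.
    is_pdf = file_name.lower().endswith(".pdf")
    ocr = bool(use_ocr and is_pdf)
    # Table parsing depends on OCR being in effect for the file.
    table_parsing = bool(ocr and parse_pdf_tables_with_ocr)
    return {"ocr": ocr, "table_parsing": table_parsing}
```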
Return chunk_index on the /embeddings Endpoint
We now return the chunk_index for specific chunks returned via the /embeddings endpoint.
You can find more details here.
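One use for chunk_index is putting the chunks returned for a file back into document order, since search results normally arrive ranked by relevance. The sketch below assumes each returned chunk is a dict with hypothetical file_id and content keys next to the chunk_index field; the real response schema may differ.

```python
def chunks_in_document_order(results: list[dict]) -> dict:
    """Group returned chunks by file and sort each group by chunk_index."""
    grouped = {}
    for chunk in results:
        grouped.setdefault(chunk["file_id"], []).append(chunk)
    return {
        file_id: [c["content"] for c in sorted(chunks, key=lambda c: c["chunk_index"])]
        for file_id, chunks in grouped.items()
    }
```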
Migrations between Embedding Models
You can now request migrations between embedding models with minimal downtime.
Email me if you’re interested. The cost per migration (not including embedding token costs) starts at $850 one-time.
New request_id Field
Carbon now accommodates the inclusion of a request_id within OAuth URLs, global sync endpoints, and custom sync endpoints (such as Gmail, Outlook, etc.), allowing users to define it as needed. Non-OAuth URL endpoints that auto-sync upon connection (e.g., Freshdesk, Gitbook) also support this value. The request_id serves as a filter for files through user_files_v2.
With Carbon Connect, setting the useRequestIds parameter to true will trigger automatic assignment of the request_id. This request_id will be returned in INITIATE and ADD/UPDATE callbacks.
Note that this setting applies at the component level rather than the integration level.
This enhancement is part of version 2.0.0-beta8.
Find more details here.
COPYRIGHT @ 2024 JCDT DBA CARBON