Jetpack Re-launches Search Feature as Standalone Service
The Jetpack team announced today that its search service is now available as a standalone product, separate from the Professional plan that previously included it. The move should make the service more affordable for small sites. Larger sites might have to pay more, depending on how much content they have.
“Jetpack Search had only been available as a part of the Jetpack Professional plan at $29 per month, but that pricing felt unfair: a site with one million posts costs 1,000 times as much to index as a site with one thousand posts, but both sites were paying the same amount,” wrote Greg Brown, search wrangler at Automattic, in the announcement post. “New pricing makes Jetpack Search more affordable for small sites and still costs big sites less than the competition.”
Jetpack Search now charges by the number of records:
- Up to 100 records: $5 per month
- Up to 1,000 records: $10 per month
- Up to 10,000 records: $25 per month
- Up to 100,000 records: $60 per month
- Up to 1 million records: $200 per month
- Over 1 million records: $200 per month per million records
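To make the tiers concrete, here is a minimal Python sketch of the price lookup. It is an illustration only, not Jetpack's billing code: the function name `monthly_price` is hypothetical, and the rounding of partial millions above the 1 million mark is an assumption, since the announcement states only "$200 per month per million records."

```python
import math

# Tiers from the list above: (maximum records, monthly price in USD).
TIERS = [
    (100, 5),
    (1_000, 10),
    (10_000, 25),
    (100_000, 60),
    (1_000_000, 200),
]
PRICE_PER_MILLION = 200  # applies above 1 million records


def monthly_price(records: int) -> int:
    """Return the monthly price in USD for a site with `records` records."""
    if records < 0:
        raise ValueError("records must be non-negative")
    # Pick the first tier whose ceiling covers the record count.
    for max_records, price in TIERS:
        if records <= max_records:
            return price
    # Assumption (not confirmed by the article): a partial million
    # is billed as a full million.
    return math.ceil(records / 1_000_000) * PRICE_PER_MILLION
```

Under these assumptions, a site with 1,500 records would fall into the $25 tier, and a site with 2.5 million records would pay $600 per month.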
“Records” are posts, pages, and custom post types. However, that may change in the future. “We decided to use the term ‘records’ because we think there are use cases where we may want to index comments as their own records sometime in the future,” said Brown. “Similarly, there are use cases for indexing authors, tags, categories, etc. as their own records.”
Jetpack Search combines comments and bbPress plugin replies, which are technically a custom post type, with their parent post. For now, these will not count against users’ record counts.
One caveat is that media attachments are counted as individual records by default. There are good reasons for some sites to expose attachments in search results. However, for users who have a lot of media and do not want attachment pages indexed, this could inflate the record count and raise the monthly bill. Jetpack does have a filter hook that controls which post types are cached on the WordPress.com servers, so this can at least be configured via code.
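The effect of excluding a post type on the record count can be sketched in a few lines of Python. This is not Jetpack's filter hook, which is a PHP hook the article does not name. The function `count_records`, the `post_type` key, and the sample data are all hypothetical and serve only to show how attachments can dominate the total.

```python
from collections import Counter


def count_records(posts, excluded_types=frozenset()):
    """Count records per post type, skipping any excluded types."""
    return Counter(
        post["post_type"]
        for post in posts
        if post["post_type"] not in excluded_types
    )


# Hypothetical site content: two posts, one page, three media attachments.
posts = [
    {"id": 1, "post_type": "post"},
    {"id": 2, "post_type": "page"},
    {"id": 3, "post_type": "attachment"},
    {"id": 4, "post_type": "attachment"},
    {"id": 5, "post_type": "post"},
    {"id": 6, "post_type": "attachment"},
]

default_total = sum(count_records(posts).values())
filtered_total = sum(count_records(posts, {"attachment"}).values())
```

In this made-up example, half of the records are attachments, so excluding them halves the count that would be billed.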
Version 8.4 of the Jetpack plugin also includes an updated search interface. When a site visitor enters a search query, a fullscreen modal appears to showcase the search results. The results are snappy because they rely on the WordPress.com API and updated technology underneath. The new interface should work well with most themes out of the box on both desktop and mobile views.
Jetpack search results modal.
“This new version of Search is only available with the new plan,” said Brown. “The previous version of Search is still available on the Pro plan and will continue working as-is for the foreseeable future, but our focus going forward is on the new search experience.”
New Architecture Behind Jetpack Search
The new version of Jetpack Search runs on a re-architected technology layer. The old system was an Elasticsearch wrapper around the standard WordPress search feature. While it was quicker than normal WordPress search results, it was still slower than it needed to be. There was no built-in spelling correction. Searched terms were not highlighted.
The new system uses the same sync technology behind Jetpack features like Related Posts, Publicize, and Stats to cache content. It then builds a search index using the Elasticsearch engine from the cache.
The team has built an optimized API, which now supports searching as the user types and factors pageview stats into result rankings. Search input also goes directly from the browser to the API on WordPress.com, which minimizes the delay in returning results.
“All search engines work by preprocessing the data to ensure that when a user’s search query comes in the results can quickly be returned,” said Brown. “For Jetpack Search, very little of the processing time on our servers goes to processing the queries. Almost all of the server load comes from indexing the data and reindexing it as the content changes. Across our main search cluster, for instance, only 10% of the CPU time is spent processing about 850 search queries per second. Most of the time is spent on the 650 index operations per second.”
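Brown's figures allow a rough back-of-the-envelope comparison of what one index operation costs relative to one query. The sketch below assumes that the 90% of CPU time not spent on queries goes entirely to indexing. The quote says only "most of the time," so the result is an upper-bound estimate, and the function name is hypothetical.

```python
def relative_index_cost(query_cpu_share, queries_per_sec, index_ops_per_sec):
    """Estimate how many times more CPU one index operation uses than one query.

    Assumption: all CPU time not spent on queries is spent on indexing.
    """
    index_cpu_share = 1 - query_cpu_share
    cpu_per_query = query_cpu_share / queries_per_sec
    cpu_per_index_op = index_cpu_share / index_ops_per_sec
    return cpu_per_index_op / cpu_per_query


# Figures quoted by Brown: 10% of CPU on 850 queries/s, 650 index ops/s.
ratio = relative_index_cost(0.10, 850, 650)
print(f"One index operation costs roughly {ratio:.1f}x one query")
```

With those numbers, a single index operation works out to roughly 12 times the CPU cost of a single search query, which is consistent with pricing by the number of records rather than by query volume.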
Brown explains this new architecture is why the new pricing tiers are based on the number of records cached from a site. “We preprocess the data, which expands how big it is; store it on the fastest SSDs we can buy; replicate it to multiple data centers; and then can quickly serve results as soon as a search request comes to our API.”
For readers who are interested in a more technical breakdown of the system, Brown has written a full overview titled Real-Time Elasticsearch Indexing on WordPress.com on the Data for Breakfast blog.