expanded screenshot showing exposure for Helen Baker

Show, Don’t Tell. How and Why We Added Screenshots To Kanary.

Published Jun 20, 2023 (Updated Jul 13, 2023)

Kanary scans hundreds of sites for our members’ data. When we encounter an exposure, we start removing it and show it to our members so they can track our progress. Including screenshots in the results page shows you exactly what Kanary found, saving you the trouble of visiting the data broker or people search site to verify the exposure.

Why did we build this?

We built this feature to address a specific problem: when members find their personal information exposed on sketchy websites, they often want to verify whether the information has been removed or see the complete details on the webpage. While some websites allow direct access to results through URLs (e.g., https://thatsthem.com/name/Bob-Smith/San-Francisco-CA), others, like spyfly.com, behave more like single-page applications, where reaching a result means sitting through JavaScript animations and repeating tedious search and filter steps.

In either case, these sites are filled with ads and trackers, so we want to remove the need to visit them at all. The screenshots allow members to answer their questions about an exposure without leaving the Kanary app.

Where To Find Screenshots In Your Account

screenshot of results page

Preview from the results page

Screenshot of a screenshot in kanary, expanded to be viewed larger

Expanded view on the specific result’s page

When you visit the detailed view for a specific result, clicking on the screenshot thumbnail reveals the full screenshot with the offending exposure.

The Design Challenge

Thumbnail placement

The biggest challenge in adding thumbnails to our exposure cards was that not every result was guaranteed to have an associated screenshot. Because of this, we couldn't ensure absolute consistency across a list of results. Given that constraint, finding the most readable card layout, despite potentially inconsistent content, was paramount. We experimented with many locations and sizes for the thumbnails.

One initial experiment was placing the thumbnail on the right side of the card. This allowed for the most consistent alignment of the description text: it would always begin in the same location on the left side of each card. The right side of the text box was ragged to begin with, which made it perfect for a sometimes-there screenshot. Unfortunately, this broke a classic HCI pattern: we tend to scan from left to right and top to bottom, and our eyes are naturally drawn to images. (The exception being right-to-left languages, where this is mirrored!) Putting images to the right of the text therefore broke the layout's natural scanning flow.

Design explorations showing different thumbnail placements

With thumbnails back on the left, we experimented with a few placements and sizes, finally landing on positioning the image in line with the title that shows the specific info exposed. It still feels small, but for a preview, it does the job and doesn’t overpower other important information on the page.

Multiple Components

We have several repeated components throughout Kanary that adhere to our design guides and leverage easy-to-use React components. We wanted to place the screenshot where it would be discoverable but would also settle into the page hierarchy without having to do surgery across multiple components.

lots of competing options

We debated whether the screenshot should take priority over the text, because it presents the same information but in a more visual format. Here are a couple of explorations that didn't make the cut:

design exploration with screenshot on the top left of screen


An Efficient Solution

After some debate, we landed on an efficient solution that did not require introducing yet another white rectangle with rounded corners. On the main results page, we aimed to provide visual cues / affordances indicating the presence of a screenshot within the cards. By designing this in a reusable manner, we avoided the need for a separate component dedicated to displaying screenshots.

Given the already competing actions and elements on the detailed view page, we were also cautious about introducing additional vertical scrolling and clutter that might make it hard for members to find the most important information, fast.

Lastly, we couldn't guarantee that every screenshot would be of high quality. There could be obstructions from ads or situations where the actual result was outside the frame of the screenshot. Considering these uncertainties, it made more sense to prioritize the text representation and offer screenshots as a secondary option to members.

Backend Architecture

During the scraping process, our main Django server sends an event to our web scraping service with a payload containing relevant search parameters. For example:

// Scrape Job Payload
{
  site: "dataveria.com",
  first_name: "Jeff",
  last_name: "Bezos",
  city: "Seattle",
  state: "WA"
}

The web scraping service visits the specified site URL and attempts to find an exposure matching the given search parameters. If a potential exposure is found, it creates a screenshot, sends it to our storage bucket, and associates the proper metadata with the response back to the server.
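The flow described above could be sketched in Python roughly as follows. This is an illustration only: `ScrapeJob`, `run_scrape_job`, the three injected callables, and the `job_id`-based path layout are hypothetical names chosen for this example, not Kanary's actual code. The browser automation and the storage bucket are passed in as plain functions so the sketch stays self-contained.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class ScrapeJob:
    """Search parameters, mirroring the payload fields shown above."""
    site: str
    first_name: str
    last_name: str
    city: str
    state: str


def run_scrape_job(
    job: ScrapeJob,
    job_id: str,
    find_exposure: Callable[[ScrapeJob], Optional[dict]],
    take_screenshot: Callable[[str], bytes],
    upload: Callable[[str, bytes], None],
) -> Optional[dict]:
    """Return a response dict if a potential exposure is found, else None.

    All three callables are stand-ins (assumptions) for the real scraper,
    screenshot capture, and storage upload.
    """
    hit = find_exposure(job)  # expected shape: {"url": ..., "page_text": ...}
    if hit is None:
        return None  # nothing found, so no screenshot is taken

    image = take_screenshot(hit["url"])

    # Hypothetical path layout: scrapes/<site name>/<job id>/
    site_name = job.site.split(".")[0]
    path = f"scrapes/{site_name}/{job_id}/"
    upload(path, image)

    # Echo the search parameters back with the result metadata attached.
    return {**asdict(job), **hit, "screenshot_path": path}
```

Injecting the scraper and uploader as arguments also makes the flow easy to exercise with fakes, without touching a real site or bucket.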

The response from the scraping service goes through a processing phase which includes a matching algorithm to score the relevance of the result. If all checks are passed, we add this to our members’ exposures, using some of the below fields:

// Scrape Job Response
{
  site: "dataveria.com",
  first_name: "Jeff",
  last_name: "Bezos",
  city: "Seattle",
  state: "WA",
  url: "dataveria.com/jeff-bezos-seattle",
  page_text: "Jeff Bezos is a bald billionaire",
  screenshot_path: "scrapes/dataveria/2jalcs2baas/"
}
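The post doesn't describe how the matching algorithm scores relevance, so the following is only a naive stand-in to make the idea concrete. The `relevance_score` function and its scoring rule (the fraction of search fields that appear as whole words in the page text) are assumptions for illustration; a real matcher would likely be considerably more careful.

```python
import re

SEARCH_FIELDS = ["first_name", "last_name", "city", "state"]


def relevance_score(query: dict, response: dict) -> float:
    """Fraction of search fields found as whole words in the scraped text.

    Hypothetical scoring rule, not Kanary's actual algorithm.
    """
    words = set(re.findall(r"[a-z]+", response.get("page_text", "").lower()))
    matched = 0
    for field in SEARCH_FIELDS:
        # A multi-word value (e.g. "San Francisco") must match every word.
        parts = re.findall(r"[a-z]+", query[field].lower())
        if parts and all(part in words for part in parts):
            matched += 1
    return matched / len(SEARCH_FIELDS)


query = {"first_name": "Jeff", "last_name": "Bezos", "city": "Seattle", "state": "WA"}
response = {"page_text": "Jeff Bezos is a bald billionaire"}
score = relevance_score(query, response)  # name matches, city and state do not
```

Matching on whole words rather than substrings avoids counting a short value such as "WA" as present just because the text contains "was".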

Security and Limited Access

When a member requests their exposure details from the dashboard, we generate a pre-signed URL using the dump root. This URL has a one-hour expiry window and allows them to access the corresponding screenshot. This approach minimizes the risk of photos being shared outside of Kanary's secure environment and malicious actors abusing the screenshots to access information that members want kept private.
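To show why an expiring signed URL limits sharing, here is a small standard-library sketch of the general mechanism. It is not how Kanary or any particular storage provider implements pre-signed URLs (in practice the provider's SDK generates them); the secret, the `storage.example.com` host, and the function names are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"   # hypothetical signing key for this sketch only
EXPIRY_SECONDS = 60 * 60  # one-hour window, as described above


def _sign(path: str, expires: int) -> str:
    """HMAC over the object path and its expiry timestamp."""
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(path: str, now: Optional[float] = None) -> str:
    """Build a URL that is only valid for EXPIRY_SECONDS from `now`."""
    issued = time.time() if now is None else now
    expires = int(issued) + EXPIRY_SECONDS
    query = urlencode({"expires": expires, "sig": _sign(path, expires)})
    return f"https://storage.example.com/{path}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    path = parsed.path.lstrip("/")
    return current < expires and hmac.compare_digest(signature, _sign(path, expires))
```

Because the path and the expiry are both covered by the signature, a copied link stops working after the window closes and cannot be edited to point at a different screenshot.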

Current Limitations

There are several reasons why you might not see screenshots for every exposure:

  1. We require high-quality screenshots. As mentioned earlier, issues such as obstructions or incorrect scroll positions can result in screenshots that don't accurately display the exposure details. We flag this requirement at the site level and can tune results based on site quality over time. To get started, our team manually reviewed screenshots from our site list (available here) to verify quality. Sites that passed our assessment made the first launch.

  2. We generated the exposure before we modified the web scraping service to produce screenshots.

  3. The retention period for the screenshot has expired, and it no longer exists. We do this for security reasons and to comply with data retention policies.

  4. During the initial phase of the launch, only members who had 70% or more of their exposures accompanied by valid screenshots were shown the screenshots. This approach helped us ensure an appropriate ratio of exposures with screenshots to those without, mitigating the risk of the feature appearing underwhelming or going unnoticed. We used Metabase to retrieve the accounts that fit this definition and a script to set the testing flag in PostHog for the beta cohort.
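The 70% rule from the last item can be expressed as a short check. This is a sketch under assumptions: the `has_valid_screenshot` key and the `in_beta_cohort` function are hypothetical, and the real selection was done with a Metabase query rather than application code like this.

```python
from typing import Dict, List


def in_beta_cohort(exposures: List[Dict], min_percent: int = 70) -> bool:
    """True if at least `min_percent` of a member's exposures have a valid screenshot.

    `has_valid_screenshot` is a hypothetical field name for this sketch.
    """
    if not exposures:
        return False  # no exposures means nothing to show
    with_screenshots = sum(1 for e in exposures if e.get("has_valid_screenshot"))
    # Integer comparison avoids floating-point edge cases at exactly 70%.
    return with_screenshots * 100 >= min_percent * len(exposures)


member_exposures = [{"has_valid_screenshot": True}] * 7 + [{"has_valid_screenshot": False}] * 3
eligible = in_beta_cohort(member_exposures)  # 7 of 10 meets the 70% bar
```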

Future Development

The feature is now available to all Kanary members, but there are several improvements we can build on top of this general ability to capture our own images and the secure architecture for storing them.

100% Coverage

To roll out screenshots across our entire site catalog, we need to address a few challenges with quality. Some ads slip through our detection and removal logic, resulting in an unwanted banner at the top or bottom of the screen. Native elements like a menu, modal, or search bar can also obstruct the majority of the screen. Beyond that, even if the exposure is correctly placed in the frame, the zoom level could be too tight, leaving details cut off. Tackling the long tail of issues will get us to 100% coverage!

Lifecycle Screenshots

Because we keep scanning for an exposure throughout its lifecycle, capturing a screenshot at each stage would provide clarity beyond the initial discovery.

To illustrate this, let's consider an example:

  1. On January 1, we find your exposure on thatsthem.com.

  2. On January 3, we submit an opt-out request for your exposure.

  3. We wait out a one-week grace period (each data broker may have a different removal timeframe).

  4. On January 10, we find your exposure still present on the site. (Sometimes opt-out requests are not honored or get lost.)

  5. On January 13, we attempt another opt-out.

  6. Finally, on January 18, we confirm that your exposure is gone.

In this example, we would have captured three screenshots: one each for steps 1, 4, and 6. The final screenshot is particularly valuable because it should display a page with 0 search results, indicating successful removal.
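The timeline above can be modeled as a list of events in which only scans produce screenshots. This is a sketch for illustration: the `Event` type and helper names are hypothetical, and the year is arbitrary since the example gives only month and day.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    day: date
    kind: str               # "scan" or "opt_out"
    result: Optional[str]   # "found" or "gone" for scans, None for opt-outs


# Steps 1, 2, 4, 5, and 6 from the list above (step 3 is the waiting period).
timeline = [
    Event(date(2023, 1, 1), "scan", "found"),
    Event(date(2023, 1, 3), "opt_out", None),
    Event(date(2023, 1, 10), "scan", "found"),
    Event(date(2023, 1, 13), "opt_out", None),
    Event(date(2023, 1, 18), "scan", "gone"),
]


def screenshot_events(events: List[Event]) -> List[Event]:
    """Every scan captures a screenshot; opt-out submissions do not."""
    return [e for e in events if e.kind == "scan"]


def removal_confirmed(events: List[Event]) -> bool:
    """Removal is confirmed when the most recent scan finds nothing."""
    scans = screenshot_events(events)
    return bool(scans) and scans[-1].result == "gone"


captured = screenshot_events(timeline)  # the three scans in the example
```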

Sometimes, our members still visit the data broker site to verify the removal. Having these visual references satisfies the need for higher fidelity proof of removal, beyond just our assurance, without requiring you to leave the dashboard.

Multimodal scanning

Coverage for photo and video content has often been requested by our members, and the underlying architecture for this feature can be repurposed to capture the lifecycle of exposures for other modalities. While the actual work required for detection and removal would differ from a typical data broker exposure, members will still need visual proof of the found exposure (and eventual removal).


Now that the feature is launched, we're also in feedback mode. Our analysts and support team use the screenshots to help escalate requests for removal and verify our systems are working as expected. If you have a request or feedback for our team related to screenshots, we'd love to hear from you.

Thanks for reading!

If this type of work sounds interesting to you, check out our job listings here and shoot our founder a note at rachel[at]kanary.com.


© Kanaries, Inc. All rights reserved. 2024