Help Center

Topic: Recognition

Preventing Duplicate Survey Uploads in PaperSurvey.io

Last updated: 11 October, 2023


Ensuring data accuracy is paramount when collecting and processing survey responses on paper. Duplicate uploads can contaminate your dataset and lead to errors in analysis. At PaperSurvey.io, we've implemented robust methods to prevent the accidental uploading of duplicate survey responses and maintain the integrity of your dataset. Here's how we ensure this, and what you need to know:

1. Comparing Unique/Page Identifiers

Applies to surveys using unique page identifiers.

When you choose to utilize unique page identifiers in your paper surveys, our system activates a built-in duplication check. Here's how it works:

  1. Reading Identifiers: Before processing each uploaded page, our system reads the unique identifier, page number, and survey ID (e.g., page 1, unique ID: 91, survey ID: 991).

  2. The Duplicate Check: If a page with the same combination of identifiers has already been processed, the new page is marked as a duplicate and its data is not processed again.

Disable Duplicate Check: If you're uploading surveys with non-unique identifiers (for example, because you printed multiple copies from the same PDF) and want to bypass this check, enable the 'Non-unique page marks' toggle in your survey settings.
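The check described above can be pictured with a short Python sketch. It is an illustration only, not PaperSurvey.io's actual implementation: the names `PageKey`, `IdentifierDuplicateCheck`, and `non_unique_page_marks` are hypothetical, and the in-memory set stands in for whatever storage the service really uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageKey:
    """Identifiers read from one scanned page (hypothetical structure)."""
    survey_id: int
    unique_id: int
    page_number: int


class IdentifierDuplicateCheck:
    """Remembers which identifier combinations were already processed."""

    def __init__(self, non_unique_page_marks=False):
        # Assumed to mirror the 'Non-unique page marks' toggle:
        # when True, the duplicate check is skipped entirely.
        self.non_unique_page_marks = non_unique_page_marks
        self._seen = set()

    def is_duplicate(self, key):
        """Return True if the page should be flagged instead of processed."""
        if self.non_unique_page_marks:
            return False
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


check = IdentifierDuplicateCheck()
page = PageKey(survey_id=991, unique_id=91, page_number=1)
first_upload = check.is_duplicate(page)   # first time this combination is seen
second_upload = check.is_duplicate(page)  # same identifiers uploaded again
```

In this sketch the first upload is processed and the second is flagged. Constructing the check with `non_unique_page_marks=True` lets both through, which is the behaviour you would want for copies printed from a single PDF.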

Where This Feature Helps:

  • Prevents processing the same responses twice, avoiding data contamination.
  • Limitation: if pages were accidentally printed with the same identifiers, genuine responses can be flagged as duplicates, which can lead to missed responses or extra time spent reviewing pages manually.

How to Handle Misdetected Duplicates: You have a few options:

  • Mark Resolved: If the page isn't needed (e.g. the same page was scanned twice), mark it as resolved.
  • Retry Processing: Click the "Retry" button on the uploads page to process detected duplicates as new responses. First check that these are not simply pages that were scanned twice.
  • Disable Unique Page Identifiers: The identifier at the bottom-left corner will not be read.
  • Enable 'Non-unique Page Marks': Deactivate the duplication check for the survey but retain the identifiers.

2. Comparing File Hashes

Before processing each page, we calculate its SHA-1* hash and check our database for a previously processed file with the same hash. Here's how it works:

  1. Duplicate Check: If an identical file has already been uploaded, it won't be processed again.
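As a rough illustration of the hash comparison, the following Python sketch uses the standard library's `hashlib.sha1`. The class `HashDuplicateCheck` and the in-memory set are assumptions made for the example; they are not a description of how PaperSurvey.io stores or looks up hashes.

```python
import hashlib


def sha1_of_bytes(data: bytes) -> str:
    """Return the SHA-1 digest of the file contents as 40 hex digits."""
    return hashlib.sha1(data).hexdigest()


class HashDuplicateCheck:
    """Tracks hashes of files that were already processed (hypothetical)."""

    def __init__(self):
        self._seen_hashes = set()

    def is_duplicate(self, file_bytes: bytes) -> bool:
        digest = sha1_of_bytes(file_bytes)
        if digest in self._seen_hashes:
            return True
        self._seen_hashes.add(digest)
        return False


scan = b"example scan bytes"   # stand-in for an uploaded file's contents
rescan = scan + b" "           # any byte-level difference, e.g. a second scan

checker = HashDuplicateCheck()
results = [
    checker.is_duplicate(scan),    # first upload of this file
    checker.is_duplicate(scan),    # identical bytes uploaded again
    checker.is_duplicate(rescan),  # different bytes, so a different hash
]
```

Only the byte-for-byte identical upload is flagged here. The hash depends on the file's contents rather than its name, so renaming a file does not change the outcome, while a fresh scan of the same sheet does.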

Where This Feature Helps:

  • Prevents processing of the same file more than once, even if it's uploaded multiple times.

When It May Not Work:

  • Scanning the same page twice generates different hashes.
  • Modifying scanned pages with image editing software alters the hash.
  • SHA-1 Concerns: SHA-1 collisions are possible, but they have to be constructed deliberately and are extremely unlikely to occur by accident. They do not affect our threat model, and SHA-1 remains suitable for detecting duplicate uploads.

* SHA-1 (Secure Hash Algorithm 1) produces a 160-bit digest, usually written as 40 hexadecimal digits, that identifies a file's contents consistently, even if the file is renamed. Any alteration to the file, even a minor one, produces a different hash.

