Attack of the Bots!

by Andersen Yang & Jorge Andrews
June 11, 2021

UPDATE: As of January 2026, TCEC no longer provides free access to SurveyAnalytics, though agencies may still choose to use this platform without TCEC support. Local Lead Agencies now have free access to Harvest Your Data and its accompanying apps for mobile data collection. See https://tobaccoeval.ucdavis.edu/mobile-data-collection for more details.

When science fiction imagines how the future might look, these visions often include either helpful robots (like Rosie from The Jetsons) or menacing machines (like the titular character in The Terminator). Either way, the robots of the fictional future are machines that people interact with in the physical world. Coming back to the present, the bots have arrived, and they are not what we imagined.

Recently, TCEC has been getting calls from the field about bots contaminating online survey data. What does this mean? Bots are small computer programs written to quickly do monotonous tasks that would take humans much longer. A bot can be programmed to open and take a survey many times, either to sway the final results or to collect the offered incentive (such as a gift card). This has the potential to ruin research by contaminating your data. But fear not! TCEC is on the case.

The bots attack!

We reached out to some of the affected agencies to learn more about the bot incursions. Sarah Mosseri and Jessica Fat from LPC Consulting, Xochitlquetzal Davila from San Francisco Community Health Center, Cynthia Knapp from SAY San Diego, and Samantha Seaman from the American Lung Association all had surveys compromised by bots.

All received several hundred responses in a short window of time, with responses numbering between 1,500 and 2,000 in a matter of hours. Data was salvaged from some of the affected surveys after painstaking hours of work. However, other surveys were so corrupted that no data could be saved, and the only option was to start over: a total loss.

Where did they come from?

Survey Monkey was the most common data collection platform used among the four projects. After being infiltrated by bots, some of the agencies switched to other platforms, with mixed results. Alternative platforms included Alchemer and Google Forms.

Due to last year's stay-at-home order, survey links were mostly distributed digitally. These were sent out using email listservs, newsletters, or via social media channels like Facebook, Twitter, and Instagram.

Two projects posted their survey link on the neighborhood social media app, Nextdoor. In both cases, the teams encountered their first bot incursions shortly after the link was posted.

All surveys advertised gift cards as an incentive to take the survey, with the amount ranging from $5 to $50.

The affected projects attempted to save as much data as possible. Data was salvaged by systematically searching through the responses and identifying suspicious patterns. Red flags include:

Emails that jumble numbers with letters
Multiple surveys from the same IP address
Surveys that were taken in less than ten seconds
Duplicated or nonsensical responses to open-ended questions
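
The red flags above can be checked in a spreadsheet or with a short script. The following is a minimal Python sketch, not a tested tool for any particular platform: the field names (email, ip, seconds, open_ended) are hypothetical stand-ins for whatever columns your survey export provides, the "jumbled email" rule is a rough heuristic, and only duplicated (not nonsensical) open-ended answers are detected. Flags are meant to guide manual review rather than delete responses automatically, since real respondents can share an IP address (for example, in a household or office).

```python
import re
from collections import Counter

# Hypothetical field names; rename to match your platform's export columns.
MIN_SECONDS = 10  # the "less than ten seconds" red flag


def looks_jumbled(email: str) -> bool:
    """Rough heuristic: the part before '@' switches between letters
    and digits three or more times (e.g. 'x7k2q9')."""
    local = email.split("@")[0].lower()
    switches = re.findall(r"(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])", local)
    return len(switches) >= 3


def flag_responses(responses: list[dict]) -> list[set[str]]:
    """Return, for each response, the set of red flags it raises."""
    ip_counts = Counter(r["ip"] for r in responses)
    text_counts = Counter(r["open_ended"].strip().lower() for r in responses)

    results = []
    for r in responses:
        flags = set()
        if looks_jumbled(r["email"]):
            flags.add("jumbled_email")
        if ip_counts[r["ip"]] > 1:
            flags.add("shared_ip")
        if r["seconds"] < MIN_SECONDS:
            flags.add("too_fast")
        text = r["open_ended"].strip().lower()
        if text and text_counts[text] > 1:
            flags.add("duplicate_text")
        results.append(flags)
    return results
```

A response that raises several flags at once is a stronger candidate for removal than one that raises a single flag.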

In all cases, the bot incursions were recurring events.

Agencies fight back!

LPC Consulting attempted to use a “home-made CAPTCHA”¹ to steer bots away from their survey. This effort was ineffective, and the project encountered bots again. Next, they tried another platform, Alchemer, which offered a built-in CAPTCHA and cookie-based protection. Despite using cookies and two “real” CAPTCHAs, there were still bots, though fewer this time. The team then reached out to Alchemer, which added code to the survey to prevent further bot attacks.

SF Community Health created a new survey link, changed the layout of the survey questions, and was more intentional about where the link was shared. These efforts were ineffective in the long run, and after three breached surveys, SF Community Health stopped using Survey Monkey and switched to Google Forms.

SF Community Health recommends a more intentional outreach approach by targeting specific groups, as well as using Qualtrics, which includes an integrated CAPTCHA.

After experiencing multiple bot incursions, Cynthia’s team from SAY San Diego stopped advertising incentives with their survey and limited where the link was shared. SAY San Diego stopped using social media, especially Twitter. They also asked partner agencies to limit the spread of survey links.

An important lesson Cynthia learned: Once a survey link is on social media, agencies lose control of who can access the survey. SAY San Diego has since contacted their local lead agency for access to Survey Analytics.

After the initial bot breach, Samantha and the American Lung Association team reached out to Alchemer. The Alchemer technical support team discovered a Microsoft program that they believed was causing the issue, and added new code to the survey to help prevent any more spam responses. Samantha recommends adding a CAPTCHA to the survey, or using a program with CAPTCHA capabilities, and requiring an access code to take the survey.

How to protect against bots in the future

Survey Monkey is a free and widely used program, making it a tempting target for hackers using bots to compromise a survey. We at TCEC recommend that projects avoid Survey Monkey and limit social media outreach.

Local lead agencies have free access to Survey Analytics, which has a reCAPTCHA question to help minimize bot risk.

We have asked all LLAs to work with their local competitive grantees to provide access to Survey Analytics. To that end, we have developed some guidelines for LLAs sharing their Survey Analytics accounts:

Have a designated staff point person to manage the account.
Requests from competitive grantees can be sent to this person, who then inputs the survey into Survey Analytics.
The designated staff person can set up periodic updates to the competitive grantee, providing raw data exports of their survey data.
Having a single staff member serving as point person minimizes the risk of data being deleted or tampered with since access to files is limited.
Keep clear and open communication between LLAs and competitive grantees about surveys, including timelines and goals.
If a competitive grantee wishes to work with an LLA for Survey Analytics access, it should reach out at least a month before the survey is needed. The more time the better.

To recap, we recommend the following to minimize bot attacks:

Avoid Survey Monkey
Use a CAPTCHA
Require secondary authentication using cookies, an IP address, or an access code
Limit advertising incentives
Recruit participants using offline methods such as postcards or door hangers with QR codes instead of through social media
Clean out the bot responses by filtering for repeated IP addresses and implausibly short survey completion times
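
One way to picture the access-code recommendation above is a batch of single-use codes, for example one printed on each postcard or door hanger. The Python sketch below is illustrative only: it assumes you can check a code yourself before accepting a response, which may or may not match what a given survey platform supports, and the function names generate_codes and redeem are made up for this example.

```python
import secrets

# Letters and digits that are easy to confuse (I, L, O, 0, 1) are left out
# so codes can be typed reliably from a printed card.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def generate_codes(n: int, length: int = 8) -> set[str]:
    """Create n distinct, hard-to-guess access codes."""
    codes = set()
    while len(codes) < n:
        codes.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return codes


def redeem(code: str, unused: set[str]) -> bool:
    """Accept a code once; return False if it is unknown or already used."""
    normalized = code.strip().upper()
    if normalized in unused:
        unused.remove(normalized)
        return True
    return False
```

Because each code works only once, a bot that obtains a single code cannot submit the survey hundreds of times with it.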

While these suggestions will minimize your chances of being “botted,” be aware that online data collection will always come with risks. Contact TCEC any time for more help collecting and protecting your valuable data.

¹ CAPTCHA (which stands for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a challenge used to check that a respondent is human, usually by matching an image or phrase with a typed response.
