Prototype tasks: Improving crowdsourcing results through rapid, iterative task design

Conference or workshop item


Gaikwad, S.S., Chhibber, N., Sehgal, V., Ballav, A., Mullings, C., Nasser, A., Richmond-Fuller, A., Gilbee, A., Gamage, D., Whiting, M., Zhou, S., Matin, S., Niranga, S., Goyal, S., Majeti, M., Srinivas, P., Ginzberg, A., Mananova, K., Ziulkoski, K., Regino, J., Sarma, S., Sinha, A., Paul, A., Diemer, C., Murag, M., Dai, W., Pandey, M., Vaish, R. and Bernstein, M. 2017. Prototype tasks: Improving crowdsourcing results through rapid, iterative task design.
Authors: Gaikwad, S.S., Chhibber, N., Sehgal, V., Ballav, A., Mullings, C., Nasser, A., Richmond-Fuller, A., Gilbee, A., Gamage, D., Whiting, M., Zhou, S., Matin, S., Niranga, S., Goyal, S., Majeti, M., Srinivas, P., Ginzberg, A., Mananova, K., Ziulkoski, K., Regino, J., Sarma, S., Sinha, A., Paul, A., Diemer, C., Murag, M., Dai, W., Pandey, M., Vaish, R. and Bernstein, M.
Description

Low-quality results have been a long-standing problem on microtask crowdsourcing platforms, driving away requesters and justifying low wages for workers. To date, workers have been blamed for low-quality results: they are said to make as little effort as possible, pay little attention to detail, and lack expertise. In this paper, we hypothesize that requesters may also be responsible for low-quality work: they launch unclear task designs that confuse even earnest workers, under-specify edge cases, and neglect to include examples. We introduce prototype tasks, a crowdsourcing strategy requiring all new task designs to launch a small number of sample tasks. Workers attempt these tasks and leave feedback, enabling the requester to iterate on the design before publishing it. We report a field experiment in which tasks that underwent prototype task iteration produced higher-quality results than the original task designs. With this research, we suggest that a simple and rapid iteration cycle can improve crowd work, and we provide empirical evidence that requester “quality” directly impacts result quality.
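
As a reading aid only, the short Python sketch below illustrates the prototype-task cycle the abstract describes: post a small pilot batch, collect worker feedback, and revise the task design before the full launch. The sketch is not from the paper; TaskDesign, run_pilot, revise and prototype_iterate are assumed, illustrative names, and a real version would call the crowdsourcing platform instead of returning canned feedback.

from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskDesign:
    instructions: str
    examples: List[str] = field(default_factory=list)
    revision: int = 0

def run_pilot(design: TaskDesign, n_workers: int = 10) -> List[str]:
    """Stand-in for posting a small pilot batch and collecting free-text
    worker feedback; here it only flags a design with no worked examples."""
    if not design.examples:
        return ["I wish the task included an example of a correct answer."]
    return []

def revise(design: TaskDesign, feedback: List[str]) -> TaskDesign:
    """The requester addresses the feedback, e.g. by adding a worked example."""
    design.examples.append("Example answer: ...")
    design.revision += 1
    return design

def prototype_iterate(design: TaskDesign, max_rounds: int = 3) -> TaskDesign:
    """Iterate the design on small pilots until workers raise no further issues."""
    for _ in range(max_rounds):
        feedback = run_pilot(design)
        if not feedback:
            break
        design = revise(design, feedback)
    return design

final = prototype_iterate(TaskDesign(instructions="Label each image as cat or dog."))
print(f"Ready for full launch after {final.revision} revision(s).")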

Keywords: Crowdsourcing; Task design; Microtasks
Year: 2017
Conference: The Fifth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2017), 23 Oct 2017
Official URL: https://arxiv.org/abs/1707.05645
Web address (URL) of conference proceedings: https://www.humancomputation.com/2017/papers.html
Related Output
Cites: arXiv:1707.05645

Additional information

Stanford Crowd Research Collective
daemo@stanford.edu

Publication process dates
Deposited: 16 Nov 2020
Files
File Access Level: Open
Permalink: https://repository.canterbury.ac.uk/item/8wqv8/prototype-tasks-improving-crowdsourcing-results-through-rapid-iterative-task-design

Related outputs

The role of use cases when adopting augmented reality into higher education pedagogy
Ward, G., Turner, S., Pitt, C., Qi, M., Richmond-Fuller, A. and Jackson, T. 2024. The role of use cases when adopting augmented reality into higher education pedagogy.
Adaptive and flexible online learning during Covid19 lockdown
Manna, S., Nortcliffe, A., Sheikholeslami, G. and Richmond-Fuller, A. 2021. Adaptive and flexible online learning during Covid19 lockdown.
Together apart: nurturing inclusive, accessible and diverse connections within the Canterbury Christ Church University (CCCU) community during COVID-19
Richmond-Fuller, A. 2020. Together apart: nurturing inclusive, accessible and diverse connections within the Canterbury Christ Church University (CCCU) community during COVID-19.
Crowd guilds: Worker-led reputation and feedback on crowdsourcing platforms
Whiting, M. E., Gamage, D., Gaikwad, S. S., Gilbee, A., Goyal, S., Ballav, A., Majeti, D., Chhibber, N., Richmond-Fuller, A., Vargus, F., Sharma, T. S., Chandrakanthan, V., Moura, T., Salih, M. H., Kalejaiye, G. B. T., Ginzberg, A., Mullings, C. A., Dayan, Y., Milland, K., Orefice, H., Regino, J., Parsi, S., Mainali, K., Sehgal, V., Matin, S., Sinha, A., Vaish, R. and Bernstein, M. S. 2017. Crowd guilds: Worker-led reputation and feedback on crowdsourcing platforms. in: CSCW '17: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. New York: Association for Computing Machinery. pp. 1902-1913.
The Daemo crowdsourcing marketplace
Gaikwad, S. S., Whiting, M., Gamage, D., Mullings, C. A., Majeti, D., Goyal, S., Gilbee, A., Chhibber, N., Ginzberg, A., Ballav, A., Matin, A., Richmond-Fuller, A., Sehgal, V., Sarma, T., Nasser, A., Regino, J., Zhou, S., Stolzoff, A., Mananova, K., Dhakal, D., Srinivas, P., Ziulkoski, K., Niranga, S. S., Salih, M., Sinha, A., Vaish, R. and Bernstein, M. S. 2017. The Daemo crowdsourcing marketplace. in: Lee, C.P. and Poltrock, S. (ed.) CSCW '17 Companion: Companion of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. New York: Association for Computing Machinery.
Boomerang: Rebounding the consequences of reputation feedback on crowdsourcing platforms
Gaikwad, N.S., Morina, D., Ginzberg, A., Mullings, C., Goyal, S., Gamage, D., Diemert, C., Burton, M., Zhou, S., Whiting, M., Ziulkoski, K., Gilbee, A., Niranga, S. S., Sehgal, V., Lin, J., Kristianto, L., Richmond-Fuller, A., Regino, J., Chhibber, N., Majeti, D., Sharma, S., Mananova, K., Dhakal, D., Dai, W., Purynova, V., Sandeep, S., Chandrakanthan, V., Sarma, T., Matin, S., Nasser, A., Nistala, R., Stolzoff, A., Milland, K., Mathur, V., Vaish, R. and Bernstein, M. S. 2016. Boomerang: Rebounding the consequences of reputation feedback on crowdsourcing platforms. in: Rekimoto, J. and Igarashi, T. (ed.) UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology. New York: Association for Computing Machinery. pp. 625-637.