Instructions

How does this website work?

This website has two modes: a graph mode, which produces pretty pictures from our data, and a search mode, which provides an interface to our back-end database.

In graph mode, in short, almost everything is clickable. For instance, start from the "View Marketplace" tab. From there, click on one of the markets (e.g., Silk Road 2). This should give you a graph of the total revenue over time, broken down by category. You can scroll and zoom on that graph, remove categories, etc. "Coverage analysis" gives you an idea of how precise we think our coverage is (when available; this is not the case for all markets). "Vendors" lists all the vendors on that marketplace, ranked in decreasing order of estimated revenue. Clicking on a specific vendor (e.g., 1e48d7b23b98f5defd3ffbdfccb4e14a9dcdcdd59caaeee7ac87df3437930394) opens a page about that vendor, showing how much they were selling over time. From there, "Item listings" shows the specific item listings they were selling. These too are clickable, and you can get a detailed item description (anonymized, as discussed before) as well as feedback data.

The search mode is available by clicking on the "Search" tab and issuing SQLite queries to our back-end database.
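As a rough illustration of the kind of SQLite query the search mode accepts, the sketch below builds a tiny in-memory database and ranks vendors by revenue on one market. The table name `sales`, its columns, the vendor labels, and the amounts are all hypothetical and made up for this example; the real back-end database may be organized quite differently.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only; the real back-end
# database may use different table and column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (market TEXT, vendor_hash TEXT, category TEXT, amount_usd REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Silk Road 2", "hash_a", "Cannabis", 120.0),
        ("Silk Road 2", "hash_a", "Stimulants", 80.0),
        ("Silk Road 2", "hash_b", "Cannabis", 50.0),
        ("Agora", "hash_b", "Cannabis", 30.0),
    ],
)

# Rank vendors on one market by decreasing total revenue, similar in
# spirit to the "Vendors" view of the graph mode.
QUERY = """
    SELECT vendor_hash, SUM(amount_usd) AS revenue
    FROM sales
    WHERE market = ?
    GROUP BY vendor_hash
    ORDER BY revenue DESC
"""
top_vendors = conn.execute(QUERY, ("Silk Road 2",)).fetchall()
print(top_vendors)
```

On the Search tab only the SQL text itself would be entered; the Python wrapper is there just to make the example self-contained.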

FAQ

  1. How did you get this data?

    All data were, at some point or another, publicly available on the anonymous market websites we investigated. Our publications, in particular our 2015 USENIX Security paper [2], extensively describe how data were collected.

  2. When did you start this research?

    This work builds on research started in 2011, and described in our original Silk Road paper [1].

  3. Which data are included in this website?

    At the moment, data from our two original papers [1, 2], and data used for our more recent work [4, 5, 6, 7, 8]. (Ground truth data used in our evaluation study [8] cannot be made available for reasons made clear in the paper.) Data are typically released with a six-month to one-year delay after collection. (See below for why we do not release data immediately.)

  4. What does "anonymized" mean?

    Although the data were publicly available, we choose not to make any textual information (item name, description, or feedback text) available to the general public. Indeed, we cannot manually inspect each entry to ensure that no potentially private information (e.g., URLs, email addresses) would be inadvertently released. We also anonymized all handles (user id, item id).

    In the context of these data, "anonymized" means hashed with SHA-256, using a salt known only to us. This allows users to make queries that are consistent across the databases (e.g., anonymized vendor handle 029c4c956a5f9234d820cf7cde1c3ae841abefe265175c9f89e88988e3d18b86 always matches the same individual, regardless of the market in which they operate, if they use the same handle across markets) without revealing user information.
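    A minimal sketch of salted SHA-256 hashing is shown here to illustrate why the same handle maps to the same anonymized value on every market. How the salt is actually combined with the handle is not stated on this page; prepending the salt to the UTF-8 encoded handle is an assumption made for this example, and the salt and handles are made up.

```python
import hashlib


def anonymize(handle: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + handle (assumed construction)."""
    return hashlib.sha256(salt + handle.encode("utf-8")).hexdigest()


# Placeholder salt for the example; the real salt is secret.
SALT = b"example-salt-not-the-real-one"

same_1 = anonymize("some_vendor", SALT)
same_2 = anonymize("some_vendor", SALT)
other = anonymize("other_vendor", SALT)

# The same handle always yields the same 64-character hex string, so
# records can be joined across markets without exposing the handle.
print(same_1 == same_2, same_1 == other, len(same_1))
```

    Without the salt, an outsider cannot recompute the hash of a guessed handle to check whether it appears in the data.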

  5. Can I get non-anonymized databases?

    For the time being (May 2022), yes, if you are a non-commercial entity. Those are available through the IMPACT project, but support ceased in late 2021. IMPACT (as of May 2022) still provisions data, but we do not know for how long. Technically, we host the data, but IMPACT is crucial in getting the proper administrative approvals so that we can reshare non-anonymized databases.

    Please submit a request for the appropriate dataset in IMPACT. We review all requests, and, for non-anonymized databases, you will need to sign a full memorandum of agreement. These databases may still be partially anonymized, for instance, if we find that some parties (vendors, careless customers) are exposing personal identifiers (e.g., of fraud victims) in the item description or the feedback text.

    Once IMPACT support stops, we will have to see how we can continue sharing non-anonymized data in a manner that satisfies our legal counsel; but this website will remain available.

    If you are a commercial entity, or want up-to-date data far beyond the realm of what we offer on this website, we invite you to look at the Hikari Labs dark web markets platform, which provides a commercial service spun out from this research.

  6. Can you selectively de-anonymize some records for me?

    No.

  7. How accurate are the results I am getting?

    The answer to this question is long enough to warrant specific treatment in our papers [2, 8]. TLDR: We are underestimating overall sales (and may be missing some vendors that were only active for a very short time), but results we have been able to cross-validate have been shown to be reasonably close to reality.

    In more detail, "accuracy" hinges on two concepts: soundness and completeness of the data. Soundness means that the data we have are always correct; completeness means that we have all of the data we need. All of the data we present on this website have been computed from snapshots of the various markets we have investigated; thus, barring any bugs in the analysis, they should be sound. However, we are unable to fully capture all instances of all pages of a given market (for reasons detailed in the companion paper [2]), so the data are not complete, which leads us to undercount overall sales. In some cases, however, we can derive a coverage metric that tells us how much we might be missing. Cross-validation with vendor arrest or marketplace takedown records (when available) shows that we are generally not that far off.

    Now, a word of caution: for the most recent very large marketplaces added here (AlphaBay and Dream, notably), coverage is considerably spottier than for older marketplaces, due to their sheer size, and the fact that we did not have the budget to run crawlers every day. As a result, the absolute numbers for these marketplaces are clearly lower bounds; with that said, what we miss should not be particularly biased toward a certain category, and so relative numbers (e.g., the proportion of sales pertaining to Cannabis overall) are probably quite reliable.

    In 2022, we completed a study [8] that describes result accuracy in extensive detail.

  8. Why are there huge spikes (e.g., Agora) and holes (e.g., Silk Road 1, Dream) in some data?

    (Partly) see above. Agora is a weird case in that the site was frequently down, so a bunch of sales end up clustered together, driving these spikes. In other cases, we had poor coverage of a marketplace at a given point in time; for instance, we didn't monitor Silk Road at all between late 2012 and mid-2013, yielding a hole in the data. We also have no Dream scrapes from the winter of 2017-18, and since Dream removed feedback that was more than a few months old, we have no way of inferring the sales for that period.

  9. Is this research legal and ethical?

    All of the data collected from online anonymous marketplaces were, at one point or another, public. (The "dark web" is actually not particularly dark!) The computations and analysis derived from that data were also extensively described in open-access publications.

    However, on this website, we only make anonymized data available, to avoid accidentally revealing potential personal information. In addition, we delay release of our data feeds. In general, data presented on this website will be at least six months old. For a thorough discussion of ethics in this line of research, we refer the interested reader to our paper on the subject [3].

    We worked with, and got approval from, both the Carnegie Mellon Institutional Review Board (IRB) and university legal counsel before making this website public.

  10. Aren't you worried about SQL injection attacks?

    We are using an anonymized replica of our database, and we strongly constrain the statements people can inject (using standard sanitization practices like prepared statements and whitelisting). Even if these precautions were insufficient, in the worst case, an abusive user should not be able to get access to anything beyond the full database, which is itself anonymized, and available through IMPACT anyway. Frankly, it is far quicker to simply ask politely!
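    For readers unfamiliar with these two precautions, a minimal sketch follows. It is not this site's actual code: the `items` table, its contents, and the `run_query` helper are hypothetical. It uses Python's standard `sqlite3` module, where `?` placeholders bind user input as data (a prepared statement) and `set_authorizer` acts as a whitelist of permitted operations.

```python
import sqlite3

# Hypothetical table standing in for an anonymized replica.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_hash TEXT, category TEXT)")
conn.execute("INSERT INTO items VALUES ('abc', 'Cannabis')")
conn.commit()

# Whitelist: only SELECT statements and column reads are permitted.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}


def authorizer(action, arg1, arg2, db_name, source):
    """Deny every operation that is not explicitly allowed."""
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


conn.set_authorizer(authorizer)


def run_query(sql, params=()):
    """Run one statement; return its rows, or None if SQLite refuses it."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.DatabaseError:
        return None


SELECT_BY_CATEGORY = "SELECT item_hash FROM items WHERE category = ?"

# A bound parameter is treated as a plain string, never as SQL text.
rows_ok = run_query(SELECT_BY_CATEGORY, ("Cannabis",))
rows_injection = run_query(SELECT_BY_CATEGORY, ("Cannabis' OR '1'='1",))

# A destructive statement is rejected by the authorizer.
dropped = run_query("DROP TABLE items")
rows_after = run_query(SELECT_BY_CATEGORY, ("Cannabis",))
print(rows_ok, rows_injection, dropped, rows_after)
```

    The two mechanisms are complementary: binding keeps user-supplied values from changing a query's structure, while the authorizer limits what any accepted statement is allowed to do.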

  11. What is the license on this website? Can I reuse content?

    By using this website you agree to our licensing agreement (see also the short disclaimer in the footer of this page). Back-end databases, available through IMPACT, additionally have their own licensing agreement that you will have to abide by. IMPACT is itself a project from the U.S. Department of Homeland Security, Science and Technology Directorate.

Acknowledgments

This research was partially supported by the Department of Homeland Security Office of Science and Technology (Cyber Security Division), under agreement numbers FA8750-17-2-0188, FA8750-19-1-0152, FA8750-20-1-1003, and contract number N66001-13-C-0131 (jointly supported by the Government of Australia and SPAWAR Systems Center Pacific); by CyLab at Carnegie Mellon under grant DAAD19-02-1-0389 from the Army Research Office; by the National Science Foundation under ITR award CCF-0424422 (TRUST) and award CNS-1223762; and by the Singapore Defence Science and Technology Agency (DSTA) under agreement CNZ2000832.

References

[1] Nicolas Christin. Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace. In Proceedings of the 22nd International World Wide Web Conference (WWW'13). Rio de Janeiro, Brazil. May 2013.
[2] Kyle Soska and Nicolas Christin. Measuring the Longitudinal Evolution of the Online Anonymous Marketplace Ecosystem. In Proceedings of the 24th USENIX Security Symposium (USENIX Security'15), pages 33-48. Washington, DC. August 2015.
[3] James Martin and Nicolas Christin. Ethics in Cryptomarket Research. In International Journal of Drug Policy, Volume 35, Issue 6, pages 84-91. 2016.
[4] European Monitoring Centre for Drugs and Drug Addiction and Europol. Drugs and the Darknet: Perspectives for Enforcement, Research and Policy. EMCDDA–Europol Joint publications, Publications Office of the European Union, Luxembourg. November 2017.
[5] Malte Möser, Kyle Soska, Ethan Heilman, Kevin Lee, Henry Heffan, Shashvat Srivastava, Kyle Hogan, Jason Hennessey, Andrew Miller, Arvind Narayanan, and Nicolas Christin. An Empirical Analysis of Traceability in the Monero Blockchain. In Proceedings of the Privacy Enhancing Technologies Symposium (PETS 2018), volume 3. Barcelona, Spain. July 2018.
[6] Rolf van Wegberg, Samaneh Tajalizadehkhoob, Kyle Soska, Ugur Akyazi, Carlos Gañán, Bram Klievink, Nicolas Christin, and Michel van Eeten. Plug and Prey? Measuring the Commoditization of Cybercrime via Online Anonymous Markets. In Proceedings of the 27th USENIX Security Symposium (USENIX Security'18). Baltimore, MD. August 2018.
[7] Xiao Hui Tai, Kyle Soska, and Nicolas Christin. Adversarial Matching of Dark Net Market Vendor Accounts. In Proceedings of the 25th ACM SIGKDD Conference of Knowledge, Discovery, and Data Mining (KDD'19). Anchorage, AK. August 2019.
[8] Alejandro Cuevas, Fieke Miedema, Kyle Soska, Nicolas Christin, and Rolf van Wegberg. Measurement by Proxy: On the Accuracy of Online Marketplace Measurements. In Proceedings of the 31st USENIX Security Symposium (USENIX Security'22). Boston, MA. August 2022.

Last modified: Mon Nov 28 11:21:36 AM EST 2022

© 2019-22 Carnegie Mellon University. All rights reserved.

THE DATA AND INFORMATION MADE AVAILABLE THROUGH THIS SITE (THE “DATA”) IS MADE AVAILABLE ON AN “AS-IS” “AS AVAILABLE” BASIS SOLELY FOR NON-COMMERCIAL RESEARCH AND/OR ACADEMIC PURPOSES. TO THE MAXIMUM EXTENT ALLOWED UNDER LAW, CARNEGIE MELLON UNIVERSITY IS NOT RESPONSIBLE FOR ANY CLAIMS, DAMAGES OR OTHER LIABILITY ARISING OUT OF THE USE OF THE DATA, AND YOU ARE RESPONSIBLE FOR YOUR USE OF IT. THE DATA IS NOT PROVIDED TO (AND MAY NOT BE USED BY) ANY PERSON OR ENTITY IN ANY JURISDICTION IN VIOLATION OF APPLICABLE LAWS, RULES OR REGULATIONS. BY DOWNLOADING, COPYING AND/OR USING THE DATA, YOU AGREE THAT YOU HAVE READ AND AGREE TO THE PROVISIONS IN THIS PARAGRAPH AND THE TERMS OF USE BELOW. IF YOU DO NOT AGREE, YOU MAY NOT ACCESS OR USE THE DATA.

Full Terms of use.