Daniel J. B. Clarke

Email: danieljbclarke@gmail.com
Voicemail: 201-357-7356
Website: danieljbclarke.github.io


My name is Daniel J. B. Clarke. I’m an open source enthusiast and data scientist currently working in the Ma’ayan Lab at the Icahn School of Medicine at Mount Sinai. I received a BS in Electrical Engineering and an MS in Computer Engineering in May 2017 from Fairleigh Dickinson University. Since then, I’ve been building and maintaining bioinformatics web applications and conducting bioinformatics research, mostly around open source tools, web applications, and accessible data. I’ve been applying standard machine learning approaches to knowledge inference and, more recently, biomarker identification.

Recent advances in machine learning have renewed my strong interest in the field, prompting me to educate myself on methods in deep learning and dabble in zero-shot learning for functional prediction using multi-omics data, parametric dimensionality reduction for updatable t-SNE or UMAP visualizations of genomic data, graph neural networks for predictions on knowledge graphs, and other directions. After spending time exploring the landscape of deep learning, I find myself particularly intrigued by and excited about Energy Based Models and Reinforcement Learning. I hope to pursue a PhD in the area and make meaningful contributions to the field.


MS Computer Engineering, Fairleigh Dickinson University, Teaneck NJ
Spring 2017

BS Electrical Engineering, Minor in Computer Science & Mathematics, Fairleigh Dickinson University, Teaneck NJ
Magna Cum Laude, Global Scholars, Spring 2017



Data Science Analyst, Ma’ayan Laboratory of Computational Systems Biology, Icahn School of Medicine at Mount Sinai in New York
February 2018 - Present


Cyberlab Research and Development, Center for Cybersecurity and Information Assurance, Fairleigh Dickinson University, Teaneck NJ
Summer 2014, Fall 2017 - December 2017

  • Conducted research in planning and implementing a virtual Cyber Defense and Forensics Laboratory
  • Incorporated concepts from cybersecurity, embedded systems, and IoT into the labs developed

Student Tutor, Fairleigh Dickinson University, Teaneck NJ
Fall 2016 - Spring 2017

  • Available as a tutor for every class I had taken
  • Typically tutored higher-level math and engineering courses, including but not limited to:
    • Calculus 2 & 3, Signals and Systems I & II, Physics I & II, Electronics II & III

BD2K-LINCS Summer Research in Biomedical Big Data Science, Ma’ayan Laboratory of Computational Systems Biology, Icahn School of Medicine at Mount Sinai in New York
Summer 2016

Student Worker, Grants and Sponsored Projects, Fairleigh Dickinson University, Teaneck NJ
Summer 2015 - Summer 2017

  • Created a data acquisition and reformatting pipeline for cybersecurity and grants websites
  • Assisted with Annual Cybersecurity Symposiums and NSA National Centers of Academic Excellence in Information Assurance/Cyber Defense designation

Intern, NIKSUN, Inc, Princeton NJ
Summer 2013

  • Re-engineered existing proprietary security application interfaces for extended use cases
  • Modified embedded device system firmware
  • Assisted front-end developers by shaping a backend API to meet application requirements


  • 1st Place Winner: IEEE Region 1 Student Paper Competition 2017
  • 1st Place Winner: FDU IEEE Local Student Paper Competition 2017
  • BD2K-LINCS Data Coordination and Integration Center Summer Research Training Fellowship 2016
  • Radio Club of America Scholarship 2016
  • Outstanding Poster Award: LSAMP Research Conference 2016
  • 1st Place Winner: IEEE Region 1 Student Ethics Competition 2015
  • Editor and Writer, FDU Equinox: Student Newspaper 2015 - 2017
  • 1st Place Winner: FDU Cybersecurity Symposium Poster Competition 2014
  • President, FDU Green Team: Campus Environmental Advocacy Club 2014 - 2016
  • 15th Place Winner: NJ Governors Cyber Challenge 2013
  • IEEEXtreme Competitor: Team Marshmallow 2012 - 2016

PUBLICATIONS (ORCID 0000-0003-3471-7416, Google Scholar)

Clarke, D. J. B., Marino, G. B., Deng, E. Z., Xie, Z., Evangelista, J. E., & Ma’ayan, A. (2023). Rummagene: Mining Gene Sets from Supporting Materials of PMC Publications. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2023.10.03.560783 (Preprint)

Evangelista, J. E., Clarke, D. J. B., Xie, Z., Marino, G. B., Utti, V., Jenkins, S. L., Ahooyi, T. M., Bologa, C. G., Yang, J. J., Binder, J. L., Kumar, P., Lambert, C. G., Grethe, J. S., Wenger, E., Taylor, D., Oprea, T. I., de Bono, B., & Ma’ayan, A. (2023). Toxicology knowledge graph for structural birth defects. In Communications Medicine (Vol. 3, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1038/s43856-023-00329-2

Deng, E. Z., Fleishman, R. H., Xie, Z., Marino, G. B., Clarke, D. J. B., & Ma’ayan, A. (2023). Computational screen to identify potential targets for immunotherapeutic identification and removal of senescence cells. In Aging Cell (Vol. 22, Issue 6). Wiley. https://doi.org/10.1111/acel.13809

Evangelista, J. E., Xie, Z., Marino, G. B., Nguyen, N., Clarke, D. J. B., & Ma’ayan, A. (2023). Enrichr-KG: bridging enrichment analysis across multiple libraries. In Nucleic Acids Research (Vol. 51, Issue W1, pp. W168–W179). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkad393

Marino, G. B., Ngai, M., Clarke, D. J. B., Fleishman, R. H., Deng, E. Z., Xie, Z., Ahmed, N., & Ma’ayan, A. (2023). GeneRanger and TargetRanger: processed gene and protein expression levels across cells and tissues for target discovery. In Nucleic Acids Research (Vol. 51, Issue W1, pp. W213–W224). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkad399

Marino, G. B., Wojciechowicz, M. L., Clarke, D. J. B., Kuleshov, M. V., Xie, Z., Jeon, M., Lachmann, A., & Ma’ayan, A. (2023). lncHUB2: aggregated and inferred knowledge about human and mouse lncRNAs. In Database (Vol. 2023). Oxford University Press (OUP). https://doi.org/10.1093/database/baad009

Lachmann, A., Rizzo, K. A., Bartal, A., Jeon, M., Clarke, D. J. B., & Ma’ayan, A. (2023). PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices. In PeerJ (Vol. 11, p. e14927). PeerJ. https://doi.org/10.7717/peerj.14927

Jeon, M., Xie, Z., Evangelista, J. E., Wojciechowicz, M. L., Clarke, D. J. B., & Ma’ayan, A. (2022). Transforming L1000 profiles to RNA-seq-like profiles with deep learning. In BMC Bioinformatics (Vol. 23, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1186/s12859-022-04895-5

Kropiwnicki, E., Lachmann, A., Clarke, D. J. B., Xie, Z., Jagodnik, K. M., & Ma’ayan, A. (2022). DrugShot: querying biomedical search terms to retrieve prioritized lists of small molecules. In BMC Bioinformatics (Vol. 23, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1186/s12859-022-04590-5

Evangelista, J. E., Clarke, D. J. B., Xie, Z., Lachmann, A., Jeon, M., Chen, K., Jagodnik, K. M., Jenkins, S. L., Kuleshov, M. V., Wojciechowicz, M. L., Schürer, S. C., Medvedovic, M., & Ma’ayan, A. (2022). SigCom LINCS: data and metadata search engine for a million gene expression signatures. In Nucleic Acids Research (Vol. 50, Issue W1, pp. W697–W709). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkac328

Clarke, D. J. B., Kuleshov, M. V., Xie, Z., Evangelista, J. E., Meyers, M. R., Kropiwnicki, E., Jenkins, S. L., & Ma’ayan, A. (2022). Gene and drug landing page aggregator. In S. Forslund (Ed.), Bioinformatics Advances (Vol. 2, Issue 1). Oxford University Press (OUP). https://doi.org/10.1093/bioadv/vbac013

Charbonneau, A. L., Brady, A., Czajkowski, K., Aluvathingal, J., Canchi, S., Carter, R., Chard, K., Clarke, D. J. B., Crabtree, J., Creasy, H. H., D’Arcy, M., Felix, V., Giglio, M., Gingrich, A., Harris, R. M., Hodges, T. K., Ifeonu, O., Jeon, M., Kropiwnicki, E., … White, O. (2022). Making Common Fund data more findable: catalyzing a data ecosystem. In GigaScience (Vol. 11). Oxford University Press (OUP). https://doi.org/10.1093/gigascience/giac105

Clarke, D. J. B., Jeon, M., Stein, D. J., Moiseyev, N., Kropiwnicki, E., Dai, C., Xie, Z., Wojciechowicz, M. L., Litz, S., Hom, J., Evangelista, J. E., Goldman, L., Zhang, S., Yoon, C., Ahamed, T., Bhuiyan, S., Cheng, M., Karam, J., Jagodnik, K. M., … Ma’ayan, A. (2021). Appyters: Turning Jupyter Notebooks into data-driven web apps. Patterns, 2(3), 100213. https://doi.org/10.1016/j.patter.2021.100213

Clarke, D. J. B., Rebman, A. W., Bailey, A., Wojciechowicz, M. L., Jenkins, S. L., Evangelista, J. E., Danieletto, M., Fan, J., Eshoo, M. W., Mosel, M. R., Robinson, W., Ramadoss, N., Bobe, J., Soloski, M. J., Aucott, J. N., & Ma’ayan, A. (2021). Predicting Lyme Disease From Patients’ Peripheral Blood Mononuclear Cells Profiled With RNA-Sequencing. Frontiers in Immunology, 12. https://doi.org/10.3389/fimmu.2021.636289

Kropiwnicki, E., Evangelista, J. E., Stein, D. J., Clarke, D. J. B., Lachmann, A., Kuleshov, M. V., Jeon, M., Jagodnik, K. M., & Ma’ayan, A. (2021). Drugmonizome and Drugmonizome-ML: integration and abstraction of small molecule attributes for drug enrichment analysis and machine learning. In Database (Vol. 2021). Oxford University Press (OUP). https://doi.org/10.1093/database/baab017

Kuleshov, M. V., Stein, D. J., Clarke, D. J. B., Kropiwnicki, E., Jagodnik, K. M., Bartal, A., Evangelista, J. E., Hom, J., Cheng, M., Bailey, A., Zhou, A., Ferguson, L. B., Lachmann, A., & Ma’ayan, A. (2020). The COVID-19 Drug and Gene Set Library. Patterns, 1(6), 100090. https://doi.org/10.1016/j.patter.2020.100090

Hoagland, D. A., Clarke, D. J. B., Møller, R., Han, Y., Yang, L., Wojciechowicz, M. L., Lachmann, A., Oguntuyo, K. Y., Stevens, C., Lee, B., Chen, S., Ma’ayan, A., & tenOever, B. R. (2020). Modulating the transcriptional landscape of SARS-CoV-2 as an effective method for developing antiviral compounds. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2020.07.12.199687 (Preprint)

Rao, A. R., & Clarke, D. (2020). Perspectives on emerging directions in using IoT devices in blockchain applications. Internet of Things, 10, 100079. https://doi.org/10.1016/j.iot.2019.100079

Clarke, D. J. B., Wang, L., Jones, A., Wojciechowicz, M. L., Torre, D., Jagodnik, K. M., Jenkins, S. L., McQuilton, P., Flamholz, Z., Silverstein, M. C., Schilder, B. M., Robasky, K., Castillo, C., Idaszak, R., Ahalt, S. C., Williams, J., Schurer, S., Cooper, D. J., de Miranda Azevedo, R., … Ma’ayan, A. (2019). FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources. Cell Systems, 9(5), 417–421. https://doi.org/10.1016/j.cels.2019.09.011

Rao, A.R., Clarke, D. Exploring relationships between medical college rankings and performance with big data. Big Data Anal 4, 3 (2019). https://doi.org/10.1186/s41044-019-0040-9

Clarke, D. J. B., Kuleshov, M. V., Schilder, B. M., Torre, D., Duffy, M. E., Keenan, A. B., Lachmann, A., Feldmann, A. S., Gundersen, G. W., Silverstein, M. C., Wang, Z., & Ma’ayan, A. (2018). eXpression2Kinases (X2K) Web: linking expression signatures to upstream cell signaling networks. Nucleic Acids Research, 46(W1), W171–W179. https://doi.org/10.1093/nar/gky458

A. R. Rao, D. Clarke, M. Bhdiyadra and S. Phadke, “Development of an embedded system course to teach the Internet-of-Things,” 2018 IEEE Integrated STEM Education Conference (ISEC), Princeton, NJ, 2018, pp. 154-160. https://doi.org/10.1109/ISECon.2018.8340468

A. R. Rao, S. Garai, D. Clarke and S. Dey, “A system for exploring big data: an iterative k-means searchlight for outlier detection on open health data,” 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, 2018, pp. 1-8, https://doi.org/10.1109/IJCNN.2018.8489448

A. R. Rao and D. Clarke, “A comparison of models to predict medical procedure costs from open public healthcare data,” 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, 2018, pp. 1-8. https://doi.org/10.1109/IJCNN.2018.8489257

A. Ravishankar Rao & Daniel Clarke (2018) Hiding in Plain Sight: Insights about Health-Care Trends Gained through Open Health Data, Journal of Technology in Human Services, 36:1, 48-55, https://doi.org/10.1080/15228835.2017.1416515

Ravishankar Rao A., Clarke D. (2018) Facilitating the Exploration of Open Health-Care Data Through BOAT: A Big Data Open Source Analytics Tool. In: Tadj L., Garg A. (eds) Emerging Challenges in Business, Optimization, Technology, and Industry. Springer Proceedings in Business and Economics. Springer, Cham, https://doi.org/10.1007/978-3-319-58589-5_7

A. R. Rao and D. Clarke, “An open-source framework for the interactive exploration of Big Data: Applications in understanding health care,” 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 1641-1648, https://doi.org/10.1109/IJCNN.2017.7966048

A. R. Rao and D. Clarke, “A fully integrated open-source toolkit for mining healthcare big-data: architecture and applications,” 2016 IEEE International Conference on Healthcare Informatics (ICHI), Chicago, IL, 2016, pp. 255-261, https://doi.org/10.1109/ICHI.2016.35

Nandikotkur, G., Gomez, D., Dovale, J., Clarke, D., Komstead, K., Shah, R., & Aboasu, S. (2016). A Spectral Variability Study Using the Entire FERMI Data from the Blazar 3C 454.3. In American Astronomical Society Meeting Abstracts #228 (pp. 314.10). http://adsabs.harvard.edu/abs/2016AAS...22831410N


Baka MPlayer, u8sand.github.io/Baka-MPlayer
Summer 2014 - Winter 2017

  • Lead programmer, maintainer, and manager
  • Contributed to dependent projects including mpv and qt
  • Collaborated with UX designer and open source community

Amateur Radio License, Technician, Call sign: KD2IQK
Spring 2015

Open Source Software Development, github.com/u8sand
2007 - Present

  • Gained expertise in a substantial number of technologies including but not limited to:
    • Python: LangChain, pandas, sklearn, tensorflow, huggingface, fastapi, django, selenium, scapy
    • HTML/JavaScript: SolidJS, Svelte, NextJS, React, GraphQL, TypeScript, ThreeJS, d3
    • DevOps: postgres, kubernetes, docker, vagrant, ansible, terraform, OpenAPI, CWL, flatcar
    • Unix: nginx, awk, sed, rsync, rclone, iptables, bpf, git, perf, gdb, radare, jq, restic
    • Rust: rocket, rayon, pyo3, wasm-bindgen
    • C/C++: Qt, boost, win32, .NET, CMake, OGRE, DirectX
    • Other: OpenAI, WebAssembly, Haskell, Jekyll, LaTeX