Creating a Next-Generation Financial Dataset from Scratch with NLP and Active Learning
I was delighted to be invited to give a talk at the inaugural spaCy IRL conference in Berlin, Germany. spaCy IRL was a gathering of researchers and practitioners pushing the boundaries of industrial-strength natural language processing with the spaCy software library and ecosystem.
My talk highlighted one of my group’s projects: using natural language processing and active learning to create a new kind of financial intelligence data. The goal was to assemble a dataset about companies’ environmental, social, and governance (“ESG”) practices. We used a human-in-the-loop workflow to iteratively train and validate machine learning models that detect mentions of companies’ ESG practices in publicly available documents. Check out the video to learn more about our methodology.
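To make the human-in-the-loop idea a little more concrete, here is a minimal sketch of one round of active learning. It assumes uncertainty sampling, which is only one common selection strategy and not necessarily the one we used in the project. The names `select_most_uncertain`, `active_learning_round`, `score`, and `annotate` are hypothetical and exist only for illustration: `score` stands in for whatever model returns a probability that a passage mentions an ESG practice, and `annotate` stands in for a human reviewer.

```python
from typing import Callable, List, Sequence, Tuple


def select_most_uncertain(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k scores closest to 0.5 (least confident)."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:k]


def active_learning_round(
    unlabeled: Sequence[str],
    score: Callable[[str], float],    # hypothetical model: text -> probability
    annotate: Callable[[str], bool],  # hypothetical human reviewer: text -> label
    k: int,
) -> Tuple[List[Tuple[str, bool]], List[str]]:
    """Send the k least certain texts to the annotator.

    Returns the newly labeled (text, label) pairs and the texts still unlabeled.
    """
    scores = [score(text) for text in unlabeled]
    chosen = set(select_most_uncertain(scores, k))
    newly_labeled = [(unlabeled[i], annotate(unlabeled[i])) for i in sorted(chosen)]
    remaining = [text for i, text in enumerate(unlabeled) if i not in chosen]
    return newly_labeled, remaining
```

In a full loop, the newly labeled pairs would be added to the training set, the model retrained, and the round repeated until the model's quality on a held-out validation set is acceptable.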
The slides from the talk are available below.
This post was originally published on datatheoretic.com.