We developed an ML model for the preliminary assessment of startups seeking investment; the client needed it as an auxiliary tool for making funding decisions within the scope of the company's activities. As initial data, we received a list of deals concluded between companies (or their CEOs) and investors (or investment funds) since 2006, as well as a list of companies that had gone public through IPOs over the last 20 years.
We built a dynamic graph in which companies and investors are nodes and the deals concluded between them are edges. We then trained a neural network to learn vector representations of companies (commonly known as embeddings) that take each company's history into account: the emergence of new edges (new deals) and the company's 'neighbors' in the graph.
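As a minimal sketch of such a dynamic graph, assuming each deal carries a year, the NetworkX code below stores every deal as a timestamped edge in a `MultiGraph` (the same pair can close several deals) and recovers the graph as it stood at any point in time. The company and fund names, the `year` attribute, and the helpers `build_dynamic_graph` and `snapshot` are hypothetical, not taken from the project.

```python
import networkx as nx

# Hypothetical deals: (company, investor, year)
DEALS = [
    ("AlphaAI", "Fund A", 2006),
    ("AlphaAI", "Fund B", 2009),
    ("MedCo", "Fund A", 2010),
    ("MedCo", "Fund C", 2015),
]


def build_dynamic_graph(deals):
    """Companies and investors become typed nodes; each deal becomes a timestamped edge."""
    g = nx.MultiGraph()  # a MultiGraph allows repeat deals between the same pair
    for company, investor, year in deals:
        g.add_node(company, kind="company")
        g.add_node(investor, kind="investor")
        g.add_edge(company, investor, year=year)
    return g


def snapshot(g, up_to_year):
    """Return a copy of the graph containing only deals closed up to `up_to_year`."""
    edges = [(u, v, k) for u, v, k, year in g.edges(keys=True, data="year")
             if year <= up_to_year]
    return g.edge_subgraph(edges).copy()


g = build_dynamic_graph(DEALS)
g_2009 = snapshot(g, 2009)
print(sorted(g_2009.neighbors("AlphaAI")))  # ['Fund A', 'Fund B']
print("MedCo" in g_2009)                    # False: its first deal is in 2010
```

Taking snapshots this way means a company only appears once it has closed its first deal, and its neighborhood grows as new edges arrive, which is the history the embeddings are meant to capture.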
The embedding of a graph vertex (e.g., a company) is a vector of fixed dimension that is updated (trained) through auxiliary tasks: classifying the vertex type (company or non-company) and predicting the emergence of new edges (deals). The core idea is that companies with similar behavior, interacting with similar sets of neighbors, end up with similar embedding vectors.
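The two auxiliary tasks can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions, not the project's model: instead of a neural network that aggregates neighbors, each vertex has a free ("shallow") embedding row in `Z`, an edge is scored by the dot product of its endpoints' embeddings, the negative edges are hypothetical sampled non-deals, and the node-type head is plain logistic regression. All names and data are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce(p, y, eps=1e-9):
    """Mean binary cross-entropy."""
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def train_step(Z, w, b, pos_edges, neg_edges, is_company, lr=0.1):
    """One gradient-descent step on the sum of the two auxiliary losses."""
    edges = np.vstack([pos_edges, neg_edges])
    y_edge = np.concatenate([np.ones(len(pos_edges)), np.zeros(len(neg_edges))])
    u, v = edges[:, 0], edges[:, 1]

    # Task 1: link prediction, scored as the dot product of the endpoint embeddings
    p_edge = sigmoid(np.sum(Z[u] * Z[v], axis=1))
    # Task 2: node-type classification with a linear head on each embedding
    p_type = sigmoid(Z @ w + b)
    loss = bce(p_edge, y_edge) + bce(p_type, is_company)

    # For mean BCE, d(loss)/d(logit) = (p - y) / n
    g_edge = (p_edge - y_edge) / len(edges)
    g_type = (p_type - is_company) / len(Z)

    grad_Z = np.zeros_like(Z)
    np.add.at(grad_Z, u, g_edge[:, None] * Z[v])
    np.add.at(grad_Z, v, g_edge[:, None] * Z[u])
    grad_Z += g_type[:, None] * w[None, :]
    grad_w = Z.T @ g_type
    grad_b = g_type.sum()
    return Z - lr * grad_Z, w - lr * grad_w, b - lr * grad_b, loss


def cosine(a, b):
    """Cosine similarity, used to compare two company embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy graph: nodes 0-1 are companies, 2-3 are investors (hypothetical data)
rng = np.random.default_rng(0)
Z = rng.normal(scale=0.1, size=(4, 8))
w, b = rng.normal(scale=0.1, size=8), 0.0
pos = np.array([[0, 2], [0, 3], [1, 2]])  # observed deals
neg = np.array([[0, 1], [2, 3], [1, 3]])  # sampled non-edges, fixed here for reproducibility
is_company = np.array([1.0, 1.0, 0.0, 0.0])

losses = []
for _ in range(300):
    Z, w, b, loss = train_step(Z, w, b, pos, neg, is_company, lr=0.5)
    losses.append(loss)

print(losses[0], losses[-1])
print(cosine(Z[0], Z[1]))  # similarity of the two companies' learned vectors
```

In the setup the text describes, `Z` would instead be produced by a graph neural network from each company's neighbors and deal history, so that structurally similar companies receive similar vectors; the losses driving training would keep the same shape.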
To train the graph network, we had to assemble the tabular data correctly, transform it into a graph structure, and enrich it with data from external sources. For the transformation, we used the chain CSV -> pandas DataFrame -> NetworkX graph -> PyG graph, which let us verify the data at each conversion step.
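The conversion chain can be sketched as follows, assuming toy CSV contents (the real file layout is not described here) and a hypothetical external table of company sectors used for enrichment. Each step ends with a consistency check, which is one way such a chain lets errors be caught at the step where they appear. The last step builds the integer `edge_index` layout that PyG uses; with PyTorch Geometric installed, `torch_geometric.utils.from_networkx` performs this conversion directly.

```python
import io

import networkx as nx
import numpy as np
import pandas as pd

# Hypothetical input files, inlined so the sketch runs on its own
DEALS_CSV = """company,investor,date
AlphaAI,Fund A,2006-05-01
AlphaAI,Fund B,2009-03-15
MedCo,Fund A,2010-07-20
"""
SECTORS_CSV = """company,sector
AlphaAI,IT
MedCo,Healthcare
"""

# Step 1: CSV -> pandas DataFrame
deals = pd.read_csv(io.StringIO(DEALS_CSV), parse_dates=["date"])
sectors = pd.read_csv(io.StringIO(SECTORS_CSV))
n_rows = len(deals)

# Enrichment: validate= raises MergeError if a company has several sector rows
deals = deals.merge(sectors, on="company", how="left", validate="many_to_one")
assert len(deals) == n_rows, "merge changed the number of deals"
assert deals["sector"].notna().all(), "some companies lack external data"

# Step 2: DataFrame -> NetworkX graph
G = nx.MultiGraph()
for row in deals.itertuples(index=False):
    # Giving every node the same attribute keys keeps a later PyG conversion uniform
    G.add_node(row.company, kind="company", sector=row.sector)
    G.add_node(row.investor, kind="investor", sector=None)
    G.add_edge(row.company, row.investor, year=row.date.year)
assert G.number_of_edges() == len(deals), "every deal must become exactly one edge"

# Step 3: NetworkX -> PyG-style arrays (integer node ids, 2 x num_edges edge_index)
node_ids = {name: i for i, name in enumerate(G.nodes)}
src = [node_ids[u] for u, v in G.edges()]
dst = [node_ids[v] for u, v in G.edges()]
edge_index = np.array([src + dst, dst + src])  # both directions of each undirected edge
is_company = np.array([G.nodes[n]["kind"] == "company" for n in G.nodes])
assert edge_index.shape == (2, 2 * G.number_of_edges())
```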
Experiments were conducted on real datasets. The proposed model outperforms the most recent baselines, and its results are 1.94 times better than those of real investors. The best predictions were obtained for startups in IT and healthcare.