Data in Depth

Data Source

The data used in the DeFi Survival Analysis Toolkit is from The Graph.

In its current state, the app uses data from the DeFi lending protocol Aave. Aave publishes its own data to The Graph, and each network it is deployed on has its own sub-graph. These sub-graphs are structured identically with respect to the transaction-level data.

The data currently available for use within the toolkit comes from the Aave-maintained sub-graphs, specifically the following 7 markets:

Ethereum (V2), Polygon (V3), Avalanche (V3), Optimism (V3), Harmony (V3), Fantom (V3), and Arbitrum (V3).

The app is designed to be extensible to additional survival datasets as they are added (see Extensible Data Storage). To this end, we are currently working with Amberdata to expand the data feed to more DeFi protocols (Uniswap is in the works).

Transaction Data Structure

Survival Data Creation

Documentation of how the survival datasets are created is coming soon and will include the code used for data creation.

Survival Data Structure

Basic Survival Data Structure

The columns necessary for a survival model to be created are:

ID - Identifier value

User - User hash

TimeDiff - Time (in seconds) from either the start of the observation period or the time of the index event to either the time at which the outcome event occurred (status = 1) or the end of the observation period (status = 0)

Status - Binary value: 1 if the outcome event occurred during the observation period, 0 if the observation is censored
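To illustrate how these four columns feed a survival model, here is a minimal sketch of a Kaplan-Meier estimate in plain Python. The rows, user hashes, and the `kaplan_meier` function are hypothetical, and the choice of Kaplan-Meier is an assumption made for illustration; it is not necessarily the estimator the toolkit itself uses.

```python
from collections import Counter

# Hypothetical rows in the basic layout: (ID, User, TimeDiff in seconds, Status)
rows = [
    (1, "0xaaa", 86400, 1),   # outcome event after 1 day
    (2, "0xbbb", 172800, 0),  # censored after 2 days
    (3, "0xccc", 172800, 1),  # outcome event after 2 days
    (4, "0xddd", 259200, 1),  # outcome event after 3 days
    (5, "0xeee", 345600, 0),  # censored after 4 days
]


def kaplan_meier(rows):
    """Return [(time, survival_probability)] at each distinct event time."""
    # Count outcome events (Status == 1) at each TimeDiff value.
    events = Counter(time_diff for _, _, time_diff, status in rows if status == 1)
    curve = []
    survival = 1.0
    for t in sorted(events):
        # Rows still under observation just before time t (censored rows included).
        at_risk = sum(1 for _, _, time_diff, _ in rows if time_diff >= t)
        survival *= 1 - events[t] / at_risk
        curve.append((t, survival))
    return curve


curve = kaplan_meier(rows)
for t, s in curve:
    print(f"{t / 86400:.0f} days: S(t) = {s:.2f}")
```

Censored rows never lower the curve directly; they only shrink the at-risk count for later event times, which is why both TimeDiff and Status are needed.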

Example image:

Survival Datasets for Categorical Splits

These datasets are generally more informative than the basic ones because they let users see how behavior differs across descriptors. This makes it possible to discover changes over time (quarters), the overall trend of the market, user clusters, and more. The descriptors are added to the dataframe as factor columns after the original columns, producing a dataframe similar to the one shown here:

Extensible Data Storage

The toolkit uses the file system of the data storage to create UI options, so that additional survival datasets are picked up as they are added. The general structure of the file path is: DeFi_Toolkit/Data/protocol/version/market/compute_quarterly_choice/index_event/outcome_event. The UI elements are created in a reactive, sequential manner by reading the files at each subsequent level of the path. Thus, as more datasets are added, the toolkit is able to handle them without further changes.
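The sequential lookup can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's actual code: the `next_options` function, the `LEVELS` list, and the folder names in the demo (`Aave`, `V2`, `V3`, `Ethereum`, `Polygon`) are hypothetical, and the sketch assumes each level's entries can be used directly as UI labels.

```python
import tempfile
from pathlib import Path

# Path levels in the order given in the text; each one would drive one UI element.
LEVELS = [
    "protocol", "version", "market",
    "compute_quarterly_choice", "index_event", "outcome_event",
]


def next_options(data_root, selections):
    """List the choices for the next level, given the selections made so far."""
    if len(selections) >= len(LEVELS):
        return []  # every level has already been chosen
    current = Path(data_root).joinpath(*selections)
    if not current.is_dir():
        return []  # nothing stored under this combination
    # Assumption: entry names are used as labels; hidden files are skipped.
    return sorted(p.name for p in current.iterdir() if not p.name.startswith("."))


# Demo on a throwaway directory tree with hypothetical folder names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "DeFi_Toolkit" / "Data"
    (root / "Aave" / "V2" / "Ethereum").mkdir(parents=True)
    (root / "Aave" / "V3" / "Polygon").mkdir(parents=True)

    protocols = next_options(root, [])
    versions = next_options(root, ["Aave"])
    markets = next_options(root, ["Aave", "V3"])

print(protocols, versions, markets)
```

Because each call only reads the directory selected so far, adding a new folder under any level makes it appear as an option without any change to the lookup code.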

Once the toolkit has loaded the data, it determines the categories available for splitting by taking every column in the data object that is not one of the columns required for basic survival curves (see Basic Survival Data Structure above). Thus, new categories can be added as columns and they will automatically become available in the toolkit.
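A minimal sketch of that column-based detection is shown below. The helper names `category_columns` and `split_by`, the sample rows, and the quarter labels are hypothetical; only the four basic column names and the `Quarter` and `Market_Trend` categories come from this document.

```python
# Columns required for a basic survival curve, as listed earlier.
BASIC_COLUMNS = ["ID", "User", "TimeDiff", "Status"]


def category_columns(columns):
    """Every column that is not a basic survival column is a candidate split."""
    return [c for c in columns if c not in BASIC_COLUMNS]


def split_by(rows, category):
    """Group rows by the value of one category column, keeping the basic columns."""
    groups = {}
    for row in rows:
        basic = {name: row[name] for name in BASIC_COLUMNS}
        groups.setdefault(row[category], []).append(basic)
    return groups


# Hypothetical dataset with two category (factor) columns appended.
rows = [
    {"ID": 1, "User": "0xaaa", "TimeDiff": 86400, "Status": 1,
     "Quarter": "2022 Q1", "Market_Trend": "Bull"},
    {"ID": 2, "User": "0xbbb", "TimeDiff": 172800, "Status": 0,
     "Quarter": "2022 Q1", "Market_Trend": "Bear"},
    {"ID": 3, "User": "0xccc", "TimeDiff": 259200, "Status": 1,
     "Quarter": "2022 Q2", "Market_Trend": "Bear"},
]

categories = category_columns(list(rows[0].keys()))
groups = split_by(rows, "Quarter")
print(categories)
print({quarter: len(members) for quarter, members in groups.items()})
```

Each group keeps only the basic columns, so it can be passed to the same survival routine used for the unsplit data.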

Quarterly-Computed Data

To examine how user behavior has changed over time, the toolkit can compute the survival data quarter by quarter. Each quarter (01-Jan through 31-Mar, 01-Apr through 30-Jun, 01-Jul through 30-Sep, and 01-Oct through 31-Dec) is treated as a separate observation period. This means an outcome event can occur with no associated index event having taken place in the same observation period, which leads to "left-censored" events. When calculating these quarterly survival datasets, we treat left-censored events as truncated, in a similar manner to right-censored events. If an outcome event occurs, say, 30 days after the start of the observation period and was not preceded by an associated index event, we record in the survival data an observation with 30 days of elapsed time and mark it as censored.
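The quarterly rule can be sketched as a small function. This is an illustration under assumptions, not the toolkit's data-creation code: the names `quarter_bounds` and `quarterly_record` are hypothetical, timestamps are assumed to be in UTC, and the case where neither event falls inside the quarter is assumed to produce no record.

```python
from datetime import datetime, timezone


def quarter_bounds(year, quarter):
    """Return (start, end) of a calendar quarter; end is exclusive."""
    start = datetime(year, 3 * (quarter - 1) + 1, 1, tzinfo=timezone.utc)
    if quarter == 4:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, 3 * quarter + 1, 1, tzinfo=timezone.utc)
    return start, end


def quarterly_record(index_time, outcome_time, start, end):
    """Return (TimeDiff in seconds, Status) for one observation, or None."""
    index_in = index_time is not None and start <= index_time < end
    outcome_in = outcome_time is not None and start <= outcome_time < end
    if index_in:
        if outcome_in and outcome_time >= index_time:
            # Outcome observed after the index event within the quarter.
            return int((outcome_time - index_time).total_seconds()), 1
        # Right-censored: no outcome before the quarter ended.
        return int((end - index_time).total_seconds()), 0
    if outcome_in:
        # Left-censored: outcome with no index event in this quarter,
        # timed from the start of the quarter and marked as censored.
        return int((outcome_time - start).total_seconds()), 0
    return None  # assumption: nothing to record for this quarter


start, end = quarter_bounds(2022, 1)
utc = timezone.utc
left = quarterly_record(None, datetime(2022, 1, 31, tzinfo=utc), start, end)
observed = quarterly_record(datetime(2022, 1, 10, tzinfo=utc),
                            datetime(2022, 1, 15, tzinfo=utc), start, end)
right = quarterly_record(datetime(2022, 3, 1, tzinfo=utc), None, start, end)
print(left, observed, right)
```

The `left` case reproduces the 30-day example from the text: an outcome on 31-Jan with no index event in the quarter yields 30 days of elapsed time and a censored status.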

This method of computation can be selected using the "Compute Quarterly?:" drop-down. We recommend this style of computation when trying to view changes over time, specifically with "Quarter" as the category to split by. We encourage experimenting with this functionality, especially for plots whose curves have different lengths (e.g., Market_Trend, since the time spent in different trends differs).