easyKost is a product cost management solution specializing in cost estimation and cost modeling. It relies on artificial intelligence algorithms, more specifically decision trees and random forests.

easyKost uses several algorithms that make it possible to:

• estimate the cost of a new product/service based on selected cost-drivers

• rank the cost-drivers and thus determine the most important ones

• calculate an interval for the estimated cost

• define a confidence level for the cost estimate

• perform reverse-costing calculations for a target value…
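As a purely illustrative sketch (not easyKost's actual implementation), the first three capabilities above can be demonstrated with a standard random forest: the spread of the per-tree predictions yields both a point estimate and an interval. The cost-drivers, the cost rule, and all numbers below are invented toy data.

```python
# Hypothetical sketch: a cost estimate plus an interval, derived from the
# per-tree predictions of a random forest. Toy data, invented cost rule.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Two invented cost-drivers: weight (kg) and order volume (units)
X = rng.uniform([0.1, 10], [5.0, 1000], size=(200, 2))
# Invented cost rule: material cost minus a volume discount, plus noise
y = 20 * X[:, 0] - 0.01 * X[:, 1] + rng.normal(0, 1, 200)

forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

new_product = np.array([[2.5, 300]])  # hypothetical new product
per_tree = np.array([t.predict(new_product)[0] for t in forest.estimators_])
estimate = per_tree.mean()                     # point estimate
low, high = np.percentile(per_tree, [5, 95])   # spread across the trees
print(f"estimate={estimate:.1f}, interval=[{low:.1f}, {high:.1f}]")
```

The interval here simply reflects how much the individual trees disagree; a production tool may compute it differently.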

These algorithms are based on proven statistical methods recognized by the international statistics community, such as Random Forests (which build on the CART and Bootstrap algorithms) and Bayesian approaches.

However, the way they are used and combined in easyKost is specific to the software and is one of its manufacturing “secrets.” Nevertheless, to better grasp the results produced by easyKost, you will find below a brief explanation of two of its fundamental statistical components: decision trees and random forests.

### Decision tree

A decision tree is a predictive statistical model that estimates the value of one characteristic of a system from observations of other characteristics of the same system.

The leaves represent the different values of the target variable, and the branches represent the combinations of input variables that lead to these values.

The relevance of a decision tree depends on the choice of the root variable, which determines how the branches unfold, and on the choice of the best attribute to place at each node. Likewise, branches that are not very representative are pruned to avoid over-complicating the tree's learning.

The root variable and the splitting attributes are selected using a measure of disorder (computed with an entropy function) to identify the attributes that reduce it, together with a measure of homogeneity based on the mean squared error.
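As a minimal illustration of how a split criterion selects the root variable, scikit-learn's `DecisionTreeRegressor` chooses each split to minimize squared error (the homogeneity measure mentioned above). The data and feature names below are invented: only the first variable actually drives the target, so the fitted tree places it at the root.

```python
# Sketch: a regression tree picks its root split to maximize the
# reduction in squared error. Toy data, invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 2))
y = 5 * X[:, 0] + rng.normal(0, 0.5, 100)  # only feature 0 matters

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["weight", "volume"]))
# The root split uses "weight", the informative variable.
```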

### Random forest

The random forest is a highly effective machine learning method used to solve classification and regression problems.

A random forest algorithm trains many decision trees which, combined, produce an extremely precise final prediction.

When constructing the forest, some individuals are used for learning while others serve as tests, which makes it possible to validate the learning and to obtain an indicator of the method's prediction quality.
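In a standard random forest, the held-out "test" individuals correspond to the out-of-bag samples: the rows not drawn into a given tree's bootstrap sample. A sketch with scikit-learn, on invented data:

```python
# Sketch: out-of-bag (OOB) samples — rows not drawn for a given tree's
# bootstrap sample — give a built-in estimate of prediction quality.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, 300)

forest = RandomForestRegressor(
    n_estimators=300, oob_score=True, random_state=0
).fit(X, y)
print(f"OOB R^2: {forest.oob_score_:.3f}")  # prediction-quality indicator
```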

### Random Forest Applied to easyKost

Without revealing all the details, easyKost uses random forests in the manner described above.

Based on the initial reference quotations, {n} new databases are created by sampling with replacement from the initial base. This is the Bootstrap statistical technique.

For example, the new base {1} might contain quotation 1 twice, quotation 3 zero times, quotation 5 three times, and so on. The objective is to imitate the process that generated the data.

The underlying principle is that, instead of a given quotation in the base, another one could have been observed. This amounts to having access to {n} databases of the same type.
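The resampling step described in the two paragraphs above can be sketched directly with NumPy; the quotation IDs and the number of resampled bases are invented:

```python
# Sketch: sampling with replacement (the Bootstrap). Each resampled base
# repeats some quotations and omits others, as in the example above.
import numpy as np

rng = np.random.default_rng(3)
quotation_ids = np.arange(1, 11)  # ten hypothetical reference quotations

n = 5  # number of new databases
for i in range(1, n + 1):
    sample = rng.choice(quotation_ids, size=quotation_ids.size, replace=True)
    counts = np.bincount(sample, minlength=11)[1:]  # count per quotation
    print(f"base {i}: quotation counts = {counts}")
# Some counts are 0 (quotation left out), others 2 or 3 (repeated).
```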

Each database gives rise to a decision tree; the trees make up the forest, and aggregating their individual predictions yields the final prediction.

This means that when a cost estimate is produced, the characteristics of the new product/service are run through each tree, and the results are averaged. This is the statistical principle of “Bagging”: averaging reduces the variance (the majority-vote principle, in the classification case).
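Bagging as just described can be sketched by hand: fit one tree per bootstrap sample and average their predictions for the new product. The data, the product, and the number of trees below are invented; real libraries perform these steps internally.

```python
# Sketch of Bagging: one tree per bootstrap sample; the final prediction
# is the average over all trees. Toy data, invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(200, 2))
y = 4 * X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200)
x_new = np.array([[5.0, 5.0]])  # hypothetical new product

predictions = []
for _ in range(100):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])
    predictions.append(tree.predict(x_new)[0])

bagged = np.mean(predictions)  # averaging reduces the variance
print(f"bagged estimate: {bagged:.2f}")
```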

This method allows a high degree of confidence in the final result, since a random forest in easyKost typically consists of at least 500 trees.

This confidence in the predictions is strengthened by the use of independent trees.

To achieve this, the new databases are built not with all the variables but with a random subset of them. The importance of a variable is indicated by the number of times it was used across the trees.
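Variable importance of this kind, aggregated across all the trees of a forest, is exposed directly by most implementations. A scikit-learn sketch with invented cost-drivers, where only "weight" (and marginally "material_grade") actually drives the cost:

```python
# Sketch: ranking cost-drivers by their importance across the forest.
# Toy data; driver names are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
drivers = ["weight", "volume", "material_grade"]
X = rng.uniform(0, 10, size=(300, 3))
y = 6 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.5, 300)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for name, imp in sorted(zip(drivers, forest.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
# "weight" ranks first; "volume" contributes nothing and ranks last.
```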

This method therefore captures all sorts of relationships between variables, interactions in particular: the effect of a variable on the cost is not the same depending on the categories or values taken by another variable.