Showing posts with label formulas. Show all posts

Text functions in Excel for data cleansing

 

If you are a regular user of Excel and deal with a lot of text data, then mastering the text functions in Excel is crucial. Excel has built-in text functions that help you cleanse data. Below are some of the text functions in Excel that are helpful for quick data cleansing.

 

1. CLEAN

The CLEAN function removes non-printable characters from text. The non-printable characters are the first 32 characters of the ASCII table (codes 0 to 31). For example, enter the formula “=CHAR(27)” in cell A1. It returns the escape character, which is a non-printable character.





Now add some text to this cell using the formula below, so that the cell holds a combination of printable and non-printable characters.

    =CHAR(27)&" Hello"


To clean cell A1, enter the formula “=CLEAN(A1)” in cell A2.


 


2. TRIM

TRIM is a useful function for removing leading and trailing spaces from the text in a cell. It does not remove the single spaces between words, but it reduces any run of extra spaces between words to one space.

For example, entering =TRIM("     Hello") in cell A1 returns Hello without the leading spaces.
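To make the behaviour of CLEAN and TRIM concrete, here is a minimal Python sketch that imitates them outside Excel. The function names excel_clean and excel_trim are made up for this illustration, and the sketch only covers the basic cases described above (codes 0 to 31 and ordinary space characters).

```python
def excel_clean(text: str) -> str:
    """Drop characters with codes 0 to 31, as CLEAN does for 7-bit ASCII."""
    return "".join(ch for ch in text if ord(ch) >= 32)


def excel_trim(text: str) -> str:
    """Strip leading/trailing spaces and collapse inner runs of spaces to one."""
    return " ".join(part for part in text.split(" ") if part)


raw = chr(27) + " Hello"                  # same content as =CHAR(27)&" Hello"
print(repr(excel_clean(raw)))             # escape character removed, space kept
print(repr(excel_trim("     Hello")))     # leading spaces removed
print(repr(excel_trim("  Hi    There  ")))  # inner run of spaces reduced to one
```

Note that CLEAN leaves the leading space in place, so cleansing in practice often combines both functions, as in =TRIM(CLEAN(A1)).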

 

3. REPLACE/SUBSTITUTE

Both the REPLACE and SUBSTITUTE functions replace a specific segment of a text, but they have slightly different syntax. Enter the text “Hi There” in cells E3 and E4. Below are examples of replacing the word “Hi” with “Hello” using REPLACE and SUBSTITUTE.

REPLACE syntax: =REPLACE(E3,1,2,"Hello")

SUBSTITUTE syntax: =SUBSTITUTE(E4,"Hi","Hello",1)

REPLACE takes the start position, the number of characters to replace and the new text as arguments, whereas SUBSTITUTE takes the text to be replaced, the new text and (optionally) the instance number as arguments.
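The difference between the two (position-based versus match-based replacement) can be sketched in Python as follows. The helper names excel_replace and excel_substitute are hypothetical and only mimic the simple cases used in this post.

```python
def excel_replace(old_text: str, start_num: int, num_chars: int, new_text: str) -> str:
    """Position-based: swap num_chars characters starting at 1-based start_num."""
    start = start_num - 1
    return old_text[:start] + new_text + old_text[start + num_chars:]


def excel_substitute(text: str, old: str, new: str, instance: int) -> str:
    """Match-based: swap only the given 1-based occurrence of old."""
    if not old:
        return text                # nothing to search for
    pos = -1
    for _ in range(instance):
        pos = text.find(old, pos + 1)
        if pos == -1:
            return text            # occurrence not found: text unchanged
    return text[:pos] + new + text[pos + len(old):]


print(excel_replace("Hi There", 1, 2, "Hello"))
print(excel_substitute("Hi There", "Hi", "Hello", 1))
print(excel_substitute("Hi Hi", "Hi", "Hello", 2))   # only the 2nd "Hi" changes
```

REPLACE is the better fit when the position of the segment is known, and SUBSTITUTE when only the text to look for is known.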

 




How to create user-defined functions in Excel

Excel allows users to create custom functions, which extend the functionality of Excel. In the example below, a Boolean function is created that accepts a cell as a parameter and returns TRUE if the cell has a fill color and FALSE if it doesn't.

Let us call this function IsColor. Open the VBA editor, insert a new module and paste the code below into the module.





This creates a new function, IsColor, which you can now use as a formula in the Excel sheet. In the example below, cells G5 and G7 are filled with color, and G6 and G8 have no fill. Enter the formula "=IsColor(G5)" in cell H5 and copy it down through H8.


IsColor returns TRUE for G5 and G7 and FALSE for G6 and G8.






How to convert mailing list in Excel column into a string using Collections?

Let us assume you have, say, 20 email IDs in an Excel sheet to which a common mail is to be sent. One way of combining all the email IDs into a string is the CONCATENATE function. However, this becomes tedious as the number of email IDs grows. Another way to accomplish this is to use Collections in VBA. A Collection is simply a group of related items.

First store all the email IDs in a Collection, then combine the items of the Collection into a String variable. This variable can be used as the mailing list to send mails. See the example below.

The code below creates a Collection and adds the mail IDs from the Excel sheet (cells A1 through A20) to it.

Dim MailList As Collection
Dim i As Long
Set MailList = New Collection

' Read the 20 mail IDs from column A (rows 1 to 20) of the active sheet
For i = 1 To 20
    MailList.Add Cells(i, 1).Value
Next i

Now that we have created the collection, the next step is to combine all the mail IDs in the collection into a single string.

Dim S As String
Dim Item As Variant

S = ""

' Append each mail ID followed by a semicolon separator
For Each Item In MailList
    S = S & Item & ";"
Next Item

We now have a single string 'S' holding all 20 mail IDs, separated by semicolons. This variable can be used in the Outlook mail application to send mails, as shown below.

MailObj.To = S
or
MailObj.CC = S






How to use curly braces in non-array formula in Excel


Curly braces are generally used in array formulas in Excel. Array formulas are entered by pressing CONTROL+SHIFT+ENTER, and Excel automatically wraps the formula in curly braces. Array formulas are useful when a calculation over multiple sets of values has to return its results into a single cell or a range of cells.

Array Formula:

For example, consider that the range A1 through A3 holds the text "No", "Yes" and "None". To get the maximum string length into cell B1, array-enter (CONTROL+SHIFT+ENTER) the formula below in cell B1.

                                                       {=MAX(LEN(A1:A3))}

which returns the value 4.

Non-array formula with curly braces {}

However, curly braces can also be used in non-array formulas by typing them manually; they then define an array constant. Below is an example of using curly braces in "VLOOKUP" without array-entering the formula.

                           =VLOOKUP("C",{"A",1;"B",2;"C",3},2,0)

which returns a value 3.

The Difference between Cumulative Distribution Function (CDF) and Probability Density Function (PDF)

Cumulative Distribution Function (CDF) vs Probability Density Function (PDF)

The Cumulative Distribution Function (CDF) of a random variable 'X' gives the probability that the variable takes a value less than or equal to a given value 'x'. It accumulates the probability of all possible values from the lower end of the range up to 'x'. The Probability Density Function (PDF), on the other hand, gives the density of the random variable 'X' at a certain value. For a continuous variable this density is not itself a probability; probabilities come from the area under the PDF curve. In other words, the PDF is the derivative of the CDF.

Let us understand this with the example of normally distributed data. The Normal Distribution curve is bell shaped and symmetric about its mean. It extends indefinitely in both directions, from -ve infinity to +ve infinity.

Consider normally distributed data with Mean = 494 and SD = 100. Let us calculate the probability that the random variable X lies between its mean and 500. Excel has the NORM.DIST() function, which returns the CDF or the PDF of a Normal Distribution. It takes four arguments, namely X, MEAN, SD and CUMULATIVE. The 4th argument, CUMULATIVE, is Boolean: if set to TRUE the function returns the CDF, and if set to FALSE it returns the PDF.

CDF = "NORM.DIST(500,494,100,TRUE)" = 0.5239 or 52.39%
PDF = "NORM.DIST(500,494,100,FALSE)" = 0.00398, which shows as 0.00 when rounded to two decimals


In the above calculation, the CDF of 52.39% is the probability that X lies between -infinity and 500. To get the probability of X between the MEAN (494) and 500, subtract 50% (the CDF at the mean) from this value. Hence the probability of X between the MEAN and 500 is 2.39%.
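The figures above can be cross-checked outside Excel. The sketch below uses Python's standard-library statistics.NormalDist (available from Python 3.8) as a stand-in for NORM.DIST; the variable names are only for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=494, sigma=100)

cdf_500 = dist.cdf(500)             # counterpart of NORM.DIST(500,494,100,TRUE)
pdf_500 = dist.pdf(500)             # counterpart of NORM.DIST(500,494,100,FALSE)
between = cdf_500 - dist.cdf(494)   # P(494 <= X <= 500); the CDF at the mean is 0.5

print(round(cdf_500, 4))
print(round(pdf_500, 5))
print(round(between, 4))
```

The density at 500 comes out close to 0.004, which is why it shows as 0.00 at two decimal places, while the probability between the mean and 500 comes from the difference of two CDF values.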


The difference between DAYS and DAYS360 formulas

DAYS360:

The DAYS360 function calculates the number of days between two dates on the assumption that every month has 30 days, i.e. a 360-day year. Some financial institutions adopt this method for calculating interest and for other accounting purposes.

DAYS():
Excel 2013 and later versions also have the DAYS() function, which calculates the actual number of calendar days between two dates.
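To see how the two counts differ, here is a small Python sketch. It assumes the European 30/360 convention (what DAYS360 uses when its optional method argument is TRUE), because that rule is the simplest to state; the default US method treats some month-end dates differently. The function names and the dates are hypothetical.

```python
from datetime import date


def days(end: date, start: date) -> int:
    """Actual calendar days between two dates, like DAYS(end_date, start_date)."""
    return (end - start).days


def days360_european(start: date, end: date) -> int:
    """30/360 day count, European convention: a 31st is treated as the 30th."""
    d1 = min(start.day, 30)
    d2 = min(end.day, 30)
    return (end.year - start.year) * 360 + (end.month - start.month) * 30 + (d2 - d1)


start, end = date(2023, 1, 1), date(2023, 3, 1)   # hypothetical dates
print(days(end, start))                 # actual days in January + February 2023
print(days360_european(start, end))     # two 30-day months
```

For these dates the calendar count is 59 days while the 30/360 count is 60, and over a full year the two differ by 5 days (6 in a leap year).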

How to enter formula with relative and absolute reference using VBA

Relative vs Absolute:

To enter a formula into a single cell using VBA, we use the Range().Formula property and assign the worksheet formula to it as a string.

For Example, VBA code to enter VLOOKUP formula in cell B1 is
                    
                         RANGE("B1").FORMULA = "=VLOOKUP(A1,SHEET2!A:B,2,0)"


Suppose we want to apply the same formula to the range B1 through B10. Change the VBA code to

                         RANGE("B1:B10").FORMULA = "=VLOOKUP(A1,SHEET2!A:B,2,0)"


Because A1 is written without $ signs, it is a relative reference: Excel treats the formula as entered in the first cell of the range and adjusts it for each row, so B1 looks up A1, B2 looks up A2 and so on. To keep the lookup value fixed for every row, write it as an absolute reference, $A$1. Another way to pick up the value from the formula's own row is to replace A1 with A:A on the RHS of the code. The new code looks like

                        RANGE("B1:B10").FORMULA = "=VLOOKUP(A:A,SHEET2!A:B,2,0)"

With A:A as the lookup value, Excel uses the cell of column A that lies on the formula's own row (i.e. for cell B1, A:A is A1; for B2, A:A is A2 and so on).

NOTE:



The A:A referencing style does not work for the EOMONTH worksheet function. The syntax for EOMONTH is EOMONTH(START DATE, MONTHS).

                 RANGE("B1:B10").FORMULA =  "=EOMONTH(A1,0)"

In the above formula, if we replace A1 with A:A, it returns a #VALUE! error. An alternative way to refer to the cell in column A of the formula's own row in such worksheet functions is to replace A1 with the formula INDIRECT(ADDRESS(ROW(),1,4)).

So the code would be

      RANGE("B1:B10").FORMULA =  "=EOMONTH(INDIRECT(ADDRESS(ROW(),1,4)),0)"


CAGR calculation in Excel

CAGR:

CAGR (Compound Annual Growth Rate) is a measure of growth over a period of time, usually a number of years. Businesses use CAGR to measure the growth in their revenue over a period of time. In finance, CAGR indicates the appreciation in the value of an investment over a period of time.

Example:

Suppose an initial investment of $10000 is made and its value appreciates to $17000 after 5 years. CAGR for this investment is calculated using the formula below.

                         CAGR = (Final Value / Initial Value)^(1/n) - 1

                                                          where n = no. of years





The CAGR for the above investment example is 0.11196 or 11.196%. Here, CAGR is equivalent to a constant growth rate per year (i.e. constant growth of 11.196% for 5 consecutive years).

You can validate the CAGR by calculating Final Value using the initial value of $10000 and CAGR of 11.196%.

                                Final Value = $10000 * (1+0.11196)^5 = $17000
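The same calculation can be sketched in a few lines of Python as a cross-check. It assumes the usual CAGR definition, (final value / initial value)^(1/n) - 1; the function name cagr is only for illustration. In Excel the equivalent cell formula would be =(17000/10000)^(1/5)-1.

```python
def cagr(initial_value: float, final_value: float, years: float) -> float:
    """Compound annual growth rate as a decimal fraction."""
    return (final_value / initial_value) ** (1 / years) - 1


rate = cagr(10_000, 17_000, 5)
print(round(rate, 5))                          # growth rate per year

# Validation: compounding the initial value at this rate for 5 years
# should give back the final value.
print(round(10_000 * (1 + rate) ** 5, 2))
```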






How to convert time format into Minutes in Excel

Consider an example where Start Time, End Time and Total Time Taken for a set of activities are stored in the standard Excel time format (i.e. hh:mm:ss). Here TOTAL TIME = END TIME - START TIME. In column F, let us calculate the total time taken in minutes.




  Now, to convert Excel time values into minutes, we have to multiply each value in column E by 1440 (24 hours x 60 minutes). So enter the formula "=E2*1440" in cell F2 to convert 1:00:00 into minutes.






Excel stores time values as a fraction of a day, so the value equivalent of 1:00:00 is 0.041667 days (i.e. 1/24). Therefore 0.041667 * 1440 = 60 minutes.
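The day-fraction idea can be illustrated with a short Python sketch. Here a timedelta stands in for an Excel time value, and the function name to_minutes is hypothetical.

```python
from datetime import timedelta

MINUTES_PER_DAY = 24 * 60          # 1440


def to_minutes(duration: timedelta) -> float:
    """Convert a duration to minutes via its fraction of a day."""
    day_fraction = duration / timedelta(days=1)   # how Excel stores a time value
    return day_fraction * MINUTES_PER_DAY


print(round(to_minutes(timedelta(hours=1)), 6))              # 1:00:00
print(round(to_minutes(timedelta(hours=2, minutes=30)), 6))  # 2:30:00
```

The results are rounded before printing because the day fraction (for example 1/24) cannot be represented exactly in binary floating point.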











How to use SUMPRODUCT to calculate mean of Discrete Distribution

Discrete Distribution:

A Discrete Distribution is the probability distribution of a random variable that takes only separate, countable values, such as non-negative whole numbers. The MEAN or AVERAGE of a Discrete Distribution is nothing but its Expected Value.

The MEAN of any discrete distribution is calculated by taking the sum of the products of the values and their probabilities of occurrence. The example below lists the number of diabetics among 5 randomly chosen people, along with the probability of each number occurring in the sample.



The MEAN of this discrete distribution is calculated using the built-in function SUMPRODUCT, as shown below. Here SUMPRODUCT takes two arguments, ARRAY1 and ARRAY2.
                               
                                                "=SUMPRODUCT(C2:C8,D2:D8)"

 The mean or expected value of the above discrete distribution is 1.14.
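What SUMPRODUCT does with the two arrays can be sketched in Python. The values and probabilities below are hypothetical and are NOT the figures from the table in this post, so the result differs from 1.14.

```python
# Hypothetical distribution (not the table from the post):
# number of diabetics in a sample of 5 and the probability of each outcome.
values = [0, 1, 2, 3, 4, 5]
probabilities = [0.30, 0.35, 0.20, 0.10, 0.04, 0.01]


def sumproduct(xs, ps):
    """Sum of pairwise products, like SUMPRODUCT(array1, array2)."""
    return sum(x * p for x, p in zip(xs, ps))


# The probabilities of a discrete distribution must add up to 1.
assert abs(sum(probabilities) - 1.0) < 1e-9

mean = sumproduct(values, probabilities)
print(round(mean, 2))   # expected value of the hypothetical distribution
```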








How to use wildcard characters in Excel formulas

Wildcard Characters:

Wildcard characters are useful when we have only partial search criteria to extract or manipulate data. In Excel, "?" and "*" are the wildcard characters. "?" represents any single character, and "*" represents any series of characters.
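How the two wildcards match can be tried out in Python, whose fnmatch module uses the same meaning for "?" and "*". This is only an analogy: fnmatch additionally gives square brackets a special meaning, and the product names below are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_match(text: str, pattern: str) -> bool:
    """Case-insensitive match where ? is one character and * is any run."""
    return fnmatchcase(text.lower(), pattern.lower())


products = ["Mouse", "Monitor", "Keyboard", "USB Cable", "HDMI Cable"]  # hypothetical

print([p for p in products if wildcard_match(p, "M*")])       # starts with M
print([p for p in products if wildcard_match(p, "?ouse")])    # any one character, then "ouse"
print([p for p in products if wildcard_match(p, "*cable")])   # ends with "cable"
```

In Excel the same patterns would go into the criteria argument, for example =COUNTIF(A2:A6,"M*").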


Wildcard characters are used with Excel functions such as VLOOKUP, MATCH, SUMIF, COUNTIF and SEARCH, among others. In the table below, we have sales details of computer hardware items, with their selling price and the number of items sold.

Product List:



We also have some scenarios, or conditions, for extracting data from the table. Below is a picture of the combinations of Excel formulas and wildcard characters that give the desired result.

Scenario with Formula:








How and when to use TRIMMEAN function in Excel


TRIMMEAN is used to calculate the mean (average) of data points while excluding outliers or extreme values. When we calculate the mean with outliers included, we may get a value that does not represent the data well, because the mean is sensitive to extreme values: one big value in a data set can change the mean significantly. In such cases, we use TRIMMEAN to exclude values from the top and bottom tails of the data set.


  Syntax:     
                                          
     "=TRIMMEAN(Array,Percentage)"


TRIMMEAN has two arguments, i.e. Array and Percentage. Array is the data set for which TRIMMEAN is calculated. Percentage is a decimal from 0 up to, but not including, 1; it is the fraction of data points to exclude, split evenly between the top and the bottom. If Percentage is 0, then TRIMMEAN = MEAN.

Example:

Consider the below example where MEAN and TRIMMEAN are calculated for 20 random data points. 



As the percentage argument for TRIMMEAN is 0, MEAN = TRIMMEAN, which is 39.6. Now change one of the values, say the 3rd number, to 300. The value of the mean changes to 53.




Now change the percentage argument in TRIMMEAN to 0.2 (20%). The TRIMMEAN value changes from 53 to 40.69. In this case TRIMMEAN excludes 4 values (20% of 20 data points), 2 from the top and 2 from the bottom of the data set, and calculates the average of the rest.
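The trimming logic can be sketched in Python as below. The function trimmean is a hypothetical imitation of the worksheet function, and the 20 data points are made up for illustration (they are not the values from the picture), so the numbers differ from those quoted above.

```python
import math
from statistics import mean


def trimmean(data, percent):
    """Mean after dropping an equal number of points from both tails."""
    if not 0 <= percent < 1:
        raise ValueError("percent must be at least 0 and less than 1")
    n = len(data)
    # Points to drop from EACH tail; round() guards against float noise
    # such as 57.99999999 before the division by 2.
    k = math.floor(round(n * percent, 9) / 2)
    ordered = sorted(data)
    return mean(ordered[k:n - k])


data = list(range(1, 21))   # hypothetical data: 1, 2, ..., 20
data[2] = 300               # turn the 3rd value into an outlier

print(mean(data))           # plain mean, pulled up by the outlier
print(trimmean(data, 0))    # same as the plain mean
print(trimmean(data, 0.2))  # drops 1, 2 (bottom) and 20, 300 (top)
```

With Percentage at 0.2 and 20 points, two values are cut from each tail, so the outlier no longer influences the result.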





Note: For normally distributed data, MEAN = MEDIAN, and the MEDIAN is not affected by extreme values. Hence the Percentage argument of TRIMMEAN can be varied until TRIMMEAN is approximately equal to the MEDIAN. From the picture below it is clear that the presence of extreme values in the data set does not affect the value of the MEDIAN.









                                    

How to use GESTEP Function in Excel

GESTEP Function:

The GESTEP(NUMBER, STEP) function compares two numeric parameters, NUMBER and STEP, and returns

                             1) 1   (if NUMBER >= STEP)
                             2) 0   (if NUMBER < STEP)

The STEP parameter is optional; if no value is supplied, the default value of 0 is used for STEP. If a non-numeric value is supplied for NUMBER, STEP or both, GESTEP returns the #VALUE! error.
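The rules above amount to a very small function. The Python sketch below is a hypothetical imitation, with a TypeError standing in for Excel's #VALUE! error; the marks are made-up values.

```python
def gestep(number, step=0):
    """Return 1 if number >= step, otherwise 0 (step defaults to 0)."""
    for arg in (number, step):
        if not isinstance(arg, (int, float)):
            raise TypeError("GESTEP needs numeric arguments")  # like #VALUE!
    return 1 if number >= step else 0


marks = [62, 48, 75, 55]                 # hypothetical exam marks
average = sum(marks) / len(marks)        # 60.0

# 1 = scored at or above the average, 0 = scored below the average
print([gestep(m, average) for m in marks])
print(gestep(-1))                        # step omitted, so it is compared with 0
```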

Consider marks scored by candidates in competitive exam. GESTEP can be used to determine whether a candidate has scored above average or below average as shown below.







    Similarly, GESTEP can be used in combination with the IF function, where the logical test of IF evaluates to TRUE if GESTEP returns 1 and to FALSE if GESTEP returns 0.













Total Cycle Time calculation in Batch Production System


Consider a scenario in a batch production (or processing) system where a batch of 100 jobs has to undergo three different processes, namely P1, P2 and P3, with cycle times of 10, 17 and 9 minutes respectively, before the jobs are ready to be dispatched.

In this example, let us learn to calculate the total time required to complete 100 jobs.

The total cycle time of all three processes is 10 + 17 + 9 = 36 minutes
The largest cycle time is for P2, i.e. 17 minutes

Now, the 1st job starts at the 1st minute and ends at the 36th minute. Similarly, the 2nd job starts at the 11th minute and ends at the 53rd minute (i.e. P1 from the 11th to the 20th minute, P2 from the 28th to the 44th minute and P3 from the 45th to the 53rd minute). The 2nd job has to wait after P1 because P2 is busy with the 1st job until the 27th minute.

Formula for calculating time to complete Nth Job:

= TOTAL CYCLE TIME (36 Minutes) + (N-1) * Largest Cycle Time (17 Minutes)

i.e. for the 1st job, N=1

                                    = 36 + (1-1) * 17 = 36 minutes

for the 2nd job, N=2
                                   = 36 + (2-1) * 17 = 53 minutes


Below is an Excel template built based on above logic from Job 1 to 10. 





    From the above table, the total time to complete 10 jobs is 189 minutes.
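The formula can be checked against a small simulation of the three processes. The Python sketch below is only an illustration and rests on stated assumptions: every job has the same cycle times, each process handles one job at a time, and jobs can wait between processes for as long as needed. The function names are hypothetical.

```python
def completion_time_formula(n, cycle_times):
    """Total cycle time + (N - 1) * largest cycle time."""
    return sum(cycle_times) + (n - 1) * max(cycle_times)


def completion_time_simulated(n, cycle_times):
    """Push n identical jobs through the processes one after another."""
    finish = [0] * len(cycle_times)   # when each process finished its latest job
    for _ in range(n):
        ready = 0                     # when the current job left the previous process
        for k, t in enumerate(cycle_times):
            start = max(ready, finish[k])   # wait until both job and process are free
            finish[k] = start + t
            ready = finish[k]
    return finish[-1]


cycle_times = [10, 17, 9]             # P1, P2, P3 in minutes

for n in (1, 2, 10, 100):
    print(n, completion_time_formula(n, cycle_times),
          completion_time_simulated(n, cycle_times))
```

For the batch of 100 jobs the formula gives 36 + 99 * 17 = 1719 minutes, and the simulation agrees with the formula for every job count tried.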




     

How to Determine and Interpret Correlation Co-efficient in Excel

The correlation co-efficient is one of the most important and useful tools for determining the relationship between two data sets (or two variables). It is denoted by "r" and its value ranges from -1 to +1.

The formula to calculate the correlation co-efficient is

r = SUM((x - mean of x) * (y - mean of y)) / SQRT(SUM((x - mean of x)^2) * SUM((y - mean of y)^2))

If the value of "r" is -VE, then the two variables are negatively correlated (i.e. an increase in the value of one variable tends to go with a decrease in the other).


If the value of "r" is +VE, then the two variables are positively correlated (i.e. an increase in the value of one variable tends to go with an increase in the other).


If the value of "r" is 0, then the two variables have no linear correlation.

Consider this example of Age vs Income. In Excel, the correlation co-efficient is determined using either the CORREL or the PEARSON function.




From the above table it is evident that Age and Income are positively correlated with a value of "r" equal to 0.9638. 
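For readers who want to see the calculation behind CORREL, here is a minimal Python sketch of the Pearson correlation co-efficient. The ages and incomes are hypothetical and are NOT the values from the table above, so the result differs from 0.9638.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation co-efficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


ages = [25, 30, 35, 40, 45]                       # hypothetical data
incomes = [30000, 38000, 41000, 52000, 58000]     # hypothetical data

print(round(pearson_r(ages, incomes), 4))         # close to +1: strong positive
print(round(pearson_r(ages, incomes[::-1]), 4))   # reversed incomes: negative
```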




Alternatives to QUOTIENT formula in Excel

The QUOTIENT function returns the integer portion of a division. Its syntax is "QUOTIENT(numerator, denominator)". The INT and ROUNDDOWN functions can be used as alternatives to QUOTIENT.

For example:

 119/15 = 7.93

In this example the quotient is 7. Now, let's use the above three formulas and check the results

"=QUOTIENT(119,15)" = 7
"=INT(119/15)" = 7
"=ROUNDDOWN(119/15,0)" = 7

The second argument in ROUNDDOWN is the number of digits to round down after decimal. As we want only the integer portion of division, it is set to zero.
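One caveat is worth a quick check: the three functions agree for positive results, but INT rounds down to the next lower integer, whereas QUOTIENT and ROUNDDOWN cut off the decimals. For a negative division they therefore differ. The Python sketch below imitates the three behaviours; the function names are hypothetical.

```python
import math


def quotient(numerator, denominator):
    """Integer portion of a division, decimals cut off (toward zero)."""
    return math.trunc(numerator / denominator)


def excel_int(value):
    """Round down to the next lower integer, like INT."""
    return math.floor(value)


def rounddown_0(value):
    """Round toward zero to 0 decimals, like ROUNDDOWN(value, 0)."""
    return math.trunc(value)


print(quotient(119, 15), excel_int(119 / 15), rounddown_0(119 / 15))     # all agree
print(quotient(-119, 15), excel_int(-119 / 15), rounddown_0(-119 / 15))  # INT differs
```

So INT is a safe alternative to QUOTIENT only when the result is not negative.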





How to reverse the order of cells in Excel?

In this post, let us learn to reverse the order of a column of cells. Enter the numbers 1 to 10 in column B, from B2 to B11. Now, in C2 to C11, we have to get the numbers from 10 back to 1. There are many ways to get this result. I used a combination of the OFFSET and COUNT functions to get the output.

In cell C2 enter "=OFFSET($B$2,COUNT(B2:$B$11)-1,0)" and drag the formula down through C11. Here $B$2 and $B$11 are absolute references, and the B2 inside COUNT is a relative reference. As the formula is copied down, COUNT covers one cell fewer in each row, so the offset steps from the last number back to the first.





Similar results are obtained by using "=INDEX($B$2:$B$11,COUNT(B2:$B$11),0)" formula. 

Squareroot, cuberoot, Nth root of a number in Excel

Excel has a built-in function to find the square root of a number, i.e. SQRT(number), but to find the cube root, 4th root, 5th root and so on, we use the "POWER" function.
For example: cube root of 4 is "=POWER(4,1/3)"
                       4th root of 4 is "=POWER(4,1/4)"

3 Ways to enter formulas in Excel

There are 3 ways in which you can enter a formula into an Excel cell. Select any cell, type one of the following signs, then enter the name of the function and press "Enter".

1) = (equal sign)
2) +  (Plus sign)
3) @ (at sign)


For Example, Enter values 3, 4 and 5 in cells A1, A2 and A3 respectively. Now, type (without quotes)

"=SUM(A1:A3)"
"+SUM(A1:A3)"
"@SUM(A1:A3)"

in cells B1,B2 and B3.

Note: @ works only for built-in functions in Excel, but not for arithmetic operators using cell references (i.e. "@SUM(A1:A3)" works but not "@A1+A2+A3").