Text functions in Excel for data cleansing

 

If you use Excel regularly and deal with a lot of text data, mastering its text functions is crucial. Excel has built-in text functions that help you cleanse data. Below are some of the text functions in Excel that are helpful for quick data cleansing.

 

1. CLEAN

The CLEAN function removes non-printable characters from text. The non-printable characters are the first 32 characters of the ASCII table (codes 0 to 31). For example, enter the formula “=CHAR(27)” in cell A1. It returns the escape character, which is non-printable and typically shows up as a small box or as nothing at all.





Now add text to this cell using the formula below, so that the cell holds a combination of printable and non-printable characters.

    =CHAR(27)&" Hello"


To clean cell A1, enter the formula “=CLEAN(A1)” in cell A2.
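
To make the behaviour concrete, here is a minimal VBA sketch of what CLEAN does for the character codes 0 to 31. The function name StripNonPrintable is hypothetical and this is only an illustration under that assumption, not Excel's own implementation.

```vba
' Hypothetical helper: drops characters with codes 0 to 31,
' which is the range the text above describes for CLEAN.
Function StripNonPrintable(ByVal source As String) As String
    Dim i As Long
    Dim code As Long
    Dim result As String

    For i = 1 To Len(source)
        code = AscW(Mid$(source, i, 1))
        ' AscW can return negative values for high Unicode characters,
        ' so keep those as well as every code from 32 upwards.
        If code < 0 Or code >= 32 Then
            result = result & Mid$(source, i, 1)
        End If
    Next i

    StripNonPrintable = result
End Function
```

Calling StripNonPrintable(Chr(27) & " Hello") gives back " Hello", the same text the CLEAN example leaves in cell A2.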


 


2. TRIM

TRIM removes leading and trailing spaces from the text in a cell. It also reduces repeated spaces between words to a single space, but it keeps that single space.

For example, =TRIM("     Hello") in cell A1 returns Hello without the leading spaces.

 

3. REPLACE/SUBSTITUTE

Both REPLACE and SUBSTITUTE replace a specific segment of a text, but they have slightly different syntax. Enter the text “Hi There” in cells E3 and E4. The examples below replace the word “Hi” with “Hello” using REPLACE and SUBSTITUTE.

REPLACE syntax: =REPLACE(E3,1,2,"Hello")

SUBSTITUTE syntax: =SUBSTITUTE(E4,"Hi","Hello",1)

REPLACE takes the start position, the number of characters to replace and the new text as arguments, whereas SUBSTITUTE takes the text to be replaced, the new text and, optionally, the instance number.

 




How to make any file read-only using VBA?



VBA has a built-in statement, SetAttr, that changes the attributes of a file or folder. It belongs to the File/Directory category. Its syntax is shown below, and it takes two parameters.

        Syntax: SetAttr FilePath, Attribute

FilePath is the fully qualified path of the file, and Attribute is the attribute of the file which you wish to set. In this example we change the attribute of the file ‘Sample.xlsx’ from normal to read-only.

Example:

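Sub MakeReadOnly()
Dim s As String

' Fully qualified path of the file to change
s = "C:\LM10\Sample.xlsx"
' Set the read-only attribute (no parentheses around the argument list)
SetAttr s, vbReadOnly

End Sub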




How to add line break and tab space in outlook using VBA?

When automating email in Outlook, the content of the mail can be set through either the Body property (plain text) or the HTMLBody property (HTML). Depending on which one we use, we add a line break or a tab with VBA constants or with HTML tags.


Line Break:

Assume MailObj is the Outlook mail object. To add a line break between two texts in the body of the mail, use the VBA constant vbCrLf with Body, or the HTML tag <br/> with HTMLBody, as shown below.

     MailObj.Body = "This is 1st text" &  vbCrLf  &  "This is 2nd Text"

or

    MailObj.HTMLBody  = "This is 1st text"  &  "<br/>"  &  "This is 2nd Text"


Tab Space:

A tab usually appears as wide as several blank spaces (often eight), depending on the mail client and font. As in the example above, a tab can be added before each text in a plain-text Body using the VBA constant vbTab.


MailObj.Body  =  vbTab  &  "This is 1st text" &  vbCrLf  &  vbTab  &  "This is 2nd Text"


How to convert mailing list in Excel column into a string using Collections?

Let us assume you have, say, 20 email IDs in an Excel sheet to which a common mail is to be sent. One way of combining all the email IDs into a string is the CONCATENATE function. However, this becomes difficult as the number of email IDs grows. Another way is to use a Collection in VBA. A Collection is simply a group of related items.

First store all the email IDs in a Collection, then combine the items of the Collection into a String variable. This variable can be used as the mailing list when sending mails. See the example below.

The code below creates a Collection and adds the mail IDs from the Excel sheet to it.

Dim MailList as Collection
Set MailList = New Collection

For i  = 1 to 20

MailList.Add Cells(i,1).Value

Next i

Now that we have created the collection, the next step is to combine all the mail IDs in it into a single string.

Dim S as String

S = ""

For Each Item in MailList

S= S  & Item  & ";"

Next 

Now we have a single string 'S' holding all 20 mail IDs. This variable can be used in the Outlook mail application to send mails, as shown below.

MailObj.To = S
or
MailObj.CC = S
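
The two steps above can also be wrapped in a reusable function. The following is a minimal sketch; the name JoinMailList and the choice to omit the trailing semicolon are assumptions made for illustration.

```vba
' Hypothetical helper: joins every item of a Collection into one
' semicolon-separated string, without a trailing semicolon.
Function JoinMailList(ByVal mailList As Collection) As String
    Dim address As Variant
    Dim s As String

    For Each address In mailList
        ' Add the separator only between addresses
        If Len(s) > 0 Then s = s & ";"
        s = s & address
    Next address

    JoinMailList = s
End Function
```

With a Collection holding "a@example.com" and "b@example.com", the function returns "a@example.com;b@example.com", which can then be assigned to MailObj.To or MailObj.CC.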






How to use curly braces in non-array formula in Excel


Curly braces are generally used in array formulas in Excel. Array formulas are entered by pressing CONTROL+SHIFT+ENTER, and Excel automatically wraps the formula in curly braces. Array formulas are useful when a calculation over multiple values has to return its result into a single cell or a range of cells.

Array Formula:

For example, suppose range A1 through A3 holds the texts "No", "Yes" and "None". To get the length of the longest string into cell B1, array-enter (CONTROL+SHIFT+ENTER) the formula below in cell B1.

                                                       {=MAX(LEN(A1:A3))}

which returns a value 4.

Non-Array formula with curly-braces{}

However, curly braces can also be used in non-array formulas by typing them in manually. Below is an example of using curly braces in "VLOOKUP" without array-entering the formula.

                           =VLOOKUP("C",{"A",1;"B",2;"C",3},2,0)

which returns a value 3.

How to remove comments only from filtered cells in Excel

Consider a column which holds the numbers one to five. Each cell with an odd number has the comment "ODD" and each cell with an even number has the comment "EVEN".




If you have to delete comments only from the even numbers, then:

- Filter the even numbers in the column (i.e. 2 and 4)
- Then press Alt + ; (semicolon) to select only the visible, filtered cells
- Then right-click and choose Delete Comment from the context menu

To copy comments onto filtered cells only, select the filtered cells and use the Comments option in Paste Special to paste only the comments.



How to activate other Microsoft application using VBA

If you are working on a project that involves several MS Office applications, VBA has a method to activate other Office applications from within Excel. For example, the code below activates the Word application.

Application.ActivateMicrosoftApp (xlMicrosoftWord)

Similarly, replace xlMicrosoftWord with xlMicrosoftMail to activate Outlook. One can therefore assign keyboard shortcuts to macros for frequently used applications and activate them without any manual effort.

However, to start an application that is not supported by the above method, use the Shell function. For example, to start Notepad, use the code: Shell "NotePad.exe"

Static variable in VBA

A static variable is a local variable which retains its value after the procedure has finished executing. A static variable is declared by using the Static keyword in place of Dim in the declaration.

For example,

static username as string
static counter as integer

In the example below, the static variable "m" retains its value every time the procedure is executed, so the message box shows 1, 2, 3 and so on across successive runs.

Sub StVar()

Static m As Integer
m = m + 1
MsgBox (m)

End Sub

Note: To reset a static variable to its initial value, close the workbook or press the Reset button in the VBA editor.
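
The same idea can be written as a function that returns the running count instead of showing a message box. This is a minimal sketch; the name NextCount is hypothetical.

```vba
' Hypothetical helper: returns 1 on the first call, 2 on the second,
' and so on, because counter keeps its value between calls.
Function NextCount() As Long
    Static counter As Long
    counter = counter + 1
    NextCount = counter
End Function
```

If counter were declared with Dim instead of Static, every call would start from 0 again and the function would always return 1.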

The Difference between Cumulative Distribution Function (CDF) and Probability Density Function (PDF)

Cumulative Distribution Function (CDF) vs Probability Density Function (PDF)

The Cumulative Distribution Function (CDF) of a random variable X gives the probability that X is less than or equal to a given value x. It accumulates the probability of all values from the lower end of the range up to x. The Probability Density Function (PDF), on the other hand, describes how densely the probability is concentrated around a given value. For a continuous variable it is the derivative of the CDF, and the probability of any single exact value is zero.

Let us understand this with the example of normally distributed data. The normal distribution curve is bell shaped and symmetric about its mean, and it extends indefinitely in both directions, from negative infinity to positive infinity.

Consider normally distributed data with Mean = 494 and SD = 100. Let us calculate the probability that the random variable X lies between its mean and 500. Excel has the NORM.DIST() function, which returns the CDF or the PDF of a normal distribution. It takes four arguments: X, MEAN, SD and CUMULATIVE. The fourth argument, CUMULATIVE, is Boolean: if set to TRUE the function returns the CDF, and if set to FALSE it returns the PDF.

CDF = "NORM.DIST(500,494,100,TRUE)" = 0.5239 or  52.39%
PDF = "NORM.DIST(500,494,100,FALSE)" = 0.00398 (a density, not a probability; it shows as 0.00 when rounded to two decimals)


In the above calculation, the CDF of 52.39% is the probability that X lies between -infinity and 500. To get the probability that X lies between the MEAN (494) and 500, subtract 50% (the probability that X is below the mean) from the CDF. Hence the probability of X between MEAN and 500 is 2.39%.
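
To see where the PDF value comes from, the normal density can be computed directly from its formula. The sketch below is an illustration only; the function name NormalPdf is hypothetical and it is not how Excel implements NORM.DIST.

```vba
' Hypothetical helper: normal density at x for a given mean and
' standard deviation, i.e. exp(-z^2 / 2) / (sd * sqrt(2 * pi)).
Function NormalPdf(ByVal x As Double, ByVal mean As Double, ByVal sd As Double) As Double
    Dim pi As Double
    Dim z As Double

    pi = 4 * Atn(1)          ' VBA has no built-in Pi constant
    z = (x - mean) / sd      ' distance from the mean in SD units

    NormalPdf = Exp(-0.5 * z * z) / (sd * Sqr(2 * pi))
End Function
```

NormalPdf(500, 494, 100) evaluates to roughly 0.00398. The value is small because the density is spread over a wide range (SD = 100), not because the outcome is unlikely.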


The difference between DAYS and DAYS360 formulas

DAYS360:

The DAYS360 function calculates the number of days between two dates based on a 360-day year, that is, on the assumption that each month has 30 days. This method is adopted in some financial institutions for the calculation of interest and for other accounting purposes.

DAYS():
Newer versions of Excel (2013 onwards) also have the DAYS() function, which calculates the actual number of calendar days between two dates.
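
The difference between the two counts can be sketched in VBA. The function below is a simplified illustration that assumes the European 30/360 convention (a day 31 is treated as day 30), which corresponds to DAYS360 with its optional method argument set to TRUE. The name Days360European is hypothetical, and the default US method of DAYS360 applies slightly different month-end rules.

```vba
' Hypothetical helper: 30/360 day count, European convention.
' Any date falling on the 31st is treated as the 30th.
Function Days360European(ByVal startDate As Date, ByVal endDate As Date) As Long
    Dim d1 As Long
    Dim d2 As Long

    d1 = Day(startDate)
    If d1 = 31 Then d1 = 30
    d2 = Day(endDate)
    If d2 = 31 Then d2 = 30

    Days360European = (Year(endDate) - Year(startDate)) * 360 _
                    + (Month(endDate) - Month(startDate)) * 30 _
                    + (d2 - d1)
End Function
```

For 1 January 2016 to 31 December 2016 this returns 359, whereas the calendar-day count DateDiff("d", startDate, endDate) returns 365.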

How to rename files of a Folder using VBA


It is easy and convenient to rename the files of a folder using VBA. This can be accomplished with a few lines of code. First, you need the fully qualified path of each file, including its name, and the new path and name you want to rename it to. Organize this data into two columns of Sheet1, with the current paths in column A and the new paths in column B, below a header row.

















Now, copy and paste the VBA code below into the VB Editor and run the program. The files named 1, 2, 3, 4 and 5 get renamed to A, B, C, D and E respectively. The code opens each file as a workbook, so it suits files that Excel can open.

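Sub FileRename()
Application.ScreenUpdating = False
Application.DisplayAlerts = False

Dim n As String      ' current full path
Dim m As String      ' new full path
Dim i As Long
Dim RowCount As Long

Sheets("Sheet1").Select
' Count the filled rows in column A (row 1 is the header)
RowCount = Application.WorksheetFunction.CountA(Range("A:A"))

For i = 2 To RowCount
 n = Cells(i, 1)
 m = Cells(i, 2)
 ' Open the file, save it under the new name, then delete the original
 Workbooks.Open (n)
 ActiveWorkbook.SaveAs (m)
 ActiveWorkbook.Close
 Kill (n)
 ThisWorkbook.Save
Next

' Restore the settings switched off at the start
Application.DisplayAlerts = True
Application.ScreenUpdating = True
End Sub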
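
As an alternative to opening and re-saving each workbook, VBA's built-in Name statement renames a file directly on disk, whatever its type. The wrapper below is a minimal sketch; the name RenameWithNameStatement is hypothetical and it is not part of the macro above.

```vba
' Hypothetical helper: renames a file without opening it.
Sub RenameWithNameStatement(ByVal oldPath As String, ByVal newPath As String)
    ' Name raises a run-time error if oldPath does not exist,
    ' if newPath already exists, or if the file is currently open.
    Name oldPath As newPath
End Sub
```

Inside the loop above, the five lines from Workbooks.Open to Kill could then be replaced by a single call such as RenameWithNameStatement n, m.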











How to create and use Dynamic Arrays in VBA


Arrays are important elements of any programming language. An array can be one, two or multi-dimensional. An array declared without a specific size (re-sizable) is called a Dynamic Array. A Dynamic Array can be declared as shown below.

Dim NewArray () as Integer

However, to use the array in an application, we must define its size. To change the size of an array we use the ReDim keyword.

ReDim NewArray (4) 


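Now NewArray can store up to 5 values (indexes 0 to 4). Let us assign 5 values to the array. The Array function returns a Variant, which cannot be assigned to an Integer array, so assign the elements one by one.

For i = 0 To 4
NewArray(i) = (i + 1) * 5
Next i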

ReDim can be used multiple times in our application to change the size of an Array dynamically. Suppose we want to change the size of NewArray again, so that it can store 10 values, then use ReDim again as below.

ReDim NewArray (9)

However, changing the size of an array with ReDim clears the previously stored values. To retain them, add the Preserve keyword as shown below. The old values stored in the array then remain unchanged.

ReDim Preserve NewArray (9)
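
The steps above can be put together in one small function. This is a minimal sketch that assumes the default lower bound of 0 (no Option Base 1); the name GrowAndKeep is hypothetical.

```vba
' Hypothetical helper: fills a 5-element array, grows it to 10
' elements with Preserve, and returns the sum of one old element
' and one new element to show what survived.
Function GrowAndKeep() As Integer
    Dim values() As Integer
    Dim i As Long

    ReDim values(4)                 ' indexes 0 to 4
    For i = 0 To 4
        values(i) = (i + 1) * 5     ' 5, 10, 15, 20, 25
    Next i

    ReDim Preserve values(9)        ' indexes 0 to 9, old values kept

    ' values(4) is still 25; the new slot values(9) starts at 0
    GrowAndKeep = values(4) + values(9)
End Function
```

GrowAndKeep returns 25. Without the Preserve keyword, values(4) would have been reset to 0 by the second ReDim.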






How to repeat the Header row in each page using VBA?


Consider an Excel document which holds the details of 5000 employees. The 1st row of the sheet has the column headers, such as Name, Age, Designation, Date of Birth and Date of Joining.

Now, if we want to print this employee list with the header row appearing on each page of the printout, select the sheet which has the data and use the below VBA code to make the header row appear on each page.
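
A minimal sketch of such code, assuming the header sits in row 1 of the active sheet, sets the PrintTitleRows property of the sheet's PageSetup object. The procedure name RepeatHeaderRow is hypothetical.

```vba
' Hypothetical procedure name; assumes the header is in row 1.
Sub RepeatHeaderRow()
    ' Repeat row 1 at the top of every printed page of the active sheet
    ActiveSheet.PageSetup.PrintTitleRows = "$1:$1"
End Sub
```

Setting PrintTitleRows to an empty string ("") switches the repeated header off again.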


How to enter formula with relative and absolute reference using VBA

Relative vs Absolute:

To enter a formula into a single cell using VBA, we assign the formula text to the Range().Formula property.

For Example, VBA code to enter VLOOKUP formula in cell B1 is
                    
                         RANGE("B1").FORMULA = "=VLOOKUP(A1,SHEET2!A:B,2,0)"


Suppose we want to apply the same formula to the range B1 through B10. Then change the VBA code to
                         RANGE("B1:B10").FORMULA = "=VLOOKUP(A1,SHEET2!A:B,2,0)"


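In this code A1 is written as a relative reference, so Excel adjusts it for each row of the range (B2 looks up A2, B3 looks up A3, and so on). To keep the lookup value fixed on cell A1 for every row, write it as the absolute reference $A$1. Another way to refer to the value in the same row is to replace A1 with the whole column A:A on the RHS of the code. The new code then looks like

                        RANGE("B1:B10").FORMULA = "=VLOOKUP(A:A,SHEET2!A:B,2,0)"

With A:A as the lookup value, each cell picks the value from its own row of column A (i.e. for cell B1, A:A is A1; for B2, A:A is A2 and so on).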

NOTE:



The A:A style of reference does not work for the EOMONTH worksheet function. The syntax for EOMONTH is EOMONTH(START DATE, MONTHS).

                 RANGE("B1:B10").FORMULA =  "=EOMONTH(A1,0)"

In the above formula, if we replace A1 with A:A it returns a #VALUE! error. An alternative that builds the same-row reference for such worksheet functions is to replace A1 with the formula INDIRECT(ADDRESS(ROW(),1,4)).

So the code would be

      RANGE("B1:B10").FORMULA =  "=EOMONTH(INDIRECT(ADDRESS(ROW(),1,4)),0)"


Excel Shortcuts to convert Number into different formats

Excel Shortcuts:

Keyboard shortcuts in Excel reduce the time spent doing routine tasks with the mouse and thereby boost productivity.

Below is a list of Excel shortcut keys to convert numbers into different formats, including Date and Time.

1) To convert a number to Date format: select the range of cells that contain numbers and press CTRL+SHIFT+3

2) To convert a number to Time format: select the range of cells that contain numbers and press CTRL+SHIFT+2

3) To convert a number to Percentage format: select the range of cells that contain numbers and press CTRL+SHIFT+5

4) To convert a number to Currency format: select the range of cells that contain numbers and press CTRL+SHIFT+4




CAGR calculation in Excel

CAGR:

CAGR, the Compound Annual Growth Rate, is a measure of growth over a period of time. It is usually calculated over a number of years. Businesses use CAGR to measure the growth in their revenue over a period of time. In finance, CAGR indicates the appreciation in the value of an investment over a period of time.

Example:

Suppose an initial investment of $10000 is made and its value appreciates to $17000 after 5 years. CAGR for this investment is calculated using the below formula.


                         



                         CAGR = (Final Value / Initial Value)^(1/n) - 1

                                                          where n = no. of years





The CAGR for the above investment example is 0.11196 or 11.196%. Here, CAGR is equivalent to a constant growth rate per year (i.e. constant growth of 11.196% for 5 consecutive years).

You can validate the CAGR by calculating Final Value using the initial value of $10000 and CAGR of 11.196%.

                                Final Value = $10000 * (1+0.11196)^5 = $17000
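
The calculation can also be expressed as a small VBA function. This is a minimal sketch; the function name Cagr and its parameter names are assumptions made for illustration.

```vba
' Hypothetical helper: compound annual growth rate as a fraction,
' i.e. (final / initial) ^ (1 / years) - 1.
Function Cagr(ByVal initialValue As Double, ByVal finalValue As Double, ByVal years As Double) As Double
    Cagr = (finalValue / initialValue) ^ (1 / years) - 1
End Function
```

Cagr(10000, 17000, 5) evaluates to approximately 0.11196, matching the figure above.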






How to convert time format into Minutes in Excel

Consider an example where the Start Time, End Time and Total Time Taken for a set of activities are stored in the standard Excel time format (i.e. hh:mm:ss). Here TOTAL TIME = END TIME - START TIME. In column F, let us calculate the total time taken in minutes.




  Now, to convert Excel time values into minutes, we have to multiply the values in column E by 1440, the number of minutes in a day (24 * 60). So enter the formula "=E2*1440" in cell F2 to convert 1:00:00 into minutes.






Excel stores times as a fraction of a day. So the value equivalent of 1:00:00 is 0.041667 days (i.e. 1/24). Therefore 0.041667 * 1440 = 60 minutes.
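
VBA stores dates and times the same way, so the conversion can be sketched as a function. The name TimeToMinutes is hypothetical.

```vba
' Hypothetical helper: converts a time value (a fraction of a day)
' into minutes by multiplying by 24 * 60 = 1440.
Function TimeToMinutes(ByVal t As Date) As Double
    TimeToMinutes = CDbl(t) * 1440
End Function
```

TimeToMinutes(TimeSerial(1, 0, 0)) gives 60, up to floating-point rounding, which mirrors the =E2*1440 formula.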











How to use SUMPRODUCT to calculate mean of Discrete Distribution

Discrete Distribution:

A Discrete Distribution is the distribution of a random variable that takes countable values, such as non-negative whole numbers. The MEAN or AVERAGE of a discrete distribution is its Expected Value.

The MEAN of any discrete distribution is calculated by taking the sum of the products of the values and their probabilities of occurrence. The example considers the number of diabetics among 5 randomly chosen people, together with the probability of each count.



The MEAN of this discrete distribution is calculated using the built-in function SUMPRODUCT as shown below. Here SUMPRODUCT takes two arguments, ARRAY1 (the values) and ARRAY2 (their probabilities).
                               
                                                "=SUMPRODUCT(C2:C8,D2:D8)"

 The mean or the expected value of the above discrete distribution is 1.14.
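
The sum-of-products calculation can be sketched in VBA as well. The function name ExpectedValue is hypothetical, and the figures in the usage note are made-up illustration values, not the diabetics data from the example above.

```vba
' Hypothetical helper: sum of value * probability over two
' arrays of equal size, as SUMPRODUCT does for two ranges.
Function ExpectedValue(ByVal values As Variant, ByVal probs As Variant) As Double
    Dim i As Long
    Dim total As Double

    For i = LBound(values) To UBound(values)
        total = total + values(i) * probs(i)
    Next i

    ExpectedValue = total
End Function
```

For the illustrative values 0, 1, 2 with probabilities 0.25, 0.5, 0.25, the call ExpectedValue(Array(0, 1, 2), Array(0.25, 0.5, 0.25)) returns 1.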








How to get only Integer portion of a division using VBA operator

Excel has a built-in function QUOTIENT() that returns the integer portion of a division. A similar result can be obtained using the ROUNDDOWN() function. VBA, however, has an operator that returns only the integer portion of a division. The symbol for this integer division is "\".

Below is an example of two procedures that use "division" (/) and "division with integer" (\) on the same two variables.



The first procedure returns a value of 5.5 and the second one returns 5. However, the integer-division operator does not work when used in a cell formula.
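
A minimal sketch of two such procedures follows. The operands 11 and 2 are assumed here because they reproduce the results 5.5 and 5 quoted above, and the function names are hypothetical.

```vba
' Regular division: keeps the fractional part
Function RegularDivision() As Double
    Dim a As Integer, b As Integer
    a = 11
    b = 2
    RegularDivision = a / b
End Function

' Integer division: returns only the whole-number part
Function IntegerDivision() As Long
    Dim a As Integer, b As Integer
    a = 11
    b = 2
    IntegerDivision = a \ b
End Function
```

RegularDivision returns 5.5 and IntegerDivision returns 5.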




How to make Excel UserForm Immovable using VBA

By default, Excel UserForms are movable. This lets the user view the Excel sheet by dragging the UserForm aside. If the creator of the UserForm does not want the user to view the sheets, the UserForm position can be fixed using the VBA code given below. The Layout event fires whenever the form is moved, and the code snaps the form back to the top-left corner.


                                  Private Sub UserForm_Layout()
                                  UserForm1.Left = 0
                                  UserForm1.Top = 0
                                  End Sub



