Monday, July 29, 2013

Excel Automation Using vbs

This post share how to do Excel automation through the use of vbs. Please take note that this might not be the most optimum way to get things done and you are always welcome to suggest new way by comment on this post.

Initialized Excel application object

This are the few lines you must include in your script whenever you need to do Excel automation.
' Create the excel object
Set objExcel = CreateObject("Excel.Application")

' Do not show the excel workbook 
objExcel.Visible = False

Open workbook to process 

You can use this for csv file also
Set objWorkbook = objExcel.Workbooks.Open (filePath)

Create an object to represent your first worksheet


Set objWriteSheet = objWorkbook.Worksheets(1)

To find the last row or last column in the worksheet

 These 2 lines help to find the last column or last row in your worksheet which has data occupied.
numRows = objWriteSheet.UsedRange.Rows.Count
numCols = objWriteSheet.UsedRange.Columns.Count

To delete column 

These loop will loop through all the column from right to left and it will look for column with value of "test4" or "test5" and delete the column. Please note that it is always better to loop from right to left (column 7->column 6->column 5 .... column 1) as after you delete the column, the number of column will change. You can do the same for delete row by replacing EntireColumn with EntireRow.
For i = numCols to 1 step -1
 if InStr(objWriteSheet.Cells(1,i).value, "test4") Or InStr(objWriteSheet.Cells(1,i).value, "test5") Then
  objWriteSheet.Cells(1,i).EntireColumn.Delete
 End If
Next

To assign value to a cell

 You need to get a worksheet object and just enter the correct row and column index for the correct cell.
objWriteSheet3.Cells(1,14).value = "test" 

To use Excel formula 

The loop below loop through all the rows (starting from row 2)in the worksheet. By creating an object for the range (objRange), you can use this objRange as an argument for this statement - objExcel.WorksheetFunction.Sum(objRange). You can use other formula such as Average...
For i = 2 to numRows
 Set objRange = objWriteSheet3.Range(objWriteSheet3.Cells(i,2), objWriteSheet3.Cells(i,13))
 objWriteSheet3.Cells(i,14).value = objExcel.WorksheetFunction.Sum(objRange)
 Set objRange = Nothing
Next

Copy 2 different columns and paste it to another worksheet 

The lines below define 2 range object for different column (column with index 14 and 28) and use the union to combine them so that the copy can be done for 2 different ranges.
' Copy the result to the new worksheet
Set objRangeD = objWriteSheet3.Range(objWriteSheet3.Cells(1,14), objWriteSheet3.Cells(numRows,14))
Set objRangeE = objWriteSheet3.Range(objWriteSheet3.Cells(1,28), objWriteSheet3.Cells(numRows,28))
 
objExcel.Union(objRangeD,objRangeE).copy
objWriteSheet2.Range("D1").PasteSpecial 

Create chart

First line create a chart object with the following arguments (left, top, width, height).
Second line define this as line chart.
Third line set the data source where you need to define range for it.
The most critical thing here is vbs actually don't understand xlLine. So to make this works, you have to define a constant for this Const xlLine = 4. This link let you have the number for all the different constant use in Excel application.
Set objMychart = objWriteSheet2.ChartObjects.Add(400, 20, 375, 200).Chart
objMychart.ChartType = xlLine
objMychart.SetSourceData objWriteSheet2.Range(objWriteSheet2.Cells(1,1), objWriteSheet2.Cells(numRows,numCols))

Add chart title and axis title 

Again, you need to define your own constant for xlCategory and xlValue. Of course, you can also put number directly.
objMyChart.hasTitle = True
objMychart.ChartTitle.Text = "Sales"

objMychart.Axes(xlCategory).hasTitle = True
objMychart.Axes(xlCategory).AxisTitle.Text = "Month"

objMychart.Axes(xlValue).hasTitle = True
objMychart.Axes(xlValue).AxisTitle.Text = "$"

Set the maximum scale for y axis


objMychart.Axes(xlValue).MaximumScale = 100

Set the line colour 

These line set the first line in the chart, you can change to indicate which line you want to change.
objMychart.SeriesCollection(1).Format.Line.Visible = True
objMychart.SeriesCollection(1).Format.Line.ForeColor.RGB = RGB(0,32,96)
objMychart.SeriesCollection(1).Format.Line.Transparency = 0

Save your Excel document

Very important step!
objWorkbook.SaveAs (D:\result.xls), -4143

objExcel.Quit

set objExcel = Nothing

Sunday, July 7, 2013

Hadoop Rack Awareness (1.0.4)

To enable Hadoop rack awareness, you need to create a script that do the mapping and specify the path to this script using the topology.script.file.name property in core-site.xml file.

Below is the script i use which i actually obtain from this site. The script name is topology.sh
#!/bin/sh
HADOOP_CONF=/etc/hadoop
while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default-rack "
  else
    echo -n "$result "
  fi
done

And the topology.data file is as shown below
10.0.0.11  /rack1
10.0.0.12  /rack1
10.0.0.13  /rack1
10.0.0.14  /rack1
10.0.0.15  /rack2
10.0.0.16  /rack2
10.0.0.17  /rack2
10.0.0.18  /rack2
10.0.0.19  /rack3
10.0.0.20  /rack3
10.0.0.21  /rack3
10.0.0.22  /rack3

Place these 2 files on /etc/hadoop folder on your namenode only (you can of course specify other directory but make sure you change the path information in the script file and core-site.xml file). After you done with this, you can proceed to add the topology.script.file.name property in the core-site.xml file. You only need to do this on namenode.

Once you done, you can restart Hadoop. (execute stop-all.sh follow by start-all.sh) To validate your cluster is indeed rack awareness, use this command: hadoop dfsadmin -report. You should be able to see the extra line showing Rack information for each datanode.
[hadoopuser@hadoop-name-node hadoop]$ hadoop dfsadmin -report
Configured Capacity: 5143534043136 (4.68 TB)
Present Capacity: 4881124573184 (4.44 TB)
DFS Remaining: 2865411211264 (2.61 TB)
DFS Used: 2015713361920 (1.83 TB)
DFS Used%: 41.3%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 12 (12 total, 0 dead)

Name: 10.0.0.12:50010
Rack: /rack1
Decommission Status : Normal
Configured Capacity: 428627836928 (379.19 GB)
DFS Used: 188051693553 (155.14 GB)
Non DFS Used: 21867446287 (40.37 GB)
DFS Remaining: 218708697088(183.69 GB)
DFS Used%: 43.87%
DFS Remaining%: 51.03%
Last contact: Mon Jul 08 09:44:33 SGT 2013


Name: 10.0.0.26:50010
Rack: /rack3
Decommission Status : Normal
Configured Capacity: 428627836928 (379.19 GB)
DFS Used: 166991044608 (135.52 GB)
Non DFS Used: 21867454464 (40.37 GB)
DFS Remaining: 239769337856(203.3 GB)
DFS Used%: 38.96%
DFS Remaining%: 55.94%
Last contact: Mon Jul 08 09:44:33 SGT 2013