Monday, November 11, 2013

I have put this blog together based on my experience with Isilon storage and solutions provided in various forums. Please use at your own risk; I am not responsible for any loss of data. Thanks for looking.



Isilon Performance Stats
Summary
isi statistics drive --nodes=all --orderby=busy --type=sas,sata --top 
or
isi statistics drive --nodes=all --orderby=busy --type=sas,sata | head -n 30
Other Useful Live Monitoring
#isi statistics system --nodes --top
#isi statistics client --orderby=Ops --top
#isi statistics heat --top
#isi statistics pstat
# isi stat -d
# isi_for_array -s 'isi_hw_status -i'
# uname -a

# isi pkg info

Cluster Performance Snapshot
isi statistics pstat
List files in use
isi statistics heat --nodes=all --orderby=ops --top
List of client connections
isi statistics client --nodes=all --orderby=ops --top
Isilon Performance issue
The WebGUI is OK, but IMO too slow for live monitoring.
On the command line interface (CLI):
isi nfs clients ls
isi perfstat
Check load balancing across all nodes in cluster
Sessions per node
isi_for_array "isi smb session list |grep -i computer |wc -l"
How many open files on each node
isi_for_array "isi smb file list |grep -i path |wc -l"
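NFS connections per node (a rough count along the same lines, reusing the isi nfs clients command shown elsewhere in this post; the line count includes header lines, so treat it as approximate)
isi_for_array "isi nfs clients ls | wc -l"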
Drive utilization for specific chassis
isi statistics drive --nodes=2 --top
Drive utilization for current chassis
isi statistics drive --top
Cluster wide statistics
isi statistics pstat --top
IOPS for Cluster
isi statistics query --nodes=all --stats=node.disk.xfers.rate.sum --top
Drive Queue
isi statistics drive --nodes=all --orderby=queued --type=sas,sata --top
Disk IOPS per Chassis/Drive
Adjust the range for your drive count per chassis; this example is based on 36 drives.
for i in {0..35}; do isi statistics query --nodes=all --stats=node.disk.xfers.rate.$i; done


For your original test, you might max out on disk IOPS (xfers), but you could also get stuck at a certain rate of "application IOPS" while seeing little or no disk activity at all(!) -- because your data is mostly or entirely in the OneFS cache. Check the disk IOPS (xfers), including the average size per xfer, with
 isi statistics drive -nall -t --long --orderby=OpsOut
 and cache hit rates for data (level 1 & 2) with:
 isi_cache_stats -v 2
isi statistics drive -nall --orderby=Inodes --long --top

The latter shows (in very verbose form, though not so easy to count
the number of disks used) the actual layout of the file on the cluster disks.
Usually "streaming" access files should spread onto more disks,
but on small (or fragmented?) clusters the difference between
streaming/random/concurrency might appear minimal.

isi set -l {concurrency|streaming|random} -r g retune "filename"

will actually change the layout if needed. (I trust this more
than the WebUI.) Even after it finishes, it might take a few
more seconds until the changes show up with isi get -DD.
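For example (the path is just a placeholder), to inspect the layout and protection detail of a single file:
isi get -DD /ifs/path/to/file | less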
In the case of very effective caching, the IOPS will NOT be limited by disk transfers (so all that filesystem block size reasoning doesn't apply).
Instead the limit is imposed by CPU usage, network bandwidth, or protocol (network + execution) latency, even
if CPU and bandwidth are below 100%.
In the latter case, doing more requests in parallel should be possible (it seems you are right on that track anyway with multiple jobs).

To check protocol latencies, use "isi statistics client" as before and add --long:

isi statistics client --orderby=Ops --top --long
 This will show latency times as:  TimeMax    TimeMin    TimeAvg   (also useful for --orderby=... !)
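For example, to sort by average protocol latency instead of operation count (column names as shown above):
isi statistics client --orderby=TimeAvg --top --long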
Maybe a few things can be checked in advance (before tracking things down to disk level):
- double check that no background jobs are running and stealing CPU or IOPS
- with four clients, is the network traffic well balanced across the four Isilon nodes?
- are the actual NFS read/write sizes large enough for 128K? (server and client negotiate a match within their limits.)
- is the random access pattern really in effect?
- for 128K reads, one could also try the concurrency pattern...

NFS number of threads: This is the number of NFS server daemon threads that are started when the system boots. The OneFS NFS server usually has 16 threads as its default setting; this value can be changed via the Command Line Interface (CLI):
isi_sysctl_cluster sysctl vfs.nfsrv.rpc.[minthreads,maxthreads]
Increasing the number of NFS daemon threads improves response minimally; the maximum number of NFS threads needs to be limited to 64.

I think that's 64 per node (isi_sysctl_cluster just spreads the setting to all nodes.)
And whether 64 Isilon threads do better or worse than 256 "brand X" threads
is up to the implementations; you might need to do tests.
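A minimal sketch of raising the thread count cluster-wide using the sysctls named above (the name=value form for isi_sysctl_cluster is my assumption; verify the exact syntax on your OneFS version):
isi_sysctl_cluster vfs.nfsrv.rpc.minthreads=16
isi_sysctl_cluster vfs.nfsrv.rpc.maxthreads=64
isi_for_array -s 'sysctl vfs.nfsrv.rpc.maxthreads'   # confirm the value on every node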

It seems that isi_cache_stats -v prints totals since startup,
and it is even more useful when monitoring live deltas at
regular intervals like 5s: isi_cache_stats -v 5

BE WARNED, what follows isn't something that I recommend outside a test situation. You can flush the entire (read) cache on one node, or on all nodes, using

   isi_flush
   or isi_for_array -s isi_flush

This will happily flush all the cache warmth for your workflow. USE WITH care: you will immediately impact the cache performance benefit for all of your active workflow clients.

The isi_cache_stats tool is a wrapper around the sysctl isi.cache.stats. As you indicated, the data is accumulated since cluster uptime. The first row returned is typically the global total since uptime (or since the stats were last reset).

You can run isi_cache_stats -z and then isi_cache_stats 5. This will clear the global stats and then start monitoring, in real time, the blocks being cached and the cache hits you benefit from.

The non -v output of isi_cache_stats is just a summary of what you see with -v. The only real difference is that it shows the block counts in human-readable form rather than as raw block numbers.
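Putting that together, a typical test sequence might look like:
isi_cache_stats -z       # zero the global counters
isi_cache_stats 5        # watch 5-second deltas while the workload runs
isi_cache_stats -v 5     # same, with the per-level (L1/L2) detail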

BTW: another lightweight way to look at cluster-wide workflow alongside isi_cache_stats is

   isi perfstat


A means to measure your buffered writes is simply to measure the latency seen for protocol write operations

isi statistics protocol --class=write --orderby=timeavg --top

In OneFS 7.x you should see very good write latencies, in the microsecond range. When this climbs to the millisecond range, the two simple reasons would be

1) The journal cannot flush writes to disk fast enough for the rate of change. This is another way of saying that there are insufficient disks in the node pool to satisfy the demand.

isi statistics drive -nall --orderby=timeinq --long --top


You might note that the sum of OpsIn (writes) + OpsOut (reads) exceeds the normal range for the disk type. You would also see > 1 queued I/O. The deeper the queue, the more worthwhile it becomes to increase the spindle count. Adding nodes almost immediately brings new disks into the fold.

maintenance commands
isi_gather_info  # collect status of cluster and send to support (usually auto upload via ftp)
HD Replacement
isi devices     # list all devices of the node logged in
isi devices -a status -d 14:bay28 # see status of node 14, drive 28
isi devices -a add    -d 14:28  # add the drive (after being replaced)
isi devices -a format -d 14:28  # often need to format the drive for OneFS use first
     # it seems that after format it will automatically use drive (no ADD needed)
# other actions are avail, eg smartfail a drive.
isi_for_array -s 'isi devices | grep -v HEALTHY' # list all problematic dev across all nodes of the cluster.
isi statistics drive --long  # 6.5 cmd to see utilization of a hd.
user mapper stuff
id username
id windowsDomain\\windowsUser
    # Note that the username may be case sensitive!

isi auth ads users  list --uid=50034
isi auth ads users  list --sid=S-1-5-21-1202660629-813497703-682003330-518282
isi auth ads groups list --gid=10002
isi auth ads groups list --sid=S-1-5-21-1202660629-813497703-682003330-377106
isi auth ads user list -n=ntdom\\username
# find out Unix UID mapping to Windows SID mapping:
# OneFS 6.5 has new commands vs 6.0
isi auth mapping list  --source=UID:7868
isi auth mapping rm    --source=UID:1000014
isi auth mapping flush --source=UID:1000014   # this clears the cache
isi auth mapping flush --all
isi auth local user list -n="ntdom\username" -v # list isilon local mapping

isi auth mapping delete --source-sid=S-1-5-21-1202660629-813497703-682003330-518282 --target-uid=1000014 --2way
 # should delete the sid to uid mapping, both ways.
isi auth mapping delete --target-sid=S-1-5-21-1202660629-813497703-682003330-518282 --source-uid=1000014
 # may repeat this if mapping not deleted.

isi auth mapping dump | grep S-1-5-21-1202660629-813497703-682003330-518282
isi auth ads group list --name
isi auth local users delete --name=ntdom\\username --force

RFC 2307 is the preferred auth mechanism: Windows AD with Services for UNIX.

isi smb permission list --sharename=my_share


    # find out Unix UID mapping to Windows SID mapping:

    isi auth ads users map list --uid=7868
    isi auth ads users map list --sid=S-1-5-21-1202660629-813497703-682003330-305726
    isi auth ads users map delete --uid=10020
    isi auth ads users map delete --uid=10021
    isi_for_array -s 'lw-ad-cache --delete-all'  # update the cache on all cluster nodes
    # Windows clients need to unmap and remap the drive for the new UID to be looked up.

    # for OneFS 6.0.x only (not 6.5.x as it has new CIFS backend and also stopped using likewise)
    # these look up the SID-to-UID and SID-to-GID maps.
   
    sqlite3 /ifs/.ifsvar/likewise/db/idmap.db 'select sid,id from idmap_table where type=1;' # list user  sid to uid map
    sqlite3 /ifs/.ifsvar/likewise/db/idmap.db 'select sid,id from idmap_table where type=2;' # list group sid to gid map
    1:  The DB that you are looking at only has the fields that you are seeing listed. 
    With the current output it will give you the SID and UID of the users mapped. 
    With these commands you can find the username that is mapped to that information (see also the loop sketch after point 4):
    #isi auth ads users list --uid={uid}
    or
    #isi auth ads users list --sid={sid}

    2:  The entries in the DB are made as the users authenticate to the cluster. 
    So when a client tries to access the share, the client sends over the SID,
    we check the DB and if no entry is found, we check with NIS/LDAP,
    if nothing is found there, we generate our own ID (10000 range) and add it to the DB. 
    Any subsequent access from that SID will be mapped to the UID in that DB.

    3:  You can run the following to get the groups and the same rules
    apply for the GID and SID lookups:
    #sqlite3 /ifs/.ifsvar/likewise/db/idmap.db 'select sid,id from idmap_table where type=2;'
    #isi auth ads groups list --gid={gid}
    #isi auth ads groups list --sid={sid}

    4:  You can delete the entries in the database,
    but the current permissions on files will remain the same. 
    So when the user re-accesses the cluster he will go through the
    process outlined in question 1.
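    For example, a small loop (OneFS 6.0.x; DB path and commands as in point 1) can resolve every mapped SID in the DB to its AD user:
    for sid in $(sqlite3 /ifs/.ifsvar/likewise/db/idmap.db 'select sid from idmap_table where type=1;'); do
        isi auth ads users list --sid=$sid    # prints the AD user mapped to this SID
    done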
Snapshot

Snapshots take up space that counts against the usable space reported for the filesystem.
cd .snapshot
 
Admins can manually delete snapshots, or take snapshots of a specific directory tree instead of the whole OneFS filesystem.
 


CIFS
ACL
ls -led   # show ACL for the current dir (or file if filename given)
ls -l   # regular unix ls, but a + after the permission bits indicates the presence of a CIFS ACL
setfacl -b filename # remove all ACL for the file, turning it back to unix permission
chmod +a user DOMAIN\\username  allow generic_all /ifs/path/to/file.txt  # place NTFS ACL on file, granting user full access

ls -lR | grep -e "+" -e "/" | grep -B 1 "+"    # recursively list files with NTFS ACL, short version
ls -lR | grep -e "^.......... +" -e "/"  | grep -B 1 "^.......... +" # morse code version, works better if there are files w/ + in the name
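If the node's find supports FreeBSD's -acl primary (an assumption worth checking on your OneFS build; otherwise stick to the ls -lR pipelines above), a cleaner recursive search is:
find /ifs/path/to/dir -acl    # path is a placeholder; prints only entries that carry an ACL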
Time Sync
isi_for_array -s 'isi auth ads dc' # check which Domain Controller each node is using
isi_for_array -s 'isi auth ads dc --set-dc=MyDomainController' # set DC across all nodes
isi_for_array -s 'isi auth ads time'  # check clock on each node
isi auth ads time --sync   # force cluster to sync time w/ DC (all nodes)
isi auth ads status   # check join status to AD
killall  lsassd    # reset daemon, auth off for ~30sec, should resolve offline AD problems

"unix" config
Syslog
isi_log_server add SYSLOG_SVR_IP [FILTER]
-or-
vi /etc/mcp/templates/syslog.conf
isi_for_array -sq 'killall -HUP syslogd'
Disable user ssh login to isilon node
For Isilon OneFS 6.0: 
vi /etc/mcp/templates/sshd_config
add line
AllowUsers root@*
Then copy this template to all the nodes:
cp /etc/mcp/templates/sshd_config /ifs/ssh_config
isi_for_array 'cp /ifs/ssh_config /etc/mcp/templates/sshd_config'
One may need to restart sshd, but in my experience sshd picks up this new template in less than a minute and users will be prevented from logging in via ssh.
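To confirm the template landed on every node:
isi_for_array -s 'grep AllowUsers /etc/mcp/templates/sshd_config'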
In OneFS 6.5, maybe the template will be replicated to all nodes automatically? Or maybe that's only for syslogd and not sshd, since they are concerned it could lock users out of ssh access on all nodes at once...

Isilon Shares

isi smb shares create SHARENAME /ifs/*
isi smb shares permission create SHARENAME --group "domain\domain users" --permission-type allow --permission full --zone system
isi smb shares permission create SHARENAME --group "domain\domain admins" --permission-type allow --permission full --zone system
isi smb shares permission delete SHARENAME --wellknown everyone --force
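To double-check the result afterwards (the exact permission-listing syntax varies by OneFS version; see the isi smb permission example elsewhere in this post):
isi smb shares list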
SPN’s
# isi auth ads spn create --user=<Administrator> --spn=cifs/<cluster.domain.local>
# isi auth ads spn create --user=<Administrator> --spn=host/<cluster.domain.local>
UNIX Shares

# isi nfs exports create --rwclient=x.x.x.x --rootclient=x.x.x.x --path=/ifs/data/test
QUOTA Creation CLI
cluster-1# isi quota quotas create --path /ifs/XXX/TEST/Admin --type directory --hard-threshold 70G --soft-threshold 60G --soft-grace 7D --advisory-threshold 50G --container yes --include-snapshots no
cluster-1# isi quota quotas create --path /ifs/XXX/TEST/Acct --type directory --hard-threshold 210G --soft-threshold 200G --soft-grace 7D --advisory-threshold 190G --container yes --include-snapshots no
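To verify the quotas afterwards (OneFS 7.x syntax, matching the create commands above):
cluster-1# isi quota quotas list | grep TEST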


Create share permissions using cli

Share permissions are often confused with NTFS security permissions. The
share permissions are your first security gate; once a user passes that
gate, he is faced with the next security gate, which is the ACL.

Let's take an example:


A share testshare1 was created. While creating the share, the storage admin
selected the "Do not Change Existing Permissions" option. He applied Domain
Admins => Full Control and Finance Group => Full Control share permissions.
Later he created an ACL from the OneFS command line as follows.

chmod +a group "Paddy\finance" allow dir_gen_all,container_inherit,object_inherit testshare1

This is how it looks from the OneFS CLI:

# ls -lend /ifs/data/testshare1

drwxrwxr-x + 2 root wheel 23 Oct 29 17:47 /ifs/data/testshare1

OWNER: user:root

GROUP: group:wheel

CONTROL:dacl_auto_inherited,dacl_protected

0: everyone allow dir_gen_read,dir_gen_execute

1: user:root allow
dir_gen_read,dir_gen_write,dir_gen_execute,std_write_dac,delete_child

2: group:wheel allow dir_gen_read,dir_gen_execute

3: group:paddy\finance allow dir_gen_all,object_inherit,container_inherit

This allows only the Finance group to write to the directory
/ifs/data/testshare1.

When a user from Domain Admins tries to access the share, they may be able
to access it; but even though they are a powerful Domain Admins user with full
control on the share, they may still NOT be able to write to that share
or modify the ACLs (the Security tab in Explorer) on that share, because the
ACLs prevent them from doing so.

Whereas any user from the finance group is able to access the share and
modify data and ACLs on that directory.

In your case, review the ACLs on the directory being shared and see whether the
group or user you are trying to use has permissions.

Now, let's say Domain Admins wants to modify the ACLs: you can temporarily change the
share permissions for "Domain Admins" to run-as-root, and you
will be able to modify the NTFS security permissions on that directory.
Hopefully you have obtained permission from the fictitious Finance group
to do so :)

Finding large files
fstat is a bit like lsof in the Linux world, but exists on FreeBSD:
fstat | sort -k 8 -n -r | more
Finding serial number
isi config
quit
gathering and uploading info, usually required for a support call
isi_gather_info
Show status/alert info
isi status
isi alerts
Do something on all cluster nodes
isi_for_array 'df -h'



I had this problem last week. The /var filesystem was full, but contained few files. This in turn seemed to kill CIFS and the web interface, though NFS was fine.
Long story short, it's probably snmpd; there's a bug in a version of the Isilon OS (possibly fixed now).
You can use fstat to find abnormally large open files (unfortunately lsof isn’t present, so I couldn’t see a way to locate unlinked files) and the process that has them open. You can then kill -9 snmpd. After that you can restart services as follows:
isi services apache2 disable
isi services apache2 enable
isi services cifs disable
isi services cifs enable
You may also need to kill off webui/smbd (killall -9 isi_webui_d).

Here are some useful Isilon commands to assist you in troubleshooting Isilon storage array issues.
Grep the log for stalled drives on the isilon cluster
     cat /var/log/messages |grep -o 'stalled: [0-9,*:]*'|sort |uniq -c
(Stalled drives are bad and can cause cluster problems. You could also run this command against /var/log/restripe.log on the individual nodes.)
Grep the log for stalled drives on the isilon cluster for a given month (November in this example)
grep 'Nov ' /var/log/messages |grep -o 'stalled: [0-9,*:]*'|sort |uniq -c
Use this on the restripe.log
  grep 'Nov ' /var/log/restripe.log |grep -o 'Stalled drives are \[[0-9,*:]*\]'|sort |uniq -c
When reviewing the results of the stalled drives it is important to note that the drive numbers listed are logical drive numbers and not bay numbers. You need to run "isi devices" on the node with the suspect drive to determine which bay the drive is actually in.
Display the SMART error log of all the drives on a given isilon node:
isi_radish -a|less
Display the current isilon Flexprotect Policy
isi get /ifs
Display the current isilon node hardware status:
isi_hw_status
Display the status of the isilon node network config
isi config
then while in the config utility
 status 
Display this list of alerts in wide format
 isi alerts -w
Start/Stop/Resume/Pause Restriper jobs
isi restripe pause
isi restripe start
isi restripe stop
isi restripe resume -i
Display the drive status of a given isilon node
     #for node 3
     isi devices -d 3  
Display the SAS drives Physical Monitoring stats for errors
     less /var/log/isi_sasphymon.acc
Test Active Directory connections from all isilon nodes
     isi_for_array wbinfo -t
To find an open file on Isilon Windows share
     isi_for_array -q -s smbstatus | grep <filename>
Then find the PID from the results and run this to get the user:
     isi_for_array -q -s smbstatus -u | grep <PID>
Note: The isi_for_array command runs the given command on all of the nodes. It will ask for the user's password so that it can log in to the other nodes and complete the command. Piping the results of an isi_for_array command into another command such as grep (as in the example above) still requires the user password so that it can be passed to the other nodes; there is no prompt for it, so you must enter it on the next line and press Enter to get the results.
To Fail the Disk on the node proactively
isi devices -a smartfail -d 11:bay4
To Gather Logs on all Nodes
isi_gather_info -f /var/crash
To see what’s taking up space in the /var/crash partition, run the following command on any node in the cluster:
isi_for_array -qs 'df -h'
isi_for_array -sq 'find /var/crash -type f -size +10000 -exec ls -lh {} \;'
To check windows mappings on the isilon
isi auth mapping token --name=enterprise\\userid
Once the SyncIQ job is done, follow the procedure below to make the target file systems read-write:
isi sync target break --policy=govindisi01_ifs_hybrid_gridlogs_TO_govindisi02_ifs_hybrid_gridlogs --force
isi sync target break --policy=govindisi01_ifs_hybrid_BRID_DATA_TO_govindisi02_ifs_hybrid_BRID_DATA --force


Install Package
zeus-1# isi pkg install patch-71234.tar
List Packages
zeus-1# isi pkg list 
Uninstall Package
zeus-1#  isi pkg delete patch-71234 


Commands

For manual pages, use an underscore (e.g., man isi_statistics). The command line is much more complete than the web interface but not completely documented. Isilon uses zsh with customized tab completion. When opening a new case include output from "uname -a" & "isi_hw_status -i", and run isi_gather_info.
isi_for_array -s: Execute a command on all nodes, in order.
isi_hw_status -i: Node model & serial number -- include this with every new case.
isi status: Node & job status. -n# for particular node, -q to skip job status, -d for SmartPool utilization; we use isi status -qd more often.
isi statistics pstat --top & isi statistics protocol --protocol=nfs --nodes=all --top --long --orderby=Ops
isi networks
isi alerts list -A -w: Review all alerts.
isi alerts cancel all: Clear existing alerts, including the throttled critical errors message. Better than the Quiet command, which can suppress future errors as well.
isi networks --sc-rebalance-all: Redistribute SmartConnect IPs to rebalance load. Not suitable for clusters with CIFS shares.
du -A: Size, excluding protection overhead, from an Isilon node.
du --apparent-size: Size, excluding protection overhead, from a Linux client.
isi devices: List disks with serial numbers.
isi snapshot list --schedule
isi snapshot usage | grep -v '0.0'
isi quota list --show-with-no-overhead / isi quota list --show-with-overhead / isi quota list --recurse-path=/ifs/nl --directory
isi quota modify --directory --path=/ifs/nl --reset-notify-state
isi job pause MultiScan / isi job resume MultiScan
isi job config --path jobs.types.filescan.enabled=False: Disable MultiScan.
isi_change_list (unsupported): List changes between snapshots.
sysctl -n hw.physmem: Check RAM.
isi devices -a smartfail -d 1:bay6 / isi devices -a stopfail -d 1:bay6 (stopfail is not normally appropriate)
isi devices -a add -d 12:10: Use new disk in node 12, bay 10.
date; i=0; while [ $i -lt 36 ]; do isi statistics query --nodes=1-4 --stats=node.disk.xfers.rate.$i; i=$[$i+1]; done # Report disk IOPS(?) for all disks in nodes 1-4 -- 85-120 is apparently normal for SATA drives.
isi networks modify pool --name *$NETWORK*:*$POOL* --sc-suspend-node *$NODE*: Prevent $POOL from offering $NODE for new connections, without interfering with active connections. --sc-resume-node to undo.
isi_lcd_d restart: Reset LEDs.
isi smb config global modify --access-based-share-enum=true: Restrict SMB shares to authorized users (global version); isi smb config global list | grep access-based: verify (KB #2837)
ifa isi devices | grep -v HEALTHY: Find problem drives.
isi quota create --path=$PATH --directory --snaps=yes --include-overhead --accounting
cd /ifs; touch LINTEST; isi get -DD LINTEST | grep LIN; rm LINTEST: Find the current maximum LIN


RE-IMAGE / RE-FORMAT
In certain scenarios (single-node test clusters) you might want to re-image a node; the isi_reimage command can be used to accomplish this. When used in conjunction with the -b option, you can re-image the node with any build you have media for.

isi_reimage -b OneFS_v5.5.4.21_Install.tar.gz

The isi_reformat_node command can be used to reset the configuration on a node, format the drives and reimage. The command performs a variety of functions, such as checking wear on SSD drives, before proceeding with the reformat.

isi_reformat_node with the --factory option will format/reimage the node, turn off the NVRAM battery and power off the node. Useful if you are pulling a node for long-term storage or shipping it to another site.

As with isi_reimage, you don't want to run either of these commands on a node that is a member of a multi-node cluster.

A Node By Any Other Name


One of the great things about the Isilon architecture is that you can add and remove nodes from your cluster.  

Let's say you have a cluster of three 12000X nodes and you want to replace them with three new X200 nodes. You could leave the original nodes in the cluster as a lower/slower tier of storage and use the SmartPools technology to place your different data types on the most appropriate nodes, or you could simply replace your old nodes with new ones.

Suppose my cluster has three 12000X nodes: zeus-1, zeus-2 and zeus-3.
I add three X200 nodes into the cluster, which are assigned the names zeus-4, zeus-5 and zeus-6.
I decide to retire / SmartFail the 12000X nodes and now have a cluster with just three nodes named zeus-4, zeus-5 and zeus-6. 

I could leave things exactly as they are, but I'd rather have my three nodes named zeus-1, zeus-2 and zeus-3; no problem, I can rename them (without downtime) using the isi conf command.

From an ssh window, launch isi conf

Cluster-4# isi conf
  
cluster >>> lnnset 4 1
  
Node 4 changed to Node 1. Change will be applied on 'commit'

cluster>>> commit


Commit succeeded.
cluster-4# 
  
As you can see above, you may need to reconnect your ssh session before the new node name shows up in your prompt.

cluster-4#

cluster-4# hostname

cluster-1
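
For the scenario above, nodes 5 and 6 get the same treatment (each lnnset can also be committed separately):

cluster >>> lnnset 5 2
cluster >>> lnnset 6 3
cluster >>> commit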