NetApp Tutorial For Beginners
This tutorial gives you an overview and talks about the fundamentals of NetApp.
Data ONTAP Software
The operating system of the NetApp FAS storage controller is named Data ONTAP. Data ONTAP is the foundation of the NetApp Unified Storage architecture, supporting a wide range of storage protocols and storage-management software features.
The NetApp FAS storage controllers are designed around the NetApp Unified Storage architecture and support numerous storage protocols, software features, and storage tiers in an appliance-like design. Refer to Figure 3 for identification of the supported protocols.
Figure 3 – Protocols supported by the NetApp FAS Controllers
NOTE: This figure identifies only the major storage, management, and authentication protocols. For brevity, some protocols, such as the NetApp API (Manage ONTAP Solution or ZAPI), and Network Information Service (NIS) are not shown.
When the storage controller is powered on for the first time, it runs the setup script. This script configures fundamental parameters such as hostname, network IP addresses, and CIFS authentication (if licensed). You can rerun the setup script manually at any time to reconfigure the parameters.
To manually reconfigure the FAS controller's Ethernet network interfaces, you use the NetApp System Manager interface or the ifconfig command; for example: ifconfig ns0 192.168.1.1 netmask 255.255.255.0
NOTE: Most changes made from the command-line interface are transient. To enable persistence after a reboot, you must enter the changes into the /etc/rc file.
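As a rough illustration of the note above, the following Python sketch builds an ifconfig line of the form shown earlier from an address in CIDR notation, so that the same line can be entered at the command line and copied into the /etc/rc file. The helper name and the CIDR input format are assumptions made for this example; they are not part of Data ONTAP.

```python
import ipaddress


def ifconfig_line(interface: str, cidr: str) -> str:
    """Return an ifconfig command line for the given interface and CIDR address."""
    iface = ipaddress.ip_interface(cidr)  # for example "192.168.1.1/24"
    return f"ifconfig {interface} {iface.ip} netmask {iface.netmask}"


# The interface name and address are the ones used in the example above.
line = ifconfig_line("ns0", "192.168.1.1/24")
print(line)
```

Deriving the netmask from the prefix length avoids typing the dotted mask by hand, which is a common source of mismatches between the running configuration and the persistent one.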
You may also need to configure the built-in FC ports (located on the motherboard). Each built-in FC port can function in one of two modes, initiator or target, and the mode is set with the fcadmin command.
The fcadmin command applies only to the built-in FC ports. Any FC HBA adapter that you purchase is predefined as either an initiator or target adapter.
The fcp show adapter command returns only the FC ports that are configured to function in target mode. The fcp show initiator command returns all host FC initiator ports that are visible via the controller's FC target ports.
Figure 4: Fibre Channel Initiator-Target Port Relationships
NOTE: Similar initiator-target relationships exist in iSCSI connections.
The administration of the NetApp FAS controller can be performed via various administrative interfaces.
1. NetApp System Manager: a GUI designed for single-system configuration
2. Command-line interface: accessed via Telnet, Remote Shell (RSH), or Secure Shell (SSH)
– The aggr status command displays the existing aggregates.
– The cifs shares -add … command defines a CIFS share.
– The options dns.enable on command enables Domain Name System (DNS) name resolution (which requires further configuration).
NOTE: In Data ONTAP 8.0 7-Mode, only secured protocols (such as SSH) are enabled by default. Open protocols (such as RSH and Telnet) are disabled by default. NetApp recommends use of the default.
3. Operations Manager: a licensed tool that is installed on a host and that provides sophisticated management tools for one or more storage systems. Operations Manager includes the following products:
– Performance Advisor
– Protection Manager
– Provisioning Manager
The license for Operations Manager enables Performance Advisor. Protection Manager and Provisioning Manager are separately licensed products.
NOTE: Management of the FAS storage infrastructure can be promoted to the host operating system (Windows or UNIX) by the SnapDrive® tool and to the application layer by the SnapManager® tool. Both tools are optional purchases and require licenses.
The various SAN and network-attached storage (NAS) protocols differ in many technical details: SAN provides block-level access, and NAS provides file-level access. Additionally, some protocols use FC connections, and others use Ethernet. However, both SAN and NAS provide remote systems (hosts) with access to centralized, shared storage.
Perhaps the simplest way to define the difference between SAN and NAS is by describing how the file system is managed and how access is shared. For a SAN device, the controller provides block-level access to the hosts. The hosts then create and manage their own local file system (for example, ext3), and the local file system is typically not shared with other hosts. On the other hand, NAS storage uses the file system on the controller—the WAFL® (Write Anywhere File Layout) system, and the controller provides shared file-level access to multiple remote systems (hosts).
Both FC and iSCSI SANs support multipathing. With multipathing, there are redundant physical paths (two or more) between a host and the controller. Refer to Figure 4 for an example.
Multipathing ensures storage availability, even if a SAN hardware failure occurs. NetApp supports various multipathing options for both FC and iSCSI.
FC and iSCSI multipathing:
– With the host platform's native multipath I/O (MPIO) driver
– With the NetApp device-specific module (DSM) for Windows
– With the Veritas Dynamic Multipathing (VxDMP) software
iSCSI-only multipathing:
– With iSCSI's inherent multiple connections per session (MCS)
You choose the multipath solution that best meets the needs of your environment.
The NetApp on the Web (NOW)® online support and services site is available to all customers and business partners. You should become familiar with the NOW site, as it is a primary resource for questions regarding the configuration or performance of a FAS storage controller. Some of the resources available on the NOW site include:
- A knowledgebase (searchable) of NetApp product information
- Online manuals, for all NetApp products
- Software downloads, including updates and evaluation versions
- Return Merchandise Authorization (RMA), used to request replacement of failed hardware
- Bugs online, used to review all known software bugs
- Release Comparison, used to identify the version of Data ONTAP in which a particular bug was fixed
Before Data ONTAP 8.0, aggregates and, consequently, the volumes within the aggregates were based upon a 32-bit architecture. This architecture effectively limits the size of an aggregate to 16 TB.
As storage demands increase, the 16-TB limitation causes three major issues. First, as the size of each disk drive increases, the number of drives per aggregate must decrease. For example, in Data ONTAP 7.3, an aggregate can accommodate only nineteen 1-TB data drives. Second, performance is directly affected. As the number of drives decreases, the read and write performance that can be realized from an aggregate also decreases.
The third issue with the 16-TB limitation is that aggregate management becomes more complex and less efficient. For example, a FAS6070 system with 500 TB of storage requires a minimum of 32 aggregates. Such a requirement greatly increases the complexity of managing large storage arrays.
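The aggregate count in the example above follows from a simple ceiling division. The sketch below reproduces that arithmetic; the function name is made up for illustration, and the 100-TB figure in the second call is included only as a point of comparison with a larger aggregate limit.

```python
import math


def min_aggregates(total_tb: float, max_aggregate_tb: float = 16) -> int:
    """Smallest number of aggregates needed to hold total_tb of storage."""
    return math.ceil(total_tb / max_aggregate_tb)


print(min_aggregates(500))       # 16-TB limit of a 32-bit aggregate
print(min_aggregates(500, 100))  # comparison with a 100-TB aggregate limit
```

The calculation ignores RAID parity and spare disks, so it gives a lower bound rather than a sizing recommendation.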
The Data ONTAP 8.0 7-Mode operating system supports 64-bit aggregates. This architecture overcomes the disadvantages of the 16-TB limitation.
NOTE: In Data ONTAP 8.0 Cluster-Mode, 64-bit aggregates are not available.
Data ONTAP 8.0 7-Mode supports aggregates of up to 100 TB on top-of-the-line hardware, with greater aggregate sizes expected in the future.
See Figure 5 for information about platforms and maximum aggregate sizes.
* Supported only through a Policy Variation Request (PVR)
Figure 5: 64-Bit Aggregates—Maximum Aggregate Size Per Platform
In Data ONTAP 8.0, NetApp added the -B switch to the aggr create command. This switch accepts 32 or 64 as a valid parameter.
The 32 parameter designates a 32-bit aggregate architecture, and the 64 parameter designates a 64-bit aggregate architecture. The 32 parameter is the default.
To create a 64-bit aggregate, you must explicitly use the 64 parameter. For example, if a storage administrator enters the following: aggr create sales_aggr -B 64 24, a 64-bit aggregate is created. The new aggregate is named "sales_aggr," and 24 spare disks, chosen by Data ONTAP, are added to the new 64-bit aggregate.
NOTE: The command succeeds only if the specified number of spare disks exists. As in earlier Data ONTAP releases, the storage administrator can manually choose the disks to be added by specifying a -d switch and the disk names.
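To make the two ways of specifying disks concrete, here is a minimal Python sketch that composes an aggr create command line from either a disk count or an explicit disk list. The helper and the disk names are hypothetical; only the -B and -d switches and the overall command shape come from the text above.

```python
from typing import Optional, Sequence


def aggr_create_command(name: str, bits: int = 32,
                        disk_count: Optional[int] = None,
                        disks: Optional[Sequence[str]] = None) -> str:
    """Compose an 'aggr create' command line as described in the text."""
    if bits not in (32, 64):
        raise ValueError("-B accepts only 32 or 64")
    # Exactly one of disk_count and disks must be supplied.
    if (disk_count is None) == (disks is None):
        raise ValueError("give either a disk count or a list of disk names")
    parts = ["aggr", "create", name, "-B", str(bits)]
    if disks is not None:
        parts += ["-d", *disks]
    else:
        parts.append(str(disk_count))
    return " ".join(parts)


print(aggr_create_command("sales_aggr", 64, disk_count=24))
# Hypothetical disk names, used only to show the -d form.
print(aggr_create_command("sales_aggr", 64, disks=["0a.16", "0a.17", "0a.18"]))
```

Passing the architecture explicitly, rather than relying on the 32-bit default, makes the intent visible in any script that wraps the command.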
When a storage administrator upgrades a storage system from Data ONTAP 7.3 or earlier to Data ONTAP 8.0, all existing aggregates are designated as 32-bit aggregates. In Data ONTAP 8.0, you cannot directly convert a 32-bit aggregate to a 64-bit aggregate. Similarly, volumes within a 32-bit aggregate cannot be moved directly to a 64-bit aggregate. If you want to take advantage of the 64-bit aggregate feature, you must migrate data from a 32-bit aggregate to a 64-bit aggregate. To migrate data, you may use the ndmpcopy command.
You can use several commands on the FAS storage controller to collect system performance data. Performance data can consist of a broad summary of system performance or of details about specific performance parameters.
Some common commands for collecting performance data are:
- sysstat
– The default output includes CIFS, NFS, and HTTP operations as well as CPU, nonvolatile RAM (NVRAM), network interface card (NIC), and disk performance values. The -b parameter adds block-level (that is, FC and iSCSI) performance to the output.
– Example: The sysstat -s 5 command runs every five seconds and prints a summary on termination. The default interval is 15 seconds.
– Example: The sysstat -u command displays extended utilization data, including consistency point (CP) time and type. The command can be used to identify when a controller is busy. Outputs such as cp_from_log_full ('F') and cp_from_cp ('B') indicate that the controller is busy.
– This command can output performance data on various object types, specific instances of object type, and other performance counters.
– The command displays many performance items, including per-disk performance data.
– This advanced mode command can provide low-level performance data.
– Example: The stats show cifs:cifs:cifs_latency command displays the average latency value for the CIFS protocol.
– Performance counters can also be accessed via the Windows PerfMon GUI (but only if CIFS is enabled).
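The consistency point types mentioned in the sysstat -u example can be checked mechanically once the CP type letters have been collected. The sketch below assumes that the letters have already been extracted from the command output into a list (the output format itself is not parsed here), and the sample sequence is hypothetical.

```python
# CP types that the text singles out as signs of a busy controller.
BUSY_CP_TYPES = {"F": "cp_from_log_full", "B": "cp_from_cp"}


def busy_samples(cp_types):
    """Return (index, reason) for each sample whose CP type signals a busy controller."""
    return [(i, BUSY_CP_TYPES[t]) for i, t in enumerate(cp_types) if t in BUSY_CP_TYPES]


# Hypothetical sequence of CP type letters, one per sysstat -u sample.
samples = ["T", "T", "F", "B", "T"]
print(busy_samples(samples))
```

An occasional flagged sample is not necessarily a problem; a sustained run of them is the pattern worth investigating.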
Some commands for collecting statistics about the performance of the Ethernet network interface and other data are:
- netstat: The netstat -i command identifies the network interfaces and reports the number of in and out network packets, errors, and collisions
- ifstat: The ifstat -a command displays a low-level view of interface performance data, including transmitted (Tx) and received (Rx) bytes per second
If you cannot resolve a performance problem by using the sysstat, stats, statit, netstat, or ifstat commands, then you may need to download the perfstat command from the NOW online support and services site.
Characteristics of the perfstat command:
- Captures all needed performance information
- Captures information from hosts and filers
- Captures all information simultaneously and thus enables cross correlation
- Operates on all host platforms and all filer platforms
- Records all captured data in one plain-text output file
Data ONTAP Security
The security of the FAS controller, the security of the Data ONTAP operating system, and the security of the data stored on the controller are different topics. This section deals only with Data ONTAP security.
The Data ONTAP operating system supports role-based access control (RBAC), where defined roles, with specific capabilities, can be bound to groups. Individual users are then assigned to the groups. The assigned users can perform only the tasks that the roles assigned to their groups allow them to perform.
Figure 6: Role Based Access Control
For administrative purposes, one or more accounts are usually defined on the FAS storage controller. The useradmin command is used to create and modify local admin accounts.
You can use the useradmin commands, with their obvious variations, to add, list, delete, or modify users, groups, or roles. The capabilities are predefined and cannot be changed.
NOTE: There is always at least one administrative account, root, which cannot be deleted.
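The role, group, and user relationship described above can be pictured as two lookups followed by a union. The following Python sketch is a conceptual model only: the role names, group names, user names, and capability strings are all placeholders invented for the example, not values defined by Data ONTAP.

```python
# Hypothetical roles, groups, and users; the capability strings are placeholders.
ROLES = {
    "storage_admin": {"cli-aggr", "cli-vol"},
    "auditor": {"cli-sysstat"},
}
GROUPS = {
    "admins": {"storage_admin", "auditor"},
    "monitors": {"auditor"},
}
USERS = {"alice": {"admins"}, "bob": {"monitors"}}


def capabilities(user: str) -> set:
    """Union of the capabilities of every role bound to the user's groups."""
    caps = set()
    for group in USERS.get(user, set()):
        for role in GROUPS.get(group, set()):
            caps |= ROLES.get(role, set())
    return caps


def is_allowed(user: str, capability: str) -> bool:
    """A user may perform a task only if one of their groups' roles grants it."""
    return capability in capabilities(user)


print(is_allowed("alice", "cli-aggr"))
print(is_allowed("bob", "cli-aggr"))
```

Because users never hold capabilities directly, changing what a whole team can do means editing one group or role rather than every account.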
If you need to capture a network packet trace to analyze a protocol-level problem, you can use the pktt command on the FAS controller.
Characteristics of the pktt command:
a) Is normally run for a short period, capturing network traffic on a particular interface, potentially filtered to target a particular client.
b) Produces output in standard tcpdump format. The output can be analyzed in third-party network monitoring tools, such as the following:
– ethereal (download from wireshark.org)
– netmon (download from microsoft.com)
NOTE: You must first convert the tcpdump output to netmon format by using the capconv.exe command that you can download from the NOW online support and services site.
The NetApp FAS storage controllers support both the FC and iSCSI SAN standards.
One feature (among many) that distinguishes the NetApp SAN architecture from competitive solutions is that the LUNs are defined below the SAN protocol layer. Therefore, a NetApp LUN is not an FC LUN or an iSCSI LUN, but it can be exposed to a host via either or both protocols, even simultaneously. This flexibility allows for easy migration between the SAN access protocols.
Refer to Figure 4 and the accompanying text for a description of the initiator and target configuration of the controller's built-in FC ports.
In a high-availability configuration, a LUN that is being accessed via FC is visible from every FC target port on both FAS storage controllers (even though the LUN is actually "owned" by only one controller). This FC access mode, which is called "single image," has been the default failover mode for FC controllers since the release of Data ONTAP 7.2 software. You can change from one failover mode to another (with caution) by using the fcp cfmode command. After you change modes, you must reboot the controller.
The failover modes for FC controllers are:
1. single_image (the default since Data ONTAP 7.2)
– LUNs are visible on all FC target ports labeled by their worldwide port names (or WWPNs) on both controllers.
– A worldwide node name (WWNN) is shared by both controllers.
If you require a description of the earlier FC cluster modes, refer to the product documentation.
Although all of the failover modes for FC controllers are supported for existing systems, only the single_image mode is supported for new storage systems. The following table details some of the features and limitations of the various failover modes:
Figure 7: Failover Modes for FC Controllers
NOTE: Failover mode for FC controllers is an FC-specific concept. The iSCSI protocol handles controller failover in a completely different manner.
Conceptually, FC and iSCSI are very similar, although they vary greatly in detail and implementation (for example, FC is a wire-level protocol, whereas iSCSI travels on top of a TCP connection). However, the end result is the same; they both provide block-level services to a host, so the host can access a LUN.
After the SAN protocols are licensed, they must be started. No configuration or host access can occur until the protocols are started. Use the following commands to start each of the SAN protocols:
– fcp start, to start the protocol
– fcp status, to return the status of the protocol
– iscsi start, to start the protocol
– iscsi status, to return the status of the protocol
Another difference between the FC and iSCSI protocols concerns ports and adapters. The FC protocol can use only a dedicated, target-mode FC port on a specialized FC HBA. The iSCSI protocol can use either a specialized iSCSI HBA or any standard Ethernet port. If a standard Ethernet port is used, then the controller uses the iSCSI software target (ISWT).
If you are using ISWT support in a high-availability controller configuration, then you may notice two ISWT devices:
- ISWTa, for the local controller
- ISWTb, for the partner controller, used for controller failover
The SAN protocols provide block-level access to the LUNs on the storage controller. You can create the LUNs by using various tools, as follows:
- NetApp System Manager, a GUI downloadable from the NOW online support and services site
- Command-line interface, using either of the following two methods
– To create each item manually, run the lun create, igroup create and lun map commands.
– To create everything from a wizard, run the lun setup script and follow the prompts. When you use the wizard, you do not need to manually create the igroups or map the LUNs.
- SnapDrive, which allows you to manage storage from the host
NOTE: When you create a LUN, you need to know certain facts, such as the location (path) of the LUN, the OS of the respective host, the capacity required, and the LUN ID.
The LUN ID is the means by which a host identifies a LUN and distinguishes between multiple LUNs. Therefore, the LUN ID of each LUN that is presented to a host must be unique to that host. However, each host has its own LUNs and LUN IDs, and the LUNs and LUN IDs of each host are independent of the LUNs and LUN IDs of all other hosts.
- Multiple LUNs to the same host must have unique LUN IDs.
- Each host can have its own IDs. IDs can conflict only within a host, not between hosts.
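The uniqueness rule above can be expressed as a small check over a list of mappings. This is a sketch under the assumption that the mappings are available as (host, LUN path, LUN ID) tuples; the host names and paths are hypothetical.

```python
from collections import defaultdict


def lun_id_conflicts(mappings):
    """Return {(host, lun_id): [paths]} for IDs mapped more than once to one host.

    mappings is an iterable of (host, lun_path, lun_id) tuples.
    """
    seen = defaultdict(list)
    for host, path, lun_id in mappings:
        seen[(host, lun_id)].append(path)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}


# Hypothetical mappings: hostA and hostB may both use ID 0, but hostA reuses ID 1.
mappings = [
    ("hostA", "/vol/vol1/lun1", 0),
    ("hostA", "/vol/vol1/lun2", 1),
    ("hostB", "/vol/vol2/lun1", 0),
    ("hostA", "/vol/vol3/lun1", 1),
]
print(lun_id_conflicts(mappings))
```

Keying the lookup on the (host, ID) pair is what lets the same ID appear on different hosts without being reported.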
When a LUN is created via the lun setup wizard, multiple commands are rolled up into the script. However, if you create a LUN manually, then you must run two additional commands to make the LUN accessible from a host. To filter which LUNs are visible to which hosts (sometimes called "LUN masking"), you must use the igroup create and lun map commands.
- igroup create defines a relationship between a host (WWPN) and the OS type and identifies whether the current connection is an FC or iSCSI connection.
NOTE: Assuming that the SAN zoning is correct, the host should now be able to rescan for and connect to the new LUN.
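LUN masking can be thought of as a membership test: an initiator sees a LUN only if the LUN is mapped to an initiator group that contains that initiator. The Python sketch below models that idea; it is a conceptual illustration, not a description of how the controller implements it, and the group names, initiator identifiers, and paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IGroup:
    name: str
    protocol: str          # "fcp" or "iscsi"
    ostype: str            # for example "windows"
    initiators: set = field(default_factory=set)  # WWPNs or iSCSI node names


def visible_luns(initiator, igroups, lun_maps):
    """Return {lun_id: lun_path} that the given initiator is allowed to see.

    lun_maps is a list of (lun_path, igroup_name, lun_id) tuples.
    """
    member_of = {g.name for g in igroups if initiator in g.initiators}
    return {lun_id: path for path, group, lun_id in lun_maps if group in member_of}


# Hypothetical initiator groups and mappings.
igroups = [
    IGroup("ig_win1", "fcp", "windows", {"10:00:00:00:c9:2b:51:2c"}),
    IGroup("ig_lin1", "iscsi", "linux", {"iqn.1994-05.com.redhat:host2"}),
]
lun_maps = [
    ("/vol/vol1/lun1", "ig_win1", 0),
    ("/vol/vol2/lun1", "ig_lin1", 0),
]
print(visible_luns("10:00:00:00:c9:2b:51:2c", igroups, lun_maps))
```

An initiator that belongs to no group sees nothing, which is the behavior that makes masking a useful default-deny control.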
You need to be aware of the minimum and maximum LUN sizes that are supported by each of the host operating systems. Refer to Figure 8.
Figure 8: Supported LUN Sizes per Host Platform
NOTE: Not all supported operating system platforms are shown here. For more detailed information, refer to the NOW online support and services site and host platform documentation.
How a LUN consumes capacity on its parent volume and how LUN capacity is affected by the creation of Snapshot® copies is a complex topic. If a volume that contains a LUN is incorrectly configured, the host may think that the volume has space available for writing data while, in reality, the volume is filled with Snapshot data. When the host attempts to write data, the LUN goes offline, and manual rectification is required. Obviously, the LUN capacity problem creates a very undesirable scenario.
There are several ways to prevent the LUN capacity problem from occurring:
1. Space reservation—former best practice
– Until recently, space reservation was the recommended method to guarantee space for writes to a LUN (regardless of the number of Snapshot copies).
– Some of the parameters to be configured for this method are volume fractional reserve to 100% and LUN space reservation to Yes.
lun create (LUN space reservation is on by default.)
lun set reservation (to change the space reservation of a LUN)
– Configuration of the parameters causes an amount of space equal to the size of the LUN (100%) to be excluded from Snapshot copies, thus guaranteeing that writable capacity is always available to the host.
– Such a configuration is sometimes referred to as the "two times plus delta (2X+Δ)" overhead.
Figure 9 – Configuring Volume and LUN Provisioning
NOTE: Some documentation may still refer to these former best practices.
2. Volume AutoGrow and Snapshot AutoDelete—current best practice
– In the last few years, two automatic utilization thresholds were introduced, replacing the former best practice (use of the 100% fractional reserve) with the current best practice.
– Some of the parameters to be configured for this method are Volume AutoGrow and Snapshot AutoDelete. When the parameters are configured, the Volume Fractional Reserve can be safely set to 0%.
– This parameter configuration sets a utilization threshold at which the containing volume automatically grows and/or at which certain existing Snapshot copies are deleted. Use of the threshold ensures that space is always available for the host to write to the LUN.
– Use of the threshold changes the capacity required to "one times plus delta (1X+Δ)." Use of thin provisioning can produce an even better result.
Figure 10 – Configuring Automatic Capacity Management
NOTE: For more information about volume and LUN space management, refer to the product documentation.
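The difference between the 2X+Δ and 1X+Δ overheads is easiest to see with numbers. The following sketch computes an approximate volume size for a single LUN under each fractional reserve setting; the formula is a simplification for illustration, and the 300-GB LUN and 50-GB delta are hypothetical figures.

```python
def volume_size_needed(lun_gb: float, delta_gb: float, fractional_reserve: float) -> float:
    """Estimate the volume capacity for one LUN plus Snapshot changes.

    fractional_reserve is 1.0 for the 100% setting and 0.0 for the 0% setting.
    """
    return lun_gb + lun_gb * fractional_reserve + delta_gb


# Hypothetical 300-GB LUN with 50 GB of changed data held in Snapshot copies.
print(volume_size_needed(300, 50, 1.0))  # space reservation method (2X + delta)
print(volume_size_needed(300, 50, 0.0))  # AutoGrow / AutoDelete method (1X + delta)
```

The estimate leaves out thin provisioning and any AutoGrow headroom, so treat it as a way to compare the two methods rather than as a sizing tool.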
You can use lun commands not only to create LUNs but also to perform various other LUN-related actions:
- lun offline, makes a LUN unavailable for host access
- lun move, relocates a LUN from one path to another within the same volume
- lun show, displays various types of information about LUNs
– To display the LUN to igroup mappings: lun show -m
– To display the LUNs' operating system platform types: lun show -v
- lun clone, instantly creates a read/writable LUN as a clone of an existing LUN
The new LUN is thin provisioned (no space is consumed until new data is written), and the two LUNs share blocks with a snapshot of the original LUN (known as the "backing" snapshot)
lun create -b /vol/vol1/.snapshot/lun1 /vol/vol1/lun2
- lun clone split, splits a LUN clone from its backing snapshot
Because a LUN clone locks the backing snapshot (that is, prevents the backing snapshot from being deleted), it is recommended that, for long-term use, the relationship be split
lun clone split start /vol/vol1/lun2
NOTE: For more detailed descriptions of the lun command options, refer to the product documentation. Remember that the Snapshot copy of a LUN is created at the volume level, so all data in the volume is captured in the Snapshot copy, including multiple LUNs and NAS data if present in the volume. Therefore, it is best practice to simplify the Snapshot copy process by maintaining a one-to-one (1:1) relationship between volume and LUN (in other words, by having each volume contain only a single LUN).
A LUN can contain a file system that is managed by the host and/or contain a database that is managed by an application on the host. In this case, when Snapshot copies are created, only the host and the application can ensure the consistency of the file system and database.
Therefore, it is often necessary to coordinate with the relevant host during the Snapshot backup of a LUN. Usually, the coordination process occurs with no disruption to service.
1. On the host
– Quiesce the application or database
– Flush the host's file system buffers to disk (LUN)
2. On the controller, create the Snapshot copy of the LUN.
The copy now contains a consistent image of the host's file system and application database in an idle or offline state.
3. On the host, unquiesce the application or database.
NOTE: The NetApp host attachment kit includes a utility to flush the host's file system buffers to disk.
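The host and controller steps described above have a strict order, and the application should be unquiesced even if the Snapshot step fails. The Python sketch below shows one way to structure such a script; the four callables are stand-ins that only record the order of the steps, since the real quiesce, flush, and Snapshot commands depend on the host, the application, and the controller.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(quiesce, flush, unquiesce):
    """Quiesce and flush on entry; always unquiesce on exit."""
    quiesce()
    try:
        flush()
        yield
    finally:
        unquiesce()


def consistent_snapshot(quiesce, flush, create_snapshot, unquiesce):
    """Run the host and controller steps in the order given in the text."""
    with quiesced(quiesce, flush, unquiesce):
        create_snapshot()


# Stand-in callables that only record the order of the steps.
steps = []
consistent_snapshot(
    quiesce=lambda: steps.append("quiesce application"),
    flush=lambda: steps.append("flush host buffers"),
    create_snapshot=lambda: steps.append("create Snapshot copy"),
    unquiesce=lambda: steps.append("unquiesce application"),
)
print(steps)
```

Placing the unquiesce call in a finally clause keeps a failed Snapshot attempt from leaving the application suspended.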
When you create a Snapshot backup of a LUN, you must consider the free capacity in the containing volume. For example, assume that you configure a 400-GB volume and then create a 300-GB LUN inside the volume. If you fill the LUN with data, then a subsequent attempt to create a Snapshot copy fails. The failure occurs because, if the Snapshot copy were created, there would be insufficient free capacity in the volume to allow the host to continue to write to the LUN.
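The 400-GB and 300-GB example above can be reproduced with a simplified check. The sketch assumes a space-reserved LUN and treats the fractional reserve as the fraction of written LUN data that must remain free in the volume for overwrites; this is a rough model for illustration, not the controller's exact accounting.

```python
def snapshot_allowed(volume_gb, lun_gb, lun_used_gb, fractional_reserve=1.0):
    """Simplified check: is there enough free space to reserve for overwrites?"""
    reserve_needed = lun_used_gb * fractional_reserve
    free_gb = volume_gb - lun_gb
    return free_gb >= reserve_needed


print(snapshot_allowed(400, 300, 300))        # full LUN, 100% fractional reserve
print(snapshot_allowed(400, 300, 100))        # only 100 GB written so far
print(snapshot_allowed(400, 300, 300, 0.0))   # fractional reserve set to 0%
```

Under this model the full LUN fails because 300 GB of overwrite reserve cannot fit into the 100 GB that remains free in the volume.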
For detailed information about the various capacity management methods that are relevant to Snapshot backup of volumes and LUNs, refer to the Snapshot section.
To restore from a Snapshot backup of a LUN, you can use either of two methods:
- Volume SnapRestore: This method restores the entire volume, including the LUN.
– The LUN must be offline or unmapped.
- LUN clone: You create a clone of the LUN and then mount the clone. You can then copy individual files back to the original LUN.
NOTE: If the Snapshot backup contains NAS data (rather than a LUN), then you can browse to the .snapshot directory and copy the individual files back to the active file system or, for a very large file, you can use single file SnapRestore.
Perhaps the best way to manage (and create Snapshot copies of) SAN storage is to use the NetApp host and application integration tools. For each supported host operating system platform, there is a SnapDrive package, and, for many popular business applications, there is a SnapManager package. These tools provide an easy-to-use, highly functional interface for managing the storage controller.
- SnapDrive
– Create a LUN
– Connect to a LUN
– Trigger a new consistent Snapshot copy of a file system
– Restore a Snapshot backup
– Clone a LUN
– Manage iSCSI connections
- SnapManager
– Trigger a new consistent Snapshot copy of an application
– Manage the retention of Snapshot backups
– Restore a Snapshot backup of an application
– Clone an application or database (only for specific applications)
– Migrate application data onto LUNs
NOTE: You can perform Snapshot application integration without SnapManager support. To do so, you must create the required automation scripts. Many examples of the scripts can be downloaded from the NOW online support and services site.
The performance of the SAN is dependent on the available physical resources and the host, switch, and controller configurations.
- Controller
– Model (versus expected performance level)
– Number of spindles supporting the LUNs
– Network speed
– Number of ports or paths
– Balanced workload between high-availability controllers
– Background tasks (replication, RAID scrub, RAID rebuild, deduplication, and so on)
- Switch
– Network speed
– Port oversubscription
– Switch workload
- Host
– Workload type (random versus sequential and read versus write)
– Network speed
– Multipathing configuration (high-availability or failover only)
– Number of paths
– HBA tuning
– Correct partition alignment
Because all iSCSI traffic is carried over the TCP protocol, you can list sessions (an iSCSI initiator-to-target relationship) and connections (TCP) separately. The connections-to-sessions relationship can be configured in either of two modes.
1. Single connection per session (1:1)
– Creates one TCP connection per iSCSI session
– Is supported by the Microsoft Multipath I/O (MPIO) and NetApp DSM for Windows
2. Multiple connections per session (MCS)
– Creates multiple TCP connections per iSCSI session
– Provides a native iSCSI multipathing capability
Figure 11 – iSCSI Sessions and Connections
The commands to list the iSCSI connections and sessions are:
iscsi session show
iscsi connection show -v
NOTE: The advantages and disadvantages of the multipathing methods are beyond the scope of this document.
The security of the SAN is enforced at several levels—some on the controller, some in the fabric, and some in conjunction with the hosts.
- FC
– LUN masking (igroups)
– Port masking (Portsets)
– SAN zoning
- iSCSI
– LUN masking (igroups)
– Port masking (iscsi accesslist)
– Ethernet VLANs
– Passwords (iscsi security)
– Encryption (ipsec)
In an FC SAN, the ability of the various devices to discover each other and communicate across the fabric is controlled by the zone definitions. Only devices that reside in the same zone can communicate. The two main types of SAN zones are:
1. Soft zones
– Are usually defined using the WWPNs of the host's and controller's FC ports; allow communication between the devices' FC ports
– Block inter-zone traffic by filtering device discovery at the fabric name service but do not explicitly block traffic
2. Hard zones
– Are usually defined using the physical port numbers of the FC switch; allow communication between the physical switch ports
– Block inter-zone traffic by filtering device discovery at the fabric name service and by preventing traffic between switch ports
When designing FC SAN zones, apply the following recommendations:
- Decide on a naming convention and use only the naming convention that you decided on
- Keep disk and tape devices in separate zones
- Define a zone for every required initiator-target pairing
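The zoning rules above reduce to set membership: two ports can communicate only if at least one zone contains both. The sketch below generates one zone per initiator-target pairing and tests reachability; the zone naming scheme and the port aliases are hypothetical, and real zones are defined on the switch by WWPN or physical port.

```python
from itertools import product


def single_pair_zones(initiators, targets):
    """Define one zone per initiator-target pairing."""
    return {f"z_{i}_{t}": {i, t} for i, t in product(initiators, targets)}


def can_communicate(a, b, zones):
    """Two ports can communicate only if some zone contains both."""
    return any(a in members and b in members for members in zones.values())


# Hypothetical port aliases for two hosts and one controller target port.
zones = single_pair_zones(["host1_hba0", "host2_hba0"], ["ctrl1_0c"])
print(sorted(zones))
print(can_communicate("host1_hba0", "ctrl1_0c", zones))
print(can_communicate("host1_hba0", "host2_hba0", zones))
```

With one zone per pairing, the two hosts can each reach the controller but cannot discover each other, which is the isolation that the recommendation aims for.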
A concern with an iSCSI SAN is that the block-level traffic and other, less-sensitive data might be travelling over the same physical network. In this situation, the iSCSI traffic is exposed to network snooping and other attacks.
The NetApp FAS controllers support port masking, bidirectional password access, and network encryption of the iSCSI traffic.
- iSCSI port masking allows you to deny specified network adapters the ability to expose an iSCSI initiator group to the hosts:
iscsi interface accesslist remove ig0 e0a
- iSCSI password requires an iSCSI host that is trying to access a LUN to respond to a password challenge. It can also require a controller to answer a password challenge from the host.
- IPSec encryption enables encryption of the Ethernet network traffic. Because it is not an iSCSI-specific setting, it encrypts all Ethernet traffic (but the client, of course, needs to know how to decrypt the traffic).
options ip.ipsec.enable on and ipsec policy add
NOTE: Network encryption, although effective, can negatively affect the performance of what should be a high performance iSCSI solution. Therefore, many sites choose not to encrypt iSCSI traffic and instead deploy a separate network infrastructure or a virtual local area network (VLAN) to isolate the iSCSI traffic.
Troubleshooting a SAN problem requires knowledge of the host, the SAN switches and network infrastructure, and the storage controller. Much of the information associated with this knowledge is outside the scope of this document. However, here are some actions that you can perform to troubleshoot a simple LUN access problem:
1. On the controller
– Can the storage controller see the host's FC adapters?
fcp show initiator
– Is the LUN being presented to the host?
lun map -v
– Is there a problem with iSCSI password access?
iscsi security show
2. On the SAN switches, are the host's FC initiator ports and the controller's FC target ports in the same zone?
– Run the zoneshow command (Brocade).
– Run the show zone command (Cisco MDS).
3. On the host (Solaris 9)
– Is the host configured to detect the LUN?
– Is the LUN ID in the /kernel/drv/sd.conf file?
– Has the host rescanned to detect the LUN?
– Run the devfsadm command.
NOTE: For additional troubleshooting recommendations, refer to the product documentation.
The Common Internet File System (CIFS) is the default NAS protocol included with Microsoft Windows. The NetApp storage controller can participate in an Active Directory domain and present files via the CIFS protocol.
The CIFS protocol is a licensed feature that must be enabled before it can be configured and used to present files for client access.
NOTE: If the controller was ordered with the CIFS license, then the license is already installed, and the CIFS setup wizard starts automatically when the system is booted.
The easiest way to configure the CIFS protocol is to run the CIFS setup wizard. The wizard prompts you for all aspects of the CIFS configuration, such as the NetBIOS host name, the authentication mode (for example, Active Directory), and the local administrator password. To start the wizard, you run the cifs setup command.
The choices for CIFS user authentication are:
- Active Directory
- NT Domain
- Windows Workgroup
- Non-Windows workgroup
NOTE: If you wish to configure Active Directory mode for user authentication, then the system time on the storage controller must be within five minutes of the system time on the Active Directory server. This requirement is inherited from the Kerberos protocol, which Active Directory uses for improved security. It is recommended that both systems be configured to use the same network time server, if one is available.
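The five-minute requirement in the note above is a straightforward comparison of two timestamps. The following Python sketch shows the check; how the two times are obtained is left out, and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(minutes=5)  # limit stated in the text


def within_kerberos_skew(controller_time: datetime, domain_time: datetime) -> bool:
    """True if the two clocks differ by no more than the allowed skew."""
    return abs(controller_time - domain_time) <= MAX_SKEW


# Hypothetical clock readings taken at the same moment.
domain = datetime(2024, 1, 1, 12, 0, 0)
print(within_kerberos_skew(datetime(2024, 1, 1, 12, 4, 30), domain))
print(within_kerberos_skew(datetime(2024, 1, 1, 11, 53, 0), domain))
```

Taking the absolute difference matters because the controller clock can be either ahead of or behind the domain clock.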
In addition to using an external authentication system such as Active Directory, you can define local users and groups. It is recommended that a local administrator be defined, to be used if the Active Directory server is unavailable.
You can add domain users (such as a Domain Admin) to local groups. This ability can be useful when you are managing the storage controller as part of a larger enterprise. For example, to add the domain user "Steven" to the local Administrators group, run the following command:
useradmin domainuser add steven -g Administrators
For information about how to manage local users and groups, refer to Figure 6 and to the text related to Figure 6.
To create and manage CIFS shares, you must use the cifs command and the shares parameter. Here are some examples of the use of the cifs command and the shares parameter:
List the existing shares
Create a share
Using the CIFS protocol, you can create shares to expose the following object types for user access:
It is sometimes desirable to set a quota on the amount of disk space that an individual user or group can consume via the CIFS (or NFS) protocol. There are several types of quotas and various options to enforce each type of quota.
1. Quota targets
– User, controls how much data the specified user can store
– Group, controls how much data the specified group of users can store
– Qtree, controls how much data can be stored in the specified qtree (similar to a directory)
NOTE: The Administrator and root users are exempt from all quota types except the qtree quota type.
– Default, applies only if the specified quota is not defined for the current user or group
2. Quota objects
– Capacity, limits the amount of disk space that can be used (sets a maximum)
– Files, limits the number of files that can be stored (sets a maximum)
3. Quota thresholds
– Soft: When the specified threshold is reached, the controller sends only a Simple Network Management Protocol (SNMP) trap. The user's write succeeds.
– Hard: When the specified threshold is reached, the controller prevents the user's attempt to write more data. The client displays a "filesystem full" error.
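The difference between soft and hard thresholds can be sketched in a few lines of Python. This is an illustration only, not controller logic: the function name, the kilobyte units, and the return strings are hypothetical, and the sketch assumes that a threshold triggers when a write would push usage past it.

```python
def check_write(used_kb, write_kb, soft_kb=None, hard_kb=None):
    """Classify a write against hypothetical soft and hard capacity limits."""
    new_total = used_kb + write_kb
    if hard_kb is not None and new_total > hard_kb:
        return "denied"            # hard threshold: the write is rejected
    if soft_kb is not None and new_total > soft_kb:
        return "allowed+trap"      # soft threshold: write succeeds, a trap is sent
    return "allowed"

print(check_write(900, 50, soft_kb=800, hard_kb=1000))   # allowed+trap
print(check_write(900, 200, soft_kb=800, hard_kb=1000))  # denied
```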
Quotas are configured and managed via either the quota command or the NetApp System Manager (1.1 and later) interface. Both methods store the quota configuration in the /etc/quotas file on the controller's root volume.
List the active quotas
When you modify only the limits of an existing quota, a file system scan is not needed. Any other change to the quota configuration requires the scan.
Report on quota usage
The report details the current file and space consumption for each user and group that is assigned a quota and for each qtree.
Here is an example of an /etc/quotas configuration file:
Figure 6: An Example Quota Configuration File
NOTE: The second line, which begins with an asterisk (*), is a default quota.
The FAS storage controller has numerous performance counters, statistics, and other metrics. Also, various CIFS-specific performance-analysis commands are available; for example:
- cifs stat, displays CIFS performance statistics
- cifs top, displays the most active CIFS clients, as determined by various criteria
NOTE: By default, CIFS statistics are cumulative for all clients. To collect statistics for individual clients, you must enable the cifs.per_client_stats.enable option.
For detailed information about the controller-performance commands that are not specific to CIFS (for example, sysstat, stats, and statit), refer to the Data ONTAP section.
Changing some CIFS performance parameters is disruptive. Such changes can be performed only when no clients are connected. Here is an example process:
1. cifs terminate
This command halts the CIFS service and disconnects all clients.
Because the CIFS protocol always uses Unicode, if a volume is not configured for Unicode support, then the controller must continuously convert between Unicode mode and non-Unicode mode. Therefore, configuring a volume for Unicode support may increase performance.
To manage user access to CIFS shares, you use the cifs command and the access parameter. Here is an example of how user access rights are evaluated:
You issue the command that identifies shares and access rights.
The share-level access rights are:
– No Access
– Full Control
The system evaluates the user's security ID (SID), which is usually provided at login time by the Active Directory server, against the user's share permissions.
Access to the share is either granted or denied.
The system evaluates the user's SID against the file or directory permissions. On a file-by-file basis, access is either granted or denied.
If mandatory file locking is requested by the client, the CIFS protocol enforces it.
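The two-stage evaluation described above (share permissions first, then file permissions) can be sketched as follows. The SIDs, the ACL dictionaries, and the function name are hypothetical, and the sketch models only the two share-level rights named in this section.

```python
def can_open(sid, share_acl, file_acl):
    """Return True only if both the share-level and file-level checks pass."""
    # Stage 1: evaluate the SID against the share permissions.
    if share_acl.get(sid, "No Access") == "No Access":
        return False
    # Stage 2: evaluate the SID against the file or directory permissions.
    return sid in file_acl

# Hypothetical SIDs and ACLs.
share_acl = {"S-1-5-21-1001": "Full Control", "S-1-5-21-1002": "No Access"}
file_acl = {"S-1-5-21-1001"}

print(can_open("S-1-5-21-1001", share_acl, file_acl))  # True
print(can_open("S-1-5-21-1002", share_acl, file_acl))  # False
```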
CIFS is a complex protocol that uses several TCP ports and interacts with various external systems for user authentication, Kerberos security (Active Directory), and host name resolution. Therefore, the CIFS protocol has numerous potential points of failure; but, remarkably, it is usually very reliable.
A full description of CIFS troubleshooting is outside the scope of this document. For detailed troubleshooting information, refer to the ANCDA Boot Camp training materials.
If you suspect that connectivity problems are the source of your CIFS problems, you can try the following commands:
- ping, provides a standard ICMP connectivity test
- testdc, tests communication with the Active Directory server
- ifstat, displays a low-level view of network-interface performance data
- netdiag, analyzes the network protocol statistics to ensure correct operation and displays, as required, suggested remedial actions
Another potential issue is file access permissions. In this case, the access protocol is CIFS, but the containing volume or qtree may have a UNIX security style. This configuration, which is called "multiprotocol access," is discussed in the next section.
The NetApp storage controller can present files via the CIFS protocol and the NFS protocol simultaneously. Because the controller includes sophisticated mappings between Windows and UNIX user names and file system permissions, the operation is seamless.
The only requirement for multiprotocol access is that CIFS access and NFS access be configured to the same file system. Of course, some unavoidable complexities arise, because Windows systems and UNIX systems use different security semantics.
Some security settings and user-name mappings do need to be configured. The mappings that do need to be configured are discussed in the Security section.
The CIFS and NFS protocols are managed separately, even when they refer to the same file system. Therefore, the protocols should not be the source of any multiprotocol concern. For detailed information about the administration of the CIFS and NFS protocols, refer to the CIFS and NFS administration sections.
Multiprotocol access should not be the source of any performance concern. For detailed performance information, refer to the CIFS and NFS performance sections.
The default security style for all new volumes is controlled by a WAFL option:
Each file and directory can have either UNIX or NTFS permissions (but not both at the same time).
NOTE: Because the use of mixed mode can complicate the troubleshooting of file-access problems, it is recommended that mixed mode be used only when it is needed to accommodate a particular requirement.
To manually set or view the security style for a volume or qtree, you use the following commands:
- qtree status, displays the list of volumes and qtrees and their security styles
Although the security-style setting controls the underlying file system's security type, access via CIFS or NFS to the file data is controlled by the normal user-authorization process. The multiprotocol environment introduces additional complexity because the relationships between the security semantics of the users, the shares and exports, and the file system must be mapped. Consider the following example:
- Evaluate the user's SID or user ID against the share or export permissions. Access to the share or export is either granted or denied.
- Evaluate the user's SID or user ID against the file or directory permissions.
If the ID and the permissions are of different types (for example, Windows and UNIX), then it may be necessary to map the user name to the correct type.
On a file-by-file basis, access is either granted or denied.
The following diagram identifies where user-name mapping may need to occur.
Figure 7: Multiprotocol Access And User-Name Mapping
The user-name mapping is defined in the /etc/usermap.cfg file. The /etc/usermap.cfg file is a simple text file in the root volume of the storage controller. The process of mapping user names between Windows and UNIX contexts is reasonably straightforward:
- Automatic, if the Windows and UNIX user names match
- Specified (win_user = unix_user), if the user names are defined in the /etc/usermap.cfg file
- Default, if the user names differ and there is no specific mapping. In this case, attempt to use the defined default UNIX or Windows user name (if any).
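The three mapping cases above can be sketched as a small lookup function. This is an illustration, not the controller's implementation: the function name, the sample data, and the default user name are hypothetical, and the sketch assumes that explicit entries are consulted first and that automatic matching compares lowercased names.

```python
def map_windows_to_unix(win_user, usermap, unix_users, default_unix_user=None):
    """Resolve a Windows user name to a UNIX user name."""
    # Specified: an explicit win_user = unix_user entry exists.
    if win_user in usermap:
        return usermap[win_user]
    # Automatic: the names match (assumed here to be case-insensitive).
    if win_user.lower() in unix_users:
        return win_user.lower()
    # Default: fall back to the configured default UNIX user, if any.
    return default_unix_user

usermap = {"Administrator": "root"}     # hypothetical explicit mapping
unix_users = {"root", "steven"}         # hypothetical UNIX accounts

print(map_windows_to_unix("Administrator", usermap, unix_users, "guest"))  # root
print(map_windows_to_unix("Steven", usermap, unix_users, "guest"))         # steven
print(map_windows_to_unix("Visitor", usermap, unix_users, "guest"))        # guest
```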
This command displays the Windows to UNIX user-name mapping, for example:
Domain\Administrator => root
If you are connecting as the Administrator or root user to a volume with a foreign security style, the easiest way to overcome an access problem may be to set the following options:
options wafl.nt_admin_priv_map_to_root on
options cifs.nfs_root_ignore_acl on
The wafl and cifs options grant superuser privileges on the foreign volume to Administrator and root users, respectively.
In some cases, due to variances in user populations and variances in CIFS and NFS file locking abilities, you may need to debug a file-access problem (for example, an NFS client cannot open a file because it is locked by a CIFS client). You can use the cifs sessions command to list the clients that have active CIFS connections.
Another possible issue with mixed NFS and CIFS access is that NFS clients support symbolic links in the file system and CIFS clients generally do not support symbolic links. However, if you set the cifs.symlinks.enable option to on (the default value), then a CIFS client can successfully resolve any symbolic link that was created by an NFS client.
NOTE: For detailed information about the interaction of CIFS clients with the various types of symbolic links, refer to the product documentation.
To resolve (better yet, to avoid) the simplest of all access problems, create both a CIFS share and an NFS export.
The Network File System (NFS) is the default NAS protocol that is included with all UNIX platforms. The NetApp storage controller can present files via the NFS protocol and can also participate in a Kerberos domain.
The NFS protocol is a licensed feature that must be enabled before it can be configured and used to present files for NFS client access.
The NFS server configuration is described in the /etc/exports file. The file lists all NFS exports, specifies who can access the exports, and specifies privilege levels (for example, read-write or read-only).
FIGURE 8: AN EXAMPLE /ETC/EXPORTS FILE
NOTE: Any volume (or qtree and so on) that is to be exported is listed in the configuration file. The /etc/exports file contains three types of information:
– Exports are resources that are available to NFS clients.
– Example: /vol/flexvol1 identifies a resource that is to be exported.
– NFS clients can be identified by their host names, DNS subdomains, IP addresses, IP subnets, and so on:
– Example: 10.254.134.38
– Exports specify access permissions for NFS clients.
– Example: rw and nosuid specify access permissions.
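To make the three types of information concrete, the following Python sketch splits one export entry into its resource, its options, and the clients named in those options. The sample line is hypothetical: it is assembled from the examples in the list above and is not copied from a real /etc/exports file. The sketch assumes a leading hyphen before the option list, commas between options, and colons between multiple clients.

```python
def parse_export_line(line):
    """Split an export entry into (path, options)."""
    parts = line.split(None, 1)            # path, then the option string
    path = parts[0]
    opts = parts[1] if len(parts) > 1 else ""
    options = {}
    for item in opts.strip().lstrip("-").split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Options with a value list clients; bare options are simple flags.
        options[key] = value.split(":") if sep else True
    return path, options

path, options = parse_export_line("/vol/flexvol1 -rw=10.254.134.38,nosuid")
print(path)     # /vol/flexvol1
print(options)  # {'rw': ['10.254.134.38'], 'nosuid': True}
```

The colon-splitting step means that this sketch does not handle IPv6 addresses.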
To perform NFS server administration tasks, you use the exportfs command and the /etc/exports file. Here are some examples of their use:
List the current exports (in memory)
List the persistent exports (available after a reboot)
Create an export (in memory)
exportfs -i -o rw=host1 /vol/vol1
Create a persistent export (available after a reboot)
wrfile -a /etc/exports
With the NFS protocol, you can create exports to expose the following object types for user access:
NOTE: Unlike most UNIX variants, the FAS controller can successfully export nested directories (and, thus, can export ancestors and descendants). For example, both /vol/vol1 and /vol/vol1/qtree1 can be exported, and the NFS client must satisfy only the access controls that are associated with the mount point that was initially accessed.
For a description of file system quotas, refer to the CIFS configuration section.
The FAS storage controller has numerous performance counters, statistics, and other metrics. Various NFS-specific performance-analysis commands are available; for example:
nfsstat, displays NFS performance statistics
nfs_hist, an advanced mode command that displays NFS delay time distributions (that is, the number of I/O operations per millisecond grouping)
netapp-top.pl, a Perl script that lists the most active NFS clients. The script is run on a client system. The script can be downloaded from the NOW online support and services site at the following location: http://now.netapp.com/NOW/download/tools/ntaptop/.
NOTE: By default, the NFS statistics that are reported are cumulative for all clients. To collect statistics for individual clients, you must enable the nfs.per_client_stats.enable option.
For detailed information about the controller-performance commands that are not specific to NFS (for example, sysstat, stats, and statit), refer to the Data ONTAP section.
Performance testing is a complex topic that is beyond the scope of this document. Nevertheless, you can perform some very basic performance analysis by using the following procedure (from the NFS client):
To write traffic, run the time mkfile command
To read traffic, run the time dd command
To read/write traffic, run the time cp command
NOTE: If you are concerned about repeatable performance testing, then you should investigate utilities such as iometer, iozone, and bonnie++ and the NetApp sio tool.
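As a rough alternative to the time mkfile and time dd commands, the same basic measurement can be sketched in Python and run on the NFS client. The function name, the test file name, and the default size are hypothetical, and the directory argument is assumed to be a path on the mounted export. The read pass may be served from the client's cache, so treat the numbers as indicative only.

```python
import os
import tempfile
import time

def time_write_read(directory, size_mb=8):
    """Write and then read back a test file; return (write_seconds, read_seconds)."""
    path = os.path.join(directory, "perf_test.tmp")   # hypothetical file name
    chunk = b"\0" * (1024 * 1024)

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())        # push the data out of the client buffers
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(1024 * 1024):
            pass
    read_s = time.perf_counter() - start

    os.remove(path)
    return write_s, read_s

# A temporary directory stands in for the NFS mount point in this example.
print(time_write_read(tempfile.mkdtemp(), size_mb=1))
```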
Traditionally, security has been seen as a weak spot for the NFS protocol, but recent versions of NFS support very strong security. The traditional security model is called "AUTH_SYS," and the newer model is called "Kerberos." Here is a summary of the differences between the two security models:
– User authentication is performed on the remote NFS client (which is typically a UNIX server). This scenario implies that the authentication process on the NFS client is trusted and that the NFS client is not an impostor.
– No additional authentication, data-integrity evaluation, or data encryption is performed.
– User authentication is performed on the remote NFS client, and Kerberos authenticates that the NFS client is genuine.
– There are three levels of Kerberos security.
1) krb5: Authentication occurs with each NFS request and response.
2) krb5i: Authentication occurs with each NFS request and response, and integrity checking is performed, to verify that requests and responses have not been tampered with.
3) krb5p: Authentication occurs with each NFS request, integrity checking is performed, and data encryption is performed on each request and response.
NOTE: If you wish to configure Kerberos mode for user authentication, then the system time on the storage controller must be within five minutes of the system time on the Kerberos server. This requirement is inherited from the Kerberos protocol. It is recommended that both systems be configured to use the same network time server, if one is available.
Before a user can access an export, the NFS client (remote UNIX server) must be able to mount the export. Then, the user's access to the shared files in the export is evaluated against the user's UNIX user ID and group ID (UID and GID), which are usually provided at login time by the NFS client. This is a two-step process; for example:
Evaluate the server's host name against the export permissions. Access is either granted or denied (to mount the export, r/w or r/o).
Evaluate the user's UID/GID against the file or directory permissions. Access is either granted or denied (on a file-by-file basis).
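The two-step process can be sketched in Python as a host check followed by a standard UNIX mode-bit check. The function names, host names, and IDs are hypothetical. The sketch checks read permission only and does not model root squashing or other export options.

```python
import stat

def may_mount(host, rw_hosts, ro_hosts):
    """Step 1: evaluate the client host against the export permissions."""
    if host in rw_hosts:
        return "rw"
    if host in ro_hosts:
        return "ro"
    return None                     # mount is denied

def may_read(uid, gids, owner_uid, owner_gid, mode):
    """Step 2: evaluate the UID/GID against the file's permission bits."""
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)

print(may_mount("host1", {"host1"}, {"host2"}))   # rw
print(may_read(1001, {100}, 1000, 100, 0o640))    # True (group read bit)
print(may_read(1002, {200}, 1000, 100, 0o640))    # False (no "other" read bit)
```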
The NFSv2 and NFSv3 protocols use advisory file locking, and the NFSv4 protocol enforces mandatory file locking, if it is requested by the client.
NFS is a complex protocol that uses several TCP ports and interacts with various external systems for user authentication, Kerberos security, and host name resolution. As such, NFS has numerous potential points of failure, but, remarkably, it is usually very reliable.
A full description of NFS troubleshooting is outside the scope of this document. For detailed troubleshooting information, refer to the ANCDA Boot Camp training materials.
If you suspect that RPC problems are the source of your NFS problems, you can try the following actions:
Verify that RPC is enabled
Verify that NFS daemons are running
Verify that mount points exist
If you are having problems mounting an NFS export, you can try the following commands:
This command, which is run from the NFS client, lists the exports that are available on the NFS server.
This command, which is run from the controller, displays low-level statistics that are useful in debugging a mount problem.
If you are experiencing "stale NFS handle" errors, you can try the following actions:
Check the /etc/fstab file on the host for errors.
Check connectivity between the two systems by using the ping command.
List the exports that are available on the NFS server by running the showmount -e command on the NFS client.
Check the controller's /etc/exports file.
Check the controller's current exports in memory by running the exportfs command.
EXAM NS0-154: DATA PROTECTION CONCEPTS
As a NetApp Certified Data Management Administrator, you can implement an active-active controller configuration and use SyncMirror software to ensure continuous data availability and rapid recovery of data and use the SnapMirror®, SnapRestore®, and SnapVault® products to manage and protect data.
Set up and maintain Snapshot™ copies
Configure and administer SnapRestore technology
Configure and administer asynchronous SnapMirror product
Configure and administer synchronous SnapMirror product
Configure and administer Open Systems SnapVault application
Configure and administer Operations Manager application
Configure and administer SnapLock® technology (not applicable in Data ONTAP 8.0 7-Mode or the corresponding NS0-154 exam)
Analyze and resolve data protection problems
Implement high-availability (active-active) controller configuration (including SyncMirror)
Instructor-led: Data ONTAP 8.0 7-Mode Administration
Instructor-led: NetApp Protection Software Administration
Instructor-led: Accelerated NCDA Boot Camp Data ONTAP 8.0 7-Mode
Web-based: High Availability on Data ONTAP 8.0 7-Mode
Web-based: Planning and Implementing MetroCluster on Data ONTAP 8.0 7-Mode
Web-based: Implementing SyncMirror on Data ONTAP 8.0 7-Mode
This section describes various NetApp FAS learning points that are relevant to the NS0-163 and NS0-154 exams. These learning points focus on data protection concepts. However, the section is not limited to the exam topics. Rather, it also summarizes information about a range of NetApp technologies.
Figure 9 highlights the main subjects covered in the exam (white text) and the range of topics covered within each subject (black text).
FIGURE 9: TOPICS COVERED IN THE NS0-163 AND NS0-154 EXAMS
A Snapshot copy is a read-only image of a volume or an aggregate. The copy captures the state of the file system at a point in time. Many Snapshot copies may be kept online or vaulted to another system, to be used for rapid data recovery, as required.
The Snapshot capability of the FAS storage controller is a native capability that is provided by the WAFL file system layer. Both SAN and NAS data can be captured in a Snapshot copy.
NetApp Snapshot technology is particularly efficient, providing for instant creation of Snapshot copies with near-zero capacity overhead.
This efficiency is possible because, like most UNIX file systems, the WAFL file system uses inodes to reference the data blocks on the disk. And, a Snapshot copy is a root inode that references the data blocks on the disk. The data blocks that are referenced by a Snapshot copy are locked against overwriting, so any update to the active file system (AFS) is written to other locations on the disk.
Refer to Figure 10 for an example of how the Snapshot process occurs.
FIGURE 10: THE PROCESS OF A SNAPSHOT COPY LOCKING SOME BLOCKS IN THE AFS
NOTE: Each volume can retain up to 255 Snapshot copies.
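The block-sharing behavior described above can be illustrated with a toy model in Python. It is a deliberate simplification and not the WAFL implementation: the class and method names are hypothetical, each file occupies a single block, and freeing of unreferenced blocks is not modeled. The point is that a Snapshot copy duplicates only the map of block references, while later writes go to new blocks.

```python
class ToyVolume:
    def __init__(self):
        self.blocks = {}       # block number -> data
        self.active = {}       # file name -> block number (active file system)
        self.snapshots = {}    # snapshot name -> frozen file-to-block map
        self._next = 0

    def write(self, name, data):
        # Updates always land in a new block; existing blocks are never overwritten.
        self.blocks[self._next] = data
        self.active[name] = self._next
        self._next += 1

    def snapshot(self, snap_name):
        # Copies only the references (the "root inode"), not the data blocks.
        self.snapshots[snap_name] = dict(self.active)

    def read(self, name, snap_name=None):
        table = self.snapshots[snap_name] if snap_name else self.active
        return self.blocks[table[name]]

vol = ToyVolume()
vol.write("a.txt", "version 1")
vol.snapshot("hourly.0")
vol.write("a.txt", "version 2")
print(vol.read("a.txt"))              # version 2
print(vol.read("a.txt", "hourly.0"))  # version 1
```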
When you create a volume, a default Snapshot schedule is created. Initially, Snapshot copies are created according to the schedule's default settings. However, you can modify or disable the default settings to satisfy your local backup requirements.
List the default Snapshot schedule.
– The command: snap sched
– The output:
: 0 2 6@8,12,16,20
The three fields are the weekly, nightly, and hourly retention counts. The default schedule creates hourly Snapshot copies at 8:00, 12:00, 16:00, and 20:00 and retains the six most recent. It creates a nightly Snapshot copy (at 24:00 Monday through Saturday, and on Sunday if a weekly Snapshot copy is not taken) and retains two at a time. It retains zero weekly Snapshot copies (if created, these would occur at 24:00 on Sunday).
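The schedule string shown in the output above can be decoded with a short Python sketch. The function name and the dictionary keys are hypothetical; the field order (weekly, nightly, hourly@hours) follows the snap sched syntax shown in this section.

```python
def parse_snap_sched(spec):
    """Decode a schedule such as '0 2 6@8,12,16,20' into its parts."""
    weekly, nightly, hourly = spec.split()
    count, _, hours = hourly.partition("@")
    return {
        "weekly": int(weekly),      # weekly copies to retain
        "nightly": int(nightly),    # nightly copies to retain
        "hourly": int(count),       # hourly copies to retain
        "hours": [int(h) for h in hours.split(",")] if hours else [],
    }

print(parse_snap_sched("0 2 6@8,12,16,20"))
# {'weekly': 0, 'nightly': 2, 'hourly': 6, 'hours': [8, 12, 16, 20]}
```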
Modify the Snapshot schedule by running a command similar to the following:
snap sched weekly nightly hourly@
snap sched 2 7 6@6,9,12,15,18,21
Disable the Snapshot schedule by running one of the following commands.
snap sched 0 0 0
vol options nosnap on
NOTE: On volumes that contain LUNs, you normally disable the controller initiated Snapshot copies because the consistency of the file system in the LUN can be guaranteed only by the host that accesses the LUN. You should then use a tool such as SnapDrive to initiate the Snapshot copies from the host.
A percentage of every new volume and every new aggregate is reserved for storing Snapshot data. This reserved space is known as the "Snapshot reserve." The default reserve is 5% for aggregates and 20% for volumes. You can modify the default values by running the snap reserve command.
Refer to Figure 11 to identify where the Snapshot reserve values apply.
FIGURE 11: AGGREGATE AND VOLUME SNAPSHOT RESERVES
NOTE: The value for the volume Snapshot reserve is the minimum amount of space that is reserved for Snapshot data. The Snapshot copies can consume more space than the initial reserve value specifies.
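The effect of the reserve on usable space is simple arithmetic, sketched below in Python. The function name and the 100 GB volume size are hypothetical; the 20% default is the volume value given above.

```python
def usable_space(volume_gb, snap_reserve_pct=20):
    """Return (space for the active file system, Snapshot reserve) in GB."""
    reserve = volume_gb * snap_reserve_pct / 100
    return volume_gb - reserve, reserve

print(usable_space(100))      # (80.0, 20.0)
print(usable_space(100, 5))   # (95.0, 5.0)
```

As the note above points out, Snapshot copies can grow beyond the reserve, in which case they consume space that would otherwise be available to the active file system.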
Almost all Snapshot management is performed by running the snap command; for example:
snap list, shows the currently retained Snapshot copies
snap create, creates a volume Snapshot copy (to create an aggregate-level Snapshot copy, specify the -A option)
snap delete, deletes the specified Snapshot copy and makes its disk space available
NOTE: Some special types of Snapshot copies (for example, Snapshot copies created with SnapMirror and SnapVault software) are created and managed by the storage controller and should not be interfered with.
Snapshot copies that are created with SnapDrive and SnapManager software should not be managed from the storage controller. These Snapshot copies are created, retained, and deleted under the control of the host and application integration agents. Because the copies contain consistent backup images that are being retained by schedules and policies on the agents, they should not be deleted manually.
Typically, because Snapshot technology is very efficient, the creation, retention, and deletion of Snapshot copies make no significant impact on performance.
For information about performance, refer to the SnapRestore performance section.
By definition, a Snapshot copy is a read-only view of the state of the file system at the time that the copy was created. Therefore, the contents of the copy cannot be modified by end users.
User access to the data in a Snapshot copy is controlled by the file system security settings (for example, NTFS ACLs) that were in place when the Snapshot copy was created.
NOTE: If the security style of the volume changes after the Snapshot copy is created, then the users may not be able to access the file system view in the Snapshot directory (unless their user-name mapping is configured to allow them to access the foreign security style). This problem arises because the previous security settings are preserved in the Snapshot view.
The storage administrator can configure the visibility of the Snapshot directory (for NAS clients). The following commands either enable or disable client access to the Snapshot directory:
The default volume settings do allow the Snapshot directory to be seen by the NAS protocols. Use the following command to disable the Snapshot directory per volume:
vol options nosnapdir on
For CIFS access
The default CIFS settings do not allow the Snapshot directory to be seen by CIFS clients. Use the following command to enable the Snapshot directory.
options cifs.show_snapshot on
For NFS access
The default NFS settings do allow the Snapshot directory to be seen by NFS clients. Use the following command to hide the Snapshot directory from NFS clients.
options nfs.hide_snapshot on
Usually, there are no problems with creating Snapshot copies per se, but complications can arise. These complications are usually a result of incorrect scheduling or lack of disk space.
Inconsistent file system
If you create a Snapshot copy of a volume that contains a LUN, the file system in the LUN may or may not be in a consistent state. If the file system is corrupt, you may not be able to use the LUN Snapshot copy to recover data. The consistency of the file system in the LUN can be guaranteed only by the host that accesses the LUN. You should use a tool such as SnapDrive to initiate the LUN Snapshot copies from the host (so the tool can flush the local file system buffers to disk).
Hosts, LUNs, and space within controller volumes
The host that accesses a LUN assumes that it has exclusive control over the contents of the LUN and the available free space. However, as Snapshot copies are created, more and more of the space in the containing volume is consumed. If the Snapshot copies are not managed correctly, they eventually consume all of the space in the volume. If all of the space in a volume is consumed and the host attempts to write to a LUN within the volume, an "out of space" error occurs (because the host assumed that space was available). The controller then takes the LUN offline in an attempt to prevent data corruption. Refer to Figure 8 and to the text that accompanies Figure 8 for descriptions of Fractional Reserve and of the Volume AutoGrow and Snapshot Autodelete options and for an explanation of how Fractional Reserve, Volume AutoGrow, and Snapshot Autodelete ensure adequate free space and guarantee LUN availability.
The SnapRestore feature enables you to use Snapshot copies to recover data quickly. Entire volumes, individual files, and LUNs can be restored in seconds, regardless of the size of the data.
SnapRestore is a licensed feature that must be enabled before it can be configured and used. You enable it by running the license add command with the license code.
NOTE: The SnapRestore feature is licensed system-wide. Therefore, the feature cannot be enabled or disabled at a per-volume level.
The only prerequisite for using the SnapRestore feature (other than licensing) is the existence of Snapshot copies. The SnapRestore feature restores data from Snapshot copies. Snapshot copies that you have not created or retained cannot be used to restore data.
The SnapRestore feature is an extension of the snap command. The command, when used with the restore parameter, can restore an entire volume or an individual file or LUN from a Snapshot copy.
snap restore -t vol -s
– This command reverts the entire volume back to exactly how it was when the Snapshot copy was created.
– Be aware that all subsequent Snapshot copies are deleted.
snap restore -t file -s
– This command reverts an individual file back to exactly how it was when the Snapshot copy was created.
– To recover to a file name or directory location other than the original file name or directory location, add the -r parameter.
NOTE: You can run the SnapRestore command (volume or file) only on a volume that is online.
The SnapRestore feature recovers only volume and file content. It does not recover the following settings:
Snapshot copies schedule
Volume option settings
RAID group size
Maximum number of files per volume
NOTE: The volume SnapRestore command reverts the entire active file system (AFS) back to the point at which the Snapshot copy was created. All Snapshot copies that were created between the time that the Snapshot backup copy was created and the time that the Snapshot backup copy was used to restore the AFS are deleted. When using the SnapRestore feature, be very careful! You cannot back out of your changes.
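The consequence described in the note above, that a volume revert discards every Snapshot copy newer than the one used for the restore, can be modeled with a short Python sketch. The function name and the Snapshot names are hypothetical, and the list is assumed to be ordered from oldest to newest.

```python
def revert_volume(snapshots, target):
    """Return the Snapshot copies that remain after reverting to target.

    snapshots: names ordered oldest to newest.
    """
    if target not in snapshots:
        raise ValueError(f"no such Snapshot copy: {target}")
    idx = snapshots.index(target)
    # Everything created after the target is deleted by the revert.
    return snapshots[: idx + 1]

copies = ["nightly.1", "nightly.0", "hourly.1", "hourly.0"]
print(revert_volume(copies, "nightly.0"))   # ['nightly.1', 'nightly.0']
```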
Using the SnapRestore feature to restore one file may impact the performance of subsequent Snapshot copy deletions. Before a Snapshot copy is deleted, the active maps across all Snapshot copies must be checked for active blocks that are related to the restored file. This performance impact may be visible to the hosts that access the controller, depending on the workload and scheduling.
After you perform a SnapRestore operation (at the volume or file level), the file system metadata, such as security settings and timestamps, are reverted to exactly what they were when the Snapshot copy that was used to perform the restore was created.
Security settings: The security settings of the file have been reverted to their earlier values. If you suspect that the revision may have created a problem, you should review the security settings.
File timestamps: After reversion, the file timestamps are invalid for incremental backups. If you are using a third-party backup tool, you should run a full backup.
Virus scanning: If a virus-infected file was captured in the Snapshot copy, it is restored in its infected state (whether or not it was cleaned after the Snapshot copy was created). You should schedule a virus scan on any recovered file or volume.
Because the volume remains online and writeable during the SnapRestore activity, there is always the possibility that users may access files on the volume as the restore process is in progress. This overlap can cause file corruption and can generate NFS errors such as "stale file handle." There are several methods of avoiding or correcting such issues:
Disconnect the users before you begin the SnapRestore operation
Have the users re-open files that might present a problem
The SnapRestore destination volume cannot be a SnapMirror destination volume. If you want to restore a SnapMirror destination volume, then you should use the FlexClone® feature, which must be licensed, to link the destination's Snapshot copy to a new writable volume.
The SnapMirror product family enables you to replicate data from one volume to another volume or, typically, from a local controller to a remote controller. Thus, SnapMirror products provide a consistent, recoverable, offsite disaster-recovery capability.
SnapMirror is a licensed feature that must be enabled before it can be configured and used. The SnapMirror feature actually has two licenses. The first is a for-charge license that provides the asynchronous replication capability, and the second is a no-charge license that provides the synchronous and semi-synchronous capabilities. The no-charge license is available only if the for-charge license is purchased.
First, license the SnapMirror Async function.
Then, license the SnapMirror Sync and SnapMirror Semi-Sync functions (if required).
The no-charge SnapMirror Sync license code is printed in the Data ONTAP Data Protection Online Backup and Recovery Guide.
NOTE: Some older NetApp controllers (FAS820 and prior) cannot support the SnapMirror Sync function.
NOTE: The SnapMirror feature must be licensed on both the source and the destination systems (for example, production and disaster recovery systems).
By default, the SnapMirror feature uses a TCP connection to send the replication data between the two controllers. The TCP connection is usually over an Ethernet or TCP WAN link and is usually the most cost-effective transport. However, customers with access to inter-site Fibre connections can install the model X1024 FC adapter and replicate across the optical media.
The second step (after licensing) in configuring a volume SnapMirror relationship is to create the destination volume. The source and destination volume may be located on the same controller (for data migration) or on different controllers (for disaster recovery).
To create a restricted mode destination volume, run the following commands on the destination system:
vol create (with parameters to suit)
vol restrict (with the name of the new volume)
To check the volume's status and size, run the vol status -b command. The volume must be online but in a restricted state before you initialize a volume SnapMirror relationship.
NOTE: For a qtree SnapMirror relationship, the destination volume remains in an online and writeable state (not restricted) and the destination qtrees are created automatically when the baseline transfer is performed.
You need to know what the requirements and states of the source and destination volumes are and to understand how the requirements and states of volume SnapMirror relationships and qtree SnapMirror relationships can differ.
FIGURE 12: SNAPMIRROR VOLUME AND QTREE CONFIGURATION
NOTE: If a volume SnapMirror relationship is stopped (broken or released), the destination volume changes to a writable state, and the fs_size_fixed parameter is enabled on the volume. These actions prevent the inadvertent resizing of the destination volume. Resizing can cause problems when (if) the relationship is resynchronized.
Before you can enable a SnapMirror relationship, you must configure the SnapMirror access control between the primary and secondary storage controllers. For a description of the required settings, refer to the Security section.
After the source and destination volumes are defined, you can configure the SnapMirror relationship. As you configure the relationship, you also perform the initial baseline transfer, copying all of the data from the source volume to the destination volume.
snapmirror initialize -S src:vol1 dst:vol2
When the baseline transfer is completed, the destination volume is an exact replica of the source volume (at that point in time).
Next, you must configure the ongoing replication relationship. This relationship controls the mode and/or the schedule for replication of the changed data from the source volume to the destination volume. The SnapMirror replication parameters are defined in the snapmirror.conf file, as shown in Figure 13:
# Source Destination Options Mode/Schedule
src:/vol/vol1/q1 dst:/vol/vol1/q1 - 15 * * *
src:vol2 dst:vol2 - 10 8,20 * *
src:/vol/vol3 dst:/vol/vol3 - sync
src:vol4 dst:vol4 - semi-sync
FIGURE 13: EXAMPLE SNAPMIRROR.CONF FILE
NOTE: The snapmirror.conf file is configured on the destination controller.
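To make the field layout of the file concrete, the following is a minimal Python sketch that parses entries like the ones in the example into records. It is an illustration only, not a NetApp tool: the names Relationship and parse_snapmirror_conf are hypothetical, and the sketch assumes that an asynchronous schedule has four whitespace-separated fields (minute, hour, day of month, day of week) and that the only other modes are the keywords sync and semi-sync.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    source: str
    destination: str
    options: str
    mode: str        # "async", "sync", or "semi-sync"
    schedule: tuple  # (minute, hour, day_of_month, day_of_week) for async, else ()


def parse_snapmirror_conf(text):
    """Parse snapmirror.conf-style lines into Relationship records (illustrative)."""
    relationships = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"too few fields: {raw!r}")
        source, destination, options, *rest = fields
        if rest in (["sync"], ["semi-sync"]):
            mode, schedule = rest[0], ()
        elif len(rest) == 4:
            mode, schedule = "async", tuple(rest)  # assumed four-field schedule
        else:
            raise ValueError(f"unrecognized mode or schedule: {raw!r}")
        relationships.append(
            Relationship(source, destination, options, mode, schedule)
        )
    return relationships


EXAMPLE = """\
# Source Destination Options Mode/Schedule
src:/vol/vol1/q1 dst:/vol/vol1/q1 - 15 * * *
src:vol2 dst:vol2 - 10 8,20 * *
src:/vol/vol3 dst:/vol/vol3 - sync
src:vol4 dst:vol4 - semi-sync
"""

for rel in parse_snapmirror_conf(EXAMPLE):
    print(rel.destination, rel.mode, rel.schedule)
```

Under these assumptions, the second entry is read as an asynchronous relationship that updates at 10 minutes past 8:00 and 20:00 every day.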
As shown in Figure 13, the SnapMirror relationship can operate in any of three modes, performing asynchronous, synchronous, or semi-synchronous replication.
Asynchronous mode:
– Snapshot copies are replicated from a source volume or qtree to a destination volume or qtree.
– The host receives acknowledgment after the write is committed to the source volume.
– Block-level, incremental updates to the destination volume are based on schedules.
– The following is a VSM (volume SnapMirror) example:
src:vol2 dst:vol2 - 10 8,20 * *
– The following is a QSM (qtree SnapMirror) example:
src:/vol/vol1/q1 dst:/vol/vol1/q1 - 15 * * *
Synchronous mode:
– Writes are replicated from the source volume to the destination volume at the same time that they are written to the source volume.
– The host receives acknowledgment only after the write is committed to both the source and destination volumes.
– The following is an example entry:
src:/vol/vol1/q1 dst:/vol/vol1/q1 - sync
Semi-synchronous mode:
– Writes are replicated from a source volume or qtree to a destination volume or qtree with minimal delay.
– The host receives acknowledgment after the write is committed to the source volume.
– Because the host does not wait for the destination, replicating with a minimal delay reduces the performance impact on the host system.
– The following is an example of the previously used syntax:
src:vol1 dst:vol1 outstanding=5s sync
– The following is an example of the current syntax:
src:vol1 dst:vol1 - semi-sync
NOTE: For descriptions of the various replication options (such as schedule definitions or throughput throttling), refer to the product documentation.
You can configure the SnapMirror feature to use two redundant data paths for replication traffic. The paths can be TCP connections, FC connections, or a mixture of the two. The paths are configured in the snapmirror.conf file, using one of the following keywords:
Multiplexing: Both paths are used at the same time for load balancing.
Failover: The first path that is specified is active. The second path is in standby mode and becomes active only if the first path fails.
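The difference between the two path modes can be sketched in a few lines of Python. This is a conceptual model only: the function pick_paths and the mode labels are hypothetical, and the round-robin rotation is an assumed stand-in for load balancing, since the actual distribution of traffic across paths is not described here.

```python
from itertools import cycle


def pick_paths(mode, paths, healthy, transfers):
    """Return the path chosen for each of `transfers` consecutive transfers."""
    # Keep the configured order, but drop paths that are currently down.
    live = [p for p in paths if p in healthy]
    if not live:
        raise RuntimeError("no replication path available")
    if mode == "multiplexing":
        # Assumption: alternate between all live paths (simple round robin).
        rotation = cycle(live)
        return [next(rotation) for _ in range(transfers)]
    if mode == "failover":
        # Use the first configured path that is still alive.
        return [live[0]] * transfers
    raise ValueError(f"unknown mode: {mode!r}")


paths = ["tcp", "fc"]
print(pick_paths("multiplexing", paths, {"tcp", "fc"}, 4))
print(pick_paths("failover", paths, {"tcp", "fc"}, 3))
print(pick_paths("failover", paths, {"fc"}, 2))
```

In the model, the second path carries traffic in failover mode only when the first path is missing from the healthy set.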
NOTE: Editing the /etc/snapmirror.conf file on the destination causes an in-sync relationship to fall temporarily out-of-sync.
A SnapMirror relationship can be administered from either the source or the destination system, although some functions are available only on their respective systems.
You use the snapmirror status command to display the state of the currently defined SnapMirror relationships, as shown in Figure 14:
FIGURE 14: EXAMPLE OF SNAPMIRROR STATUS OUTPUT
The Lag column identifies the amount of time that has elapsed since the last successful replication of the source-volume Snapshot copy that is managed by the SnapMirror relationship.
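As an illustration of how the lag value might be used for monitoring, the following Python sketch converts a lag written as hours:minutes:seconds into seconds and lists the relationships that exceed a threshold. The rows are invented sample values, not real command output, and the helper names lag_seconds and stale_relationships are hypothetical.

```python
def lag_seconds(lag):
    """Convert an hh:mm:ss lag string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in lag.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def stale_relationships(rows, max_lag_seconds):
    """Return the destinations whose lag exceeds the allowed maximum."""
    return [dst for dst, lag in rows if lag_seconds(lag) > max_lag_seconds]


# Hypothetical (destination, lag) pairs, as they might be copied from a status listing.
rows = [("dst:vol2", "00:12:30"), ("dst:vol4", "26:05:00")]
print(stale_relationships(rows, max_lag_seconds=3600))  # → ['dst:vol4']
```

A lag that keeps growing past the scheduled update interval usually means that updates are failing or cannot finish before the next one is due.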
The snapmirror command is used to manage all aspects of the SnapMirror relationship, such as suspending or restarting the replication or destroying the relationship. The following are examples of common snapmirror functions:
snapmirror quiesce
– The command is executed on the destination system. The command temporarily pauses the replication. The destination volume remains read-only.
– The relationship is still defined and can be resumed.
snapmirror resume
– The command is executed on the destination system.
– The command resumes the volume replication.
snapmirror break
– The command is executed on the destination system. The command stops the replication and converts the destination volume to a writable state.
– The relationship is still defined and can be resynchronized.
snapmirror resync
– The command identifies the most recently created Snapshot copy that is common to the source and destination volumes and that is managed by a SnapMirror relationship and re-synchronizes the data between the source and destination volumes.
– The direction of synchronization is determined by whether the command was executed on the source or destination volume. The synchronization overwrites the new data on the controller on which the command was executed (bringing the execution-controller volume back into sync with the opposite volume).
– If the command is executed on the destination system, then the relationship continues in its original direction (source ⇒ destination).
– However, if the command is executed on the source system, then the relationship reverses its original direction (destination ⇒ source).
snapmirror release
– The command is executed on the source system. The command stops the replication and converts the destination volume to a writable state.
– The relationship is deleted and cannot be restarted.
snapmirror update
– The command is executed on the destination system.
– The command performs an immediate update from the source volume to the destination volume.
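The way these functions (quiesce, resume, break, resync, release, and update) move a relationship between states can be summarized as a small state machine. The Python sketch below is a simplified model for study purposes: the state labels and the names TRANSITIONS, apply_command, and resync_direction are assumptions made for illustration, and the model ignores error cases that a real controller would report.

```python
# (current state, command) -> next state; a simplified, assumed model.
TRANSITIONS = {
    ("snapmirrored", "quiesce"): "quiesced",
    ("quiesced", "resume"): "snapmirrored",
    ("snapmirrored", "break"): "broken-off",
    ("quiesced", "break"): "broken-off",
    ("broken-off", "resync"): "snapmirrored",
    ("snapmirrored", "update"): "snapmirrored",
}


def apply_command(state, command):
    """Return the relationship state after a snapmirror function is applied."""
    if command == "release":
        return "released"  # the relationship is deleted and cannot be restarted
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"{command!r} is not modeled from state {state!r}") from None


def resync_direction(executed_on):
    """Replication direction after a resync, by where the command was run."""
    if executed_on == "destination":
        return "source -> destination"  # original direction continues
    if executed_on == "source":
        return "destination -> source"  # direction is reversed
    raise ValueError(f"unknown system: {executed_on!r}")


state = "snapmirrored"
for command in ["quiesce", "resume", "break", "resync"]:
    state = apply_command(state, command)
    print(command, "->", state)
print(resync_direction("source"))
```

In this model, the destination volume is writable only in the broken-off and released states, which matches the descriptions of the break and release functions.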
The process of capturing consistent Snapshot copies on the source volume and then transferring the copies to the destination system varies, depending on your application's capabilities, the use of SnapDrive and SnapManager software, the replication mode, and the intended result. The following is one example of a process to create a consistent Snapshot copy at the destination of a qtree SnapMirror relationship:
Make the source volume consistent on disk.
– Halt the application.
– Flush the file system buffers.
Quiesce the SnapMirror relationship.
Create a Snapshot copy of the destination volume.
Resume the SnapMirror relationship.
NOTE: In environments that use SnapManager software, the Snapshot copy and replication process is usually automated via the SnapManager utility. In this case, the process can be performed with no disruption to the application.
One of the challenges in a new SnapMirror configuration is the transfer of the baseline copy from the source to the destination system. Although the WAN connection may be adequate to handle the incremental synchronization traffic, it may not be adequate to complete the baseline transfer in a timely manner. In this case, you might consider using the SnapMirror to Tape function. This method can use physical tape media to perform the initial baseline transfer. In Data ONTAP 8.0 7-Mode, this functionality is now supported with the new smtape commands.
After the initial baseline transfer is completed, the incremental synchronization occurs. The initial baseline transfer is usually constrained by the bandwidth of the connection, and the incremental synchronization is usually constrained by the latency of the connection.
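A rough calculation shows why the two phases are limited by different properties of the link. The Python sketch below estimates the duration of a baseline transfer from the data size and the link bandwidth, and the round-trip propagation delay from the link distance. All of the figures are assumptions chosen for illustration: an 80 percent usable-bandwidth factor, a signal speed in optical fiber of about 200 km per millisecond, and no allowance for switching or protocol overhead.

```python
def baseline_hours(data_gib, link_mbit_per_s, efficiency=0.8):
    """Estimated hours to copy data_gib over a link (bandwidth-bound)."""
    bits = data_gib * 1024**3 * 8
    usable_bits_per_s = link_mbit_per_s * 1_000_000 * efficiency
    return bits / usable_bits_per_s / 3600


def round_trip_ms(distance_km, fiber_km_per_ms=200.0):
    """Round-trip propagation delay in milliseconds (latency-bound)."""
    return 2 * distance_km / fiber_km_per_ms


# A 2 TiB baseline over a 100 Mbit/s WAN link takes days, not hours.
print(round(baseline_hours(2048, 100), 1), "hours")

# Every acknowledged round trip over 100 km of fiber adds at least this delay.
print(round_trip_ms(100), "ms")  # → 1.0 ms
```

Adding bandwidth shortens the baseline transfer, but it does nothing to reduce the per-round-trip delay, which depends on distance.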
The appropriate choice of SnapMirror mode (synchronous, semi-synchronous, or asynchronous) is often driven by the latency of the WAN connection. Because latency increases over distance, latency effectively limits the synchronous mode to a range of less than 100 km. If you require a "sync-like" replication feature beyond 100 km or want to reduce the performance impact on the source system, then you should consider using the semi-synchronous mode.
In contrast, the asynchronous mode uses scheduled replication and is not affected by connection latency. One way to improve asynchronous performance is to increase the interval between the replication times. This increase allows for "file system churn." Data is rewritten throughout the day, but only the latest version is included in the less frequent replication schedules.
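The effect of file system churn can be shown with a toy model. In the Python sketch below, each write touches one block, and each scheduled update transfers every block that changed since the previous update, once. The function name blocks_transferred and the sample write list are hypothetical, and the model ignores Snapshot and metadata overhead.

```python
def blocks_transferred(writes, interval):
    """Count the blocks sent when updates run every `interval` time units.

    writes is a list of (time, block_id) pairs. A block rewritten several
    times between two updates is transferred only once, in its latest version.
    """
    changed_per_update = {}
    for time, block in writes:
        changed_per_update.setdefault(time // interval, set()).add(block)
    return sum(len(blocks) for blocks in changed_per_update.values())


# Block A is rewritten four times and block B twice over one hour.
writes = [(0, "A"), (10, "A"), (20, "B"), (35, "A"), (50, "B"), (55, "A")]
print(blocks_transferred(writes, interval=15))  # → 5
print(blocks_transferred(writes, interval=60))  # → 2
```

The trade-off is that a longer interval also increases the amount of recent data that would be missing from the destination after a failure.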
In contrast to flexible volumes, the physical characteristics of traditional volumes affect SnapMirror performance. When you use traditional volumes, for best SnapMirror performance, you should configure the source and destination volumes with the same RAID size, RAID group size, and number of RAID groups.
The visibility_interval parameter controls the apparent performance of the SnapMirror synchronization. The parameter controls the view of the data on the destination system. Even after the data is received, the destination file-system view is not updated until the visibility interval elapses. The default visibility interval is three minutes, with a minimum setting of 30 seconds. Reducing the interval is not recommended because deviation from the default value can have a detrimental impact on controller performance.
NOTE: It is possible to throttle the SnapMirror traffic so as to reduce its impact on other applications that use the WAN connection.
Before you can enable the replication relationship, you must configure the SnapMirror access control between the source and destination storage controllers.
The source controller needs to grant access to the destination controller so that the destination controller can "pull" updates from the source. The destination controller, in turn, needs to grant access to the source controller so that the replication relationship can be reversed after a disaster event is resolved (synchronizing back from the disaster recovery site to the production site).
There are two ways to configure SnapMirror access control on the storage controllers:
Method 1: options snapmirror.access host=legacy
– Edit the /etc/snapmirror.allow file.
– Add the other storage controller's host name.
Method 2: options snapmirror.access host=
– By default, this option is set to the keyword "legacy." Use of the keyword causes the system to refer to the snapmirror.allow file for access control.
– Alternatively, you can set the option to host= followed by the host name of the other storage controller, to enable the SnapMirror relationship to be accessed from the remote controller.
NOTE: Method 2 is the preferred way to enable the remote access.
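The decision that the controller makes under each method can be modeled as follows. This Python sketch is an approximation for illustration, not the controller's actual logic: the function is_allowed is hypothetical, the host names are invented, and the sketch assumes that the host= form accepts a comma-separated list of names and that the snapmirror.allow file contains one host name per line.

```python
def is_allowed(option_value, remote_host, allow_file_text=""):
    """Decide whether remote_host may access SnapMirror (simplified model)."""
    if not option_value.startswith("host="):
        raise ValueError("only the host= form is modeled here")
    value = option_value[len("host="):]
    if value == "legacy":
        # Method 1: defer to the contents of the snapmirror.allow file.
        candidates = allow_file_text.splitlines()
    else:
        # Method 2: the option itself names the permitted hosts.
        candidates = value.split(",")
    allowed = {name.strip() for name in candidates if name.strip()}
    return remote_host in allowed


# Hypothetical host names.
print(is_allowed("host=legacy", "dst", allow_file_text="dst\n"))  # → True
print(is_allowed("host=dst", "dst"))                              # → True
print(is_allowed("host=dst", "other"))                            # → False
```

Remember that the equivalent setting is needed on both controllers if the relationship may later be reversed.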
The traffic between the source and destination controllers is not encrypted. In a security-conscious environment, it may be necessary to implement some type of network-level encryption for the replication traffic (for example, to use the NetApp DataFort™ encryption devices).
The DataFort security system is designed to encrypt data-at-rest, not data-in-transit. An encrypting Ethernet switch is used to encrypt data-in-transit.
Comprehensive logging of all SnapMirror activity is enabled by default. The log is saved to the /etc/log/snapmirror.[0-5] file(s). Logging can be enabled or disabled by executing the following command:
options snapmirror.log.enable [on|off]
The snapmirror status command displays the current status of all SnapMirror relationships. Some status information is available only on the destination controller.
FIGURE 15: SAMPLE OUTPUT OF SNAPMIRROR STATUS
A SnapMirror relationship passes through several defined stages as it initializes the baseline transfer (level-0), synchronizes data (level-1), and possibly reestablishes a broken mirror. The details of the process of troubleshooting and rectification are determined by the stage that the relationship was in when the failure occurred. For example, if communications failed during the initial baseline transfer, then the destination is incomplete. In this case, you must rerun the initialization, rather than trying to re-establish synchronization to the incomplete mirror.
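The dependence of the fix on the failed stage can be written down as a simple lookup. The Python sketch below is a study aid, not a troubleshooting procedure: the stage labels and the function recovery_action are hypothetical, and only the baseline case is stated explicitly in the text; the other two actions are assumptions based on the update and resync functions.

```python
def recovery_action(stage):
    """Suggest a recovery step for the stage in which a failure occurred."""
    actions = {
        # Stated in the text: an incomplete baseline must be initialized again.
        "baseline": "rerun the initialization (level-0 transfer)",
        # Assumption: a failed incremental transfer can simply be retried.
        "incremental": "retry the update (level-1 transfer)",
        # Assumption: a broken mirror is reestablished by resynchronizing.
        "broken": "resynchronize the relationship",
    }
    if stage not in actions:
        raise ValueError(f"unknown stage: {stage!r}")
    return actions[stage]


for stage in ("baseline", "incremental", "broken"):
    print(stage, "->", recovery_action(stage))
```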