How to create many ifcfg-ethN files in one shot

I had a quick task to create 253 ifcfg-eth1:N alias configuration files (eth1:2 through eth1:254) for a Linux installation, and spending the whole day doing that manually certainly wasn’t part of my plan. Here is the quick and dirty one-liner I used:

# for f in $(seq 2 254); do \
  sed -e "s/^DEVICE=.*/DEVICE=eth1:$f/" \
      -e "s/^IPADDR=.*/IPADDR=173.xxx.xx.$f/" \
      -e 's/^ONBOOT=.*/ONPARENT=yes/' ifcfg-eth1 > "ifcfg-eth1:$f"; done
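To try the same substitutions somewhere harmless first, the one-liner can be wrapped in a small function and run against a scratch copy of the template. This is a sketch, not the original script: the function name gen_aliases, the template contents and the 192.0.2.x documentation addresses are made-up placeholders (the real addresses are elided above), and it assumes the template sits in the target directory as ifcfg-eth1.

```shell
#!/usr/bin/env bash
# gen_aliases DIR FIRST LAST PREFIX  (hypothetical helper)
# Writes DIR/ifcfg-eth1:N for N in FIRST..LAST, based on DIR/ifcfg-eth1.
gen_aliases() {
  local dir=$1 first=$2 last=$3 prefix=$4 n
  for n in $(seq "$first" "$last"); do
    sed -e "s/^DEVICE=.*/DEVICE=eth1:$n/" \
        -e "s/^IPADDR=.*/IPADDR=$prefix.$n/" \
        -e 's/^ONBOOT=.*/ONPARENT=yes/' \
        "$dir/ifcfg-eth1" > "$dir/ifcfg-eth1:$n"
  done
}

# Try it in a scratch directory with a made-up template.
work=$(mktemp -d)
cat > "$work/ifcfg-eth1" <<'EOF'
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.0.2.1
NETMASK=255.255.255.0
ONBOOT=yes
EOF
gen_aliases "$work" 2 254 192.0.2
cat "$work/ifcfg-eth1:7"
```

Once the output looks right, the same call can be pointed at the real network-scripts directory and the real address prefix.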

Questions from Yandex

I have translated all the questions from the second round of the competition organized by Yandex and would like to invite everyone to test their knowledge. Keep in mind that the participants had only four minutes to answer each question. By the way, if you’re looking for a proficient Linux sysadmin, these questions make really good job interview material.

  1. Imagine you have a server with two configured Ethernet interfaces: eth0 with IP 1.2.3.4, netmask /25, gateway 1.2.3.1, and eth1 with IP 5.6.7.8, netmask /25, gateway 5.6.7.1. All clients that work with this server are outside these two networks. Configure the server so that each response packet leaves through the same interface the corresponding request came in on. Write the sequence of commands that achieves this using static routing.
  2. In the vi or vim editor, how would you replace, on every line of a text file, the first sequence of digits that starts with 3 and ends with 6 (with one or two additional digits between them) with the string 444?
  3. You’ve got a server with RAID10 built with mdadm across four disks. The following options were used to create the software array:
    near=2
    far=1
    Disks are numbered starting from zero. What is the maximum number of disks that can be pulled out of this array without losing data consistency? List the numbers of those disks.
  4. You have tasks that require the maximum possible CPU power from your hardware, but you suspect that the governor sometimes slows down the CPU frequency without any good reason. Provide the commands that force the CPUs to run at their maximum frequency without rebooting the server and without installing any additional tools or utilities.
  5. You’ve got a Linux server with 1GB of RAM. A process with a 500MB code segment is currently running, and you’re about to start a second instance of the same program. The memory pages are labeled as follows:
    1 – code segment of the first process;
    2 – data segment of the first process;
    3 – code segment of the second process;
    4 – data segment of the second process;

    List all the types of memory pages mentioned above that could end up in swap.

  6. All that is known about a server is that it has a SATA disk with an ext3 file system, 32GB of RAM and no swap, and swap cannot be turned on.
    Two commands were executed simultaneously:
    dd if=/dev/zero of=/opt/testfiles/tesfile1 bs=1G count=64
    and
    dd if=/dev/zero of=/opt/testfiles/tesfile2 bs=1G count=64

    Within a few minutes the server starts spending most of its CPU time in iowait, and after several more minutes it crashes with an OOM kernel panic.
    It’s known that if you run:
    dd if=/dev/zero of=/opt/testfiles/tesfile1 bs=1G count=18
    and
    dd if=/dev/zero of=/opt/testfiles/tesfile2 bs=1G count=18

    the problem does not occur. How would you fix this problem without limiting the disk subsystem and without changing any hardware parameters?

  7. You’ve got two servers, S and R. A web site runs on server S (IP address 192.168.1.2) on port 80 and is not directly reachable from the outside network.
    Server S uses R as its default gateway. Server R has its public interface configured with IP address 1.2.3.4, and to let remote clients reach the web site on server S, DNAT has been configured on R: 1.2.3.4:80 -> 192.168.1.2:80. You try to verify the configuration from server S with the following telnet command:
    telnet 1.2.3.4 80
    but the connection can’t be established. Why?
  8. You configured RAID 0 with mdadm across four disks. At some point a member of the cleaning crew accidentally pulled the power cord and the server went down. When you boot the server up, you notice that the array is completely broken (mdadm can’t see any superblocks). How would you restore the data and reassemble the array? Provide the commands you’d use.
  9. Suddenly, the load on your Linux server increases. After logging in, you notice that 990 out of 1000 inbound connections come from a single IPv6 address, ::ffff:1.2.3.4. You assume it’s a DoS attack and decide to block this address permanently. What command would you use?
  10. You have three web sites (forbar.tj, foo.ec and bar.ag) sharing one IP address (1.2.3.4) and available over HTTPS. How many certificates do you need for these sites, and what, according to the RFC, should those certificates contain so that web browsers trust them?
  11. Someone has launched a fork bomb on your server. Thankfully, you have an open SSH session to it. The problem is that the server is so overloaded that you can’t even run “ps”, because all available PIDs are used up. The parent and child processes of this fork bomb are all called someprogram.bin. How would you kill all these processes?
  12. Give an example of commands that configure routing on Linux from source address 192.168.1.1/24 to the network 192.168.2.0/24 using 192.168.1.128 and 192.168.1.253 as gateways. If either of them becomes unreachable, the traffic should automatically switch to the other router.
  13. You’re using EtherChannel across two network ports with a hashing algorithm based on MAC addresses. What would be the ratio of incoming load across the EtherChannel if an odd number of clients connected to the switch generate the load?
  14. You have a running application that stores all its data in a single file – /var/spool/veryimportantinformation.dat. You’ve just inadvertently deleted this file and there is no other copy or a backup available. The application is still running. How would you recover that file?
  15. You’ve got the following records in your DNS zone foobar.tj:
    a IN A 1.2.3.4
    b CNAME a
    MX 10 mail.foo.ec
    g IN A 1.2.3.2
    * IN CNAME b

    Do you need to make any changes to the zone so that mail for z.foobar.tj is received by the mail.foo.ec mail server?

Answers
Below are my answers. They should certainly be taken with a grain of salt, so if you notice an error or have something to add, please don’t hesitate to say so.

  1. First, register two additional routing tables and populate them:
    # echo "100 1234" >>/etc/iproute2/rt_tables
    # echo "101 5678" >> /etc/iproute2/rt_tables
    # ip route add 1.2.3.0/25 dev eth0 src 1.2.3.4 table 1234
    # ip route add default via 1.2.3.1 dev eth0 table 1234
    # ip route add 5.6.7.0/25 dev eth1 src 5.6.7.8 table 5678
    # ip route add default via 5.6.7.1 dev eth1 table 5678
    

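    Next, add the policy rules. Replies leave the server with its own address as the source, so each rule must match the local address of the corresponding interface, not the gateway:

    # ip rule add from 1.2.3.4 table 1234
    # ip rule add from 5.6.7.8 table 5678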
    

    Finally, set up the main routing table:

    # ip route add 1.2.3.0/25 dev eth0 src 1.2.3.4
    # ip route add 5.6.7.0/25 dev eth1 src 5.6.7.8
    
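  2. :%s/3[0-9]\{1,2\}6/444/
    There are no ^ and $ anchors, since the sequence may appear anywhere in the line, and no g flag, since only the first match on each line has to be replaced.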
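  3. Two disks, one from each mirrored pair, can be removed from the array. Assume that we have sda1, sdb1, sdc1 and sdd1 (disks 0 to 3).
    With near=2 and far=1, every chunk is stored twice on neighboring disks and there are no far copies, so the block layout looks like this:

    sda1  sdb1  sdc1  sdd1
    ---------------------
     1     1     2     2
     3     3     4     4
     .     .     .     .
     .     .     .     .

    sda1/sdb1 and sdc1/sdd1 form two mirrored pairs, and the array survives as long as each pair keeps at least one disk. Hence we could safely remove disks 0 and 2, 0 and 3, 1 and 2, or 1 and 3. Removing a third disk would always take out both copies of half of the chunks.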

  4. The “performance” governor is what we need. Let’s check whether it is available:
    # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
    conservative ondemand userspace powersave performance
    

    If “performance” is not in the list on your server, load the corresponding module into the running kernel (modprobe finds it by name, so there is no need to cd into /lib/modules first):

    # modprobe cpufreq_performance
    

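    Now, enable the new governor on every CPU (each CPU has its own scaling_governor file, so setting it for cpu0 alone may not be enough):

    # for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do echo performance > $g; done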
    

    Since the “performance” governor statically sets the CPU to its highest allowed frequency (scaling_max_freq), the frequency will not change, so no additional tweaking is required.

  5. Only the data segments, 2 and 4, can end up in swap. The code segment is a read-only mapping of the executable file, so the kernel never writes it to swap: under memory pressure it simply drops those pages and reads them back from the binary on the next page fault. Moreover, since we’re starting a second copy of a program that is already running, both processes map the same physical pages of code (3 is effectively the same memory as 1), so the second instance costs very little extra memory for code. The data segments, on the other hand, are private to each process and, once modified, can only be paged out to swap. How much of them actually gets swapped out depends on how well or badly the program has been written. If it consumes memory in an uncontrolled manner, e.g. by calling malloc() a lot and forgetting to call free(), I wouldn’t be surprised to see the data of both processes swapped out. While the active process struggles to fault the pages it needs back in from swap, the kernel keeps unmapping mapped pages depending on the swap_tendency value (swap_tendency = mapped_ratio/2 + distress + vm_swappiness in older 2.6 kernels). Later, once the second process gets its CPU time slice, the whole story repeats for it.
  6. Since we know that it’s safe to run dd if=/dev/zero of=/opt/testfiles/tesfile1 bs=1G count=18 and dd if=/dev/zero of=/opt/testfiles/tesfile2 bs=1G count=18, we could write the same 64GB per file in smaller pieces, one after another (4 x 16GB). Appending needs conv=notrunc, otherwise dd truncates the file on every run:
    # for i in `seq 1 4`; do \
    dd if=/dev/zero of=/opt/testfiles/tesfile1 oflag=append conv=notrunc bs=1G count=16 && \
    dd if=/dev/zero of=/opt/testfiles/tesfile2 oflag=append conv=notrunc bs=1G count=16; \
    done
    

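    Another option, though I didn’t check it myself, is to use a pipe to avoid unwanted caching. Reads from a pipe can return less than a full block, so the receiving dd needs iflag=fullblock (GNU dd) to really write 64 blocks of 1G:

    # dd if=/dev/zero bs=1G count=64 | dd of=/opt/testfiles/tesfile1 bs=1G count=64 iflag=fullblock
    # dd if=/dev/zero bs=1G count=64 | dd of=/opt/testfiles/tesfile2 bs=1G count=64 iflag=fullblock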
    
  7. The connection can’t be established because server S connects to 1.2.3.4, but due to DNAT it never receives a response from 1.2.3.4. Let’s look at the issue more closely:

    • S sends a SYN packet to 1.2.3.4, but since the destination is outside its network, S hands it to R, its default gateway. The IP header carries source address 192.168.1.2 and destination address 1.2.3.4.
    • Let’s assume that R has a private IP 192.168.1.1 in the same subnet as server S. So R receives an Ethernet frame carrying this packet.
    • Since DNAT happens before the routing decision is made (in the PREROUTING chain), the destination address is rewritten by the DNAT rule to 192.168.1.2.
    • R now sees that 192.168.1.2 is on a directly attached network and sends the packet back out with source address 192.168.1.2 and destination address 192.168.1.2.
    • Finally, server S receives a SYN whose source address is its own. By default Linux drops such packets when they arrive on a network interface (net.ipv4.conf.*.accept_local is 0), which matches the dumps below: no SYN/ACK is ever sent. Even if S answered, the SYN/ACK would go to itself and would not match the connection S opened to 1.2.3.4.
    • Meanwhile, the original connection from 192.168.1.2 to 1.2.3.4 times out, because server S never receives a SYN/ACK from 1.2.3.4.

    Here is the network traffic dump I gathered with tshark, using exactly the same configuration but slightly different IPs: server S (192.168.1.5), server R (192.168.1.6 and 1.2.3.4).
    So, here is the dump from server S after “telnet 1.2.3.4 80” was executed:

      0.000000  192.168.1.5 -> 1.2.3.4      TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=280775 TSER=0 WS=5
      0.001245  192.168.1.5 -> 192.168.1.5  TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=280775 TSER=0 WS=5
      3.001775  192.168.1.5 -> 1.2.3.4      TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=281526 TSER=0 WS=5
      3.002868  192.168.1.5 -> 192.168.1.5  TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=281526 TSER=0 WS=5
      5.008888 CadmusCo_b9:3e:af ->              ARP Who has 192.168.1.5?  Tell 192.168.1.6
      5.008940 CadmusCo_06:16:0e ->              ARP 192.168.1.5 is at 08:00:27:06:16:0e
      9.009453  192.168.1.5 -> 1.2.3.4      TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=283028 TSER=0 WS=5
      9.011750  192.168.1.5 -> 192.168.1.5  TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=283028 TSER=0 WS=5
    

    And here is the same traffic flow as seen on server R:

      0.000000  192.168.1.5 -> 1.2.3.4      TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=280775 TSER=0 WS=5
      0.000199  192.168.1.5 -> 192.168.1.5  TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=280775 TSER=0 WS=5
      3.001751  192.168.1.5 -> 1.2.3.4      TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=281526 TSER=0 WS=5
      3.001790  192.168.1.5 -> 192.168.1.5  TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=281526 TSER=0 WS=5
      5.007598 CadmusCo_b9:3e:af ->              ARP Who has 192.168.1.5?  Tell 192.168.1.6
      5.008569 CadmusCo_06:16:0e ->              ARP 192.168.1.5 is at 08:00:27:06:16:0e
      9.010672  192.168.1.5 -> 1.2.3.4      TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=283028 TSER=0 WS=5
      9.010712  192.168.1.5 -> 192.168.1.5  TCP 39065 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=283028 TSER=0 WS=5
    
  8. As far as I know, mdadm is smart enough that recreating an array on the same disks doesn’t destroy the previously stored data: it only writes new superblocks. This works only if the level, chunk size, metadata version and device order match the original array exactly, so it’s wise to check the result read-only (e.g. fsck -n /dev/md0) before mounting it.

    # mdadm --create /dev/md0 --verbose --level=0 \
    --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

    If /dev/md0 doesn’t show up later (e.g. after a reboot), assemble the array:

    # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
    
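  9. ::ffff:1.2.3.4 is an IPv4-mapped IPv6 address. That is how a dual-stack (IPv6) listening socket reports a client that actually connected over IPv4, so the packets on the wire are plain IPv4 and ip6tables never sees them. The block therefore belongs in iptables:

    # iptables -I INPUT -s 1.2.3.4 -j DROP

    To make it permanent, save the rule in whatever way your distribution loads firewall rules at boot (for example with iptables-save).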
    
  10. One certificate, with a subjectAltName extension containing a dNSName entry for each of the three sites. See RFC 2818, “HTTP Over TLS”, section 3.1.
  11. I’d try to do the following first:

    # killall -s STOP someprogram.bin
    # killall -s KILL someprogram.bin
    

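    If that doesn’t work, which is quite likely since killall itself needs a free PID, stick to bash built-ins: they run inside the shell you already have and don’t need to fork. The loops below read each process name from /proc/PID/comm, first stop every someprogram.bin (so none of them can fork again) and then kill them all with the built-in kill:

    # for p in /proc/[0-9]*; do read -r n 2>/dev/null < $p/comm && [[ $n == someprogram.bin ]] && kill -s STOP ${p#/proc/}; done
    # for p in /proc/[0-9]*; do read -r n 2>/dev/null < $p/comm && [[ $n == someprogram.bin ]] && kill -s KILL ${p#/proc/}; done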
    
  12. This is another task for iproute2.
    # echo "200 net_128" >> /etc/iproute2/rt_tables
    # ip route add 192.168.2.0/24 via 192.168.1.128 table net_128
    # ip route add 192.168.2.0/24 via 192.168.1.253 metric 10 table net_128
    # ip rule add from 192.168.1.1/24 to 192.168.2.0/24 table net_128
    

    It’s a good practice to lower gc_timeout too, so that stale cached routes expire sooner:

    # sysctl -w net.ipv4.route.gc_timeout=10
    
  13. I don’t have a definite answer for this one; probably 4:4. What I do know is that an EtherChannel can have at most 8 ports and that the hashing algorithm produces three bits (in our case from the MAC addresses) that determine which port in the channel forwards the packet. Since there are only two ports, hash values 000, 001, 010 and 011 would send a packet through the first port, and 100, 101, 110 and 111 through the second one.
  14. To recover the file, get the PID of the process and run lsof to find the file descriptor number, then copy the file back from /proc:

    # lsof -p proc_pid | grep veryimportantinformation.dat
    # cd /proc/proc_pid/fd
    # cat fd_number > /var/spool/veryimportantinformation.dat

    Then restart the application, so that it opens the restored file instead of the deleted one it still holds open.
    
  15. To receive mail for address@z.foobar.tj, you could basically do one of two things:

    • Create a separate zone for z.foobar.tj
    • Add an MX record just for z.foobar.tj (a relative owner name without a trailing dot, and a fully qualified target with one):
      z      IN    MX 10 mail.foo.ec.
      
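For answer 1, the per-interface part follows one pattern, so it can be captured in a small function that prints the commands for review before they are run. A sketch only: print_split_access is a hypothetical helper, and it assumes the table names 1234 and 5678 have already been registered in /etc/iproute2/rt_tables.

```shell
#!/usr/bin/env bash
# print_split_access IFACE ADDR NET GW TABLE  (hypothetical helper)
# Prints, without running them, the per-uplink policy-routing commands.
print_split_access() {
  local iface=$1 addr=$2 net=$3 gw=$4 table=$5
  echo "ip route add $net dev $iface src $addr table $table"
  echo "ip route add default via $gw dev $iface table $table"
  # Replies carry the server's own address as source, so match on ADDR.
  echo "ip rule add from $addr table $table"
}

print_split_access eth0 1.2.3.4 1.2.3.0/25 1.2.3.1 1234
print_split_access eth1 5.6.7.8 5.6.7.0/25 5.6.7.1 5678
```

After checking the output, it can be piped to sh as root.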
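The vi pattern from answer 2 can be tried out from the shell with sed, which understands the same basic-regex interval syntax (\{1,2\}). A quick sketch; replace_first is just an illustrative name:

```shell
#!/usr/bin/env bash
# replace_first: a 3, then one or two digits, then a 6.
# Without the g flag only the first match on each line is replaced.
replace_first() {
  sed 's/3[0-9]\{1,2\}6/444/'
}

printf '%s\n' 'a 3126 b 356' 'x 36 y' '3456 and 3996' | replace_first
```

This prints "a 444 b 356", "x 36 y" and "444 and 3996": 36 has no digits between the 3 and the 6, and only the first match on each line is replaced.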
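To double-check answer 3, all 16 combinations of removed disks can be brute-forced. This sketch assumes that md’s near layout puts copy k of chunk c on device (c * near + k) mod disks, which for near=2 on four disks gives the mirrored pairs sda1/sdb1 and sdc1/sdd1; DISKS and NEAR are the only parameters.

```shell
#!/usr/bin/env bash
# Which sets of disks can fail in an md RAID10 array with near=2, far=1?
# Assumption: copy k of chunk c is placed on device (c * NEAR + k) % DISKS.
DISKS=4
NEAR=2

# survives MASK: succeeds if every chunk keeps at least one copy when the
# disks whose bits are set in MASK are removed.
survives() {
  local mask=$1 c k dev ok
  for ((c = 0; c < DISKS; c++)); do     # the layout repeats within DISKS chunks
    ok=0
    for ((k = 0; k < NEAR; k++)); do
      dev=$(( (c * NEAR + k) % DISKS ))
      (( (mask >> dev) & 1 )) || ok=1   # this copy is on a disk still present
    done
    (( ok )) || return 1
  done
  return 0
}

best=0 sets=""
for ((mask = 0; mask < (1 << DISKS); mask++)); do
  survives "$mask" || continue
  n=0 list=""
  for ((d = 0; d < DISKS; d++)); do
    if (( (mask >> d) & 1 )); then n=$((n + 1)); list+="$d "; fi
  done
  if (( n > best )); then best=$n sets=""; fi
  if (( n == best )); then sets+="{${list% }} "; fi
done
echo "max disks removable: $best"
echo "safe sets: ${sets% }"
```

It reports 2 as the maximum, with {0 2}, {1 2}, {0 3} and {1 3} as the safe combinations.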
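For answer 4, it’s worth confirming afterwards that the change reached every CPU. A read-only sketch (check_governors is a hypothetical helper); the optional directory argument exists only so it can be tried on a fake tree and defaults to /sys/devices/system/cpu:

```shell
#!/usr/bin/env bash
# check_governors [SYSFS_CPU_DIR]  (hypothetical helper)
# Prints every CPU whose scaling_governor is not "performance" and
# returns 1 if any such CPU is found. CPUs without cpufreq are skipped.
check_governors() {
  local root=${1:-/sys/devices/system/cpu} f gov cpu rc=0
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue            # also skips an unmatched glob
    read -r gov < "$f"
    cpu=${f#"$root"/}                  # e.g. cpu0/cpufreq/scaling_governor
    cpu=${cpu%%/*}                     # e.g. cpu0
    if [ "$gov" != performance ]; then
      echo "$cpu: $gov"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: check_governors && echo "all CPUs use the performance governor".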
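To see answer 5 in practice, /proc/PID/smaps reports a Swap: line for each mapping (on kernels new enough to have it), so swap usage can be summed per mapping. This sketch (swap_by_mapping is a hypothetical helper) groups by permissions and path, so file-backed r-xp code mappings can be compared with rw-p data and anonymous mappings:

```shell
#!/usr/bin/env bash
# swap_by_mapping SMAPS_FILE  (hypothetical helper)
# Sums the "Swap:" field of an smaps file per (permissions, path) pair,
# largest first. Mappings without a path are labeled [anon].
swap_by_mapping() {
  awk '
    $1 ~ /^[0-9a-f]+-[0-9a-f]+$/ { label = $2 " " ($6 == "" ? "[anon]" : $6); next }
    $1 == "Swap:"                { swap[label] += $2 }
    END { for (l in swap) printf "%8d kB  %s\n", swap[l], l }
  ' "$1" | sort -rn
}
```

Usage: swap_by_mapping /proc/PID/smaps, where PID is the process in question (reading another user’s smaps needs root).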
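The chunked approach from answer 6 can be wrapped in a function and tried on a small scale first. This is a sketch assuming GNU dd: conv=fdatasync makes each dd flush its data to disk before exiting, so the next chunk does not start while the previous one is still dirty in the page cache, and 1M blocks keep dd’s own buffer small (dd allocates a buffer of bs bytes). write_in_chunks is a hypothetical name:

```shell
#!/usr/bin/env bash
# write_in_chunks FILE CHUNKS CHUNK_MIB  (hypothetical helper)
# Appends CHUNKS pieces of CHUNK_MIB MiB of zeros to FILE, one after another.
write_in_chunks() {
  local file=$1 chunks=$2 mib=$3 i
  : > "$file"                          # start from an empty file
  for ((i = 0; i < chunks; i++)); do
    # oflag=append + conv=notrunc: add to the end instead of truncating;
    # fdatasync: flush this chunk to disk before dd exits.
    dd if=/dev/zero of="$file" bs=1M count="$mib" \
       oflag=append conv=notrunc,fdatasync 2>/dev/null || return 1
  done
}
```

For the real case, write_in_chunks /opt/testfiles/tesfile1 4 16384 writes 4 x 16 GiB = 64 GiB.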
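For answer 10, a self-signed test certificate with one subjectAltName dNSName entry per site shows what such a certificate looks like. A sketch assuming OpenSSL 1.1.1 or newer (for -addext); make_multi_san_cert is a hypothetical helper, and for production the same SAN list would go into the CSR sent to a CA rather than into a self-signed certificate:

```shell
#!/usr/bin/env bash
# make_multi_san_cert DIR  (hypothetical helper)
# Writes DIR/key.pem and a self-signed DIR/cert.pem whose subjectAltName
# lists all three sites as dNSName entries.
make_multi_san_cert() {
  local dir=$1
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout "$dir/key.pem" -out "$dir/cert.pem" \
    -subj "/CN=foo.ec" \
    -addext "subjectAltName=DNS:forbar.tj,DNS:foo.ec,DNS:bar.ag" 2>/dev/null
}
```

openssl x509 -in cert.pem -noout -text then shows all three names under "X509v3 Subject Alternative Name".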
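The built-in-only approach from answer 11 can be packaged as a function. This sketch (signal_by_comm is a hypothetical name) uses only bash built-ins inside the loop (glob expansion, read, [[ ]] and kill), so it does not fork. It assumes a kernel that provides /proc/PID/comm (2.6.33 and later); comm holds at most 15 characters, and someprogram.bin just fits.

```shell
#!/usr/bin/env bash
# signal_by_comm NAME SIGNAL  (hypothetical helper)
# Sends SIGNAL to every process whose /proc/PID/comm equals NAME,
# using only shell built-ins so that no new process is needed.
signal_by_comm() {
  local name=$1 sig=$2 p comm
  for p in /proc/[0-9]*; do
    read -r comm 2>/dev/null < "$p/comm" || continue   # process may be gone
    if [[ $comm == "$name" ]]; then
      kill -s "$sig" "${p#/proc/}" 2>/dev/null
    fi
  done
  return 0
}
```

Usage: signal_by_comm someprogram.bin STOP, then signal_by_comm someprogram.bin KILL.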
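The bucket-to-port mapping described in answer 13 can be modelled in a few lines. This is an illustrative assumption, not Cisco’s documented algorithm: the three bits are taken as the low three bits of the last MAC octet, buckets 0-3 go to port 1 and 4-7 to port 2, and the client MACs are made up. Since a given MAC always lands in the same bucket, each client sticks to one port, so the actual ratio depends on how the clients’ MACs fall into the buckets:

```shell
#!/usr/bin/env bash
# port_for_mac MAC  (hypothetical helper)
# Low three bits of the last octet -> bucket 0..7 -> port 1 (0-3) or 2 (4-7).
port_for_mac() {
  local last=${1##*:}                 # last octet, e.g. "0e"
  local bucket=$(( 0x$last & 7 ))
  if (( bucket < 4 )); then echo 1; else echo 2; fi
}

# Spread three (made-up) clients over the two ports.
p1=0 p2=0
for mac in 00:16:3e:00:00:01 00:16:3e:00:00:02 00:16:3e:00:00:05; do
  if [ "$(port_for_mac "$mac")" = 1 ]; then p1=$((p1 + 1)); else p2=$((p2 + 1)); fi
done
echo "port1:port2 = $p1:$p2"
```

With these three clients it prints port1:port2 = 2:1.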
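The steps from answer 14 can be wrapped into a function that finds the right descriptor by itself, relying on the kernel marking the link target of an unlinked file with a trailing " (deleted)". A sketch; recover_deleted is a hypothetical helper, and it needs permission to read /proc/PID/fd of the target process:

```shell
#!/usr/bin/env bash
# recover_deleted PID NAME DEST  (hypothetical helper)
# Looks through the open file descriptors of PID for a deleted file whose
# name is NAME and copies its contents to DEST.
recover_deleted() {
  local pid=$1 name=$2 dest=$3 fd target
  for fd in /proc/"$pid"/fd/*; do
    target=$(readlink "$fd") || continue
    # The kernel appends " (deleted)" to the link target of unlinked files.
    if [[ $target == *"/$name (deleted)" ]]; then
      cat "$fd" > "$dest" && return 0
    fi
  done
  return 1
}
```

Usage: recover_deleted PID veryimportantinformation.dat /var/spool/veryimportantinformation.dat, then restart the application.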

Orphaned DTrace, Fishworks and ZFS

First it was Bryan Cantrill, and then Adam Leventhal followed. After that the exodus continued with Jeff Bonwick and Mike Shapiro, both leaving Oracle. But today another big name from Sun Microsystems has walked out of Oracle’s door: Brendan Gregg is leaving, and all we have been left with is a new DTrace book from Brendan and Jim Mauro.


Update
Below is the list, taken from the OpenSolaris mailing list, of all the big names who have left Sun/Oracle so far:

  • Ian Murdock (Emerging systems, i.e. new distro architecture)
  • Tim Bray (SGML/XML) (1 March 2010)
  • Simon Phipps (Open Source) (March 2010)
  • James Gosling (Java) (2 April 2010)
  • Sunay Tripathi (CrossBow) (2 April 2010)
  • Garrett D’Amore (networking, audio, device drivers; formerly with General Dynamics, which had bought Tadpole)
  • Bryan Cantrill (DTrace) (July 2010)
  • Adam Leventhal (DTrace)
  • Jeff Bonwick (ZFS)
  • Michael W. Shapiro (DTrace, storage) (October 2010)
  • Brendan Gregg (DTrace, storage) (October 2010)

Game over

As expected, I failed miserably in the second round and had to rush home from the distracting, drizzly railway station. My only hope is to get the full list of questions, since the ones I saw looked very interesting and I’d love to answer them just for myself.

Challenge goes on…

Yes! I did it and made my way into the second round of the Linux competition organized by Yandex. Unfortunately, due to an upcoming trip to Nizhniy Novgorod, I seriously doubt I’ll be able to take part, since my train brings me back to Moscow five minutes after the start of the game. Hopefully, I can use WiMAX to jump into the fray right from the railway platform near the tracks. Anyway, it should be fun, and in the end hardship makes us stronger. Peace…

How to replace a PCIE card in Sun Cluster

I’ve just returned from a quick trip to Nizhniy Novgorod, where I had to replace a PCIE card, an Ethernet adapter (e1000g). Not a big deal, sure, but the tricky part was that this card was part of a Sun Cluster setup, so I deliberately chose the slippery path called DR, or dynamic reconfiguration. Actually, it’s not as difficult as it might seem at first. To make a long story short…
In my case it was an Intel-based Ethernet adapter (e1000g) with two ports: one for the public network (e1000g2) and the other for the private interconnect between the cluster nodes (e1000g3). To DR the adapter, it must first be properly unconfigured and taken out of the OS’s control. Otherwise cfgadm will complain that the device is busy and won’t let you remove the card.

  1. First, let’s migrate the public IP address to another interface in the same IPMP group:
    # if_mpadm -d e1000g2
    
  2. Next, take the public interface out of the IPMP group. And don’t forget to unplumb it; it’s mandatory.
    # ifconfig e1000g2 group ""
    # ifconfig e1000g2 unplumb
    
  3. Make a record of the current cluster interconnect configuration:
    # clintr show
    # clintr status
    
  4. Disable the cluster’s interconnect cable:
    # clintr disable node1:e1000g3,node2:e1000g3
    
  5. Now you can safely detach the PCIE card with the well-known Solaris cfgadm commands:
    # cfgadm -c unconfigure iou#1-pci#3
    # cfgadm -c disconnect iou#1-pci#3
    
  6. Physically replace the card and add it back into the Solaris OS:
    # cfgadm -c connect iou#1-pci#3
    # cfgadm -c configure iou#1-pci#3
    
  7. Re-enable the previously disabled interconnect cable:
    # clintr enable node1:e1000g3,node2:e1000g3
    
  8. If required, plumb the public port (e1000g2) again and fail the IP back to it. In my case this happened automatically as soon as the PCIE card was configured. Run the following commands if you have to do it manually. I used sc_ipmp0 as the IPMP group name, but in your case it could be different.
    # ifconfig e1000g2 plumb
    # ifconfig e1000g2 group sc_ipmp0
    # if_mpadm -r e1000g2 
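The sequence above can be written down as a parameterized checklist, which makes it easy to adapt to other slots and interfaces. This sketch only prints the commands, nothing is executed; print_dr_plan is a hypothetical helper, and the example arguments are the values from this post:

```shell
#!/usr/bin/env bash
# print_dr_plan PUB_IF PRIV_IF NODE1 NODE2 ATTACHMENT_POINT IPMP_GROUP
# (hypothetical helper) Prints the DR steps in order for review.
print_dr_plan() {
  local pub=$1 priv=$2 n1=$3 n2=$4 ap=$5 group=$6
  local cable="$n1:$priv,$n2:$priv"
  printf '%s\n' \
    "if_mpadm -d $pub" \
    "ifconfig $pub group \"\"" \
    "ifconfig $pub unplumb" \
    "clintr show" \
    "clintr status" \
    "clintr disable $cable" \
    "cfgadm -c unconfigure $ap" \
    "cfgadm -c disconnect $ap" \
    "# physically replace the card here" \
    "cfgadm -c connect $ap" \
    "cfgadm -c configure $ap" \
    "clintr enable $cable" \
    "# only if the public port did not come back by itself:" \
    "ifconfig $pub plumb" \
    "ifconfig $pub group $group" \
    "if_mpadm -r $pub"
}

print_dr_plan e1000g2 e1000g3 node1 node2 'iou#1-pci#3' sc_ipmp0
```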

Hope this helps someone.
Cheers.