tmux quick usage

session management

tmux ls (or tmux list-sessions)
tmux new -s session-name
Ctrl-b d Detach from session
tmux attach -t [session name]
tmux kill-session -t session-name

Ctrl-b c Create new window
Ctrl-b d Detach current client
Ctrl-b l Move to previously selected window
Ctrl-b n Move to the next window
Ctrl-b p Move to the previous window
Ctrl-b & Kill the current window
Ctrl-b , Rename the current window
Ctrl-b q Show pane numbers (used to switch between panes)
Ctrl-b o Switch to the next pane
Ctrl-b ? List all keybindings

moving between windows

Ctrl-b n (Move to the next window)
Ctrl-b p (Move to the previous window)
Ctrl-b l (Move to the previously selected window)
Ctrl-b w (List all windows / window numbers)
Ctrl-b <window number> (Move to the specified window; the default bindings run from 0 to 9)

Tiling commands

Ctrl-b % (Split the window vertically)
Ctrl-b " (Split the window horizontally)
Ctrl-b o (Goto next pane)
Ctrl-b q (Show pane numbers, when the numbers show up type the key to go to that pane)
Ctrl-b { (Move the current pane left)
Ctrl-b } (Move the current pane right)

Make a pane its own window

Ctrl-b : break-pane

add to ~/.tmux.conf

bind | split-window -h
bind - split-window -v
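
To apply the changes without restarting the tmux server, reload the file from a shell (or from the Ctrl-b : prompt with source-file):

tmux source-file ~/.tmux.conf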

Linux usable memory analysis

Server 01 raised a MEMAppUsablePerc memory alarm. The on-site information at the time was as follows:
Checking memory usage on server 01 with free shows that, counting buffers and cache, about 2.7 GB is still available, yet Patrol reported a MEMAppUsablePerc (usable memory percentage) of only 5%:
pacpgwq01:~ # free -m
             total       used       free     shared    buffers     cached
Mem:          7867       7705        162          0        136       2459
-/+ buffers/cache:       5109       2758
Swap:            0          0          0

Checking server 02 with free: counting buffers and cache, about 2.1 GB is still available, which is roughly consistent with the MEMAppUsablePerc of 25% shown in Patrol:
pacpgwq02:~> free -m
             total       used       free     shared    buffers     cached
Mem:          7867       7732        134          0         32       1937
-/+ buffers/cache:       5762       2104
Swap:            0          0          0

Why does Patrol differ so much between the two servers when their free -m output looks about the same?

We later confirmed with the monitoring team how MEMAppUsablePerc is calculated: the Patrol KM reads fields from /proc/meminfo and computes
MEMAppUsablePerc (usable memory percentage) = 100 * (MemFree + Buffers + Cached) / MemTotal
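
For reference, the same figure can be computed by hand from /proc/meminfo (a minimal sketch, not the actual Patrol KM):

awk '/^MemTotal:/{t=$2} /^MemFree:/{f=$2} /^Buffers:/{b=$2} /^Cached:/{c=$2}
     END{printf "MEMAppUsablePerc = %.1f%%\n", 100*(f+b+c)/t}' /proc/meminfo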
The relevant values at the time were:
pacpgwq01:/proc/sys/vm # cat /proc/meminfo
MemTotal: 8055952 kB
MemFree: 2844140 kB
Buffers: 1328 kB
Cached: 31888 kB
SwapCached: 0 kB

Cached here is only about 31 MB, a far cry from the 2.4 GB that free -m showed.

pacpgwq02's values, below, are basically consistent with Patrol: MEMAppUsablePerc comes out at around 25%.
pacpgwq02:~ # cat /proc/meminfo
MemTotal: 8055952 kB
MemFree: 133852 kB
Buffers: 92856 kB
Cached: 1786256 kB
Why do free -m and /proc/meminfo disagree like this? Looking into it further: SUSE-type systems count available memory as free + cached + buffers + SReclaimable, where SReclaimable is the reclaimable portion of the kernel's slab caches.
Checking pacpgwq01's SReclaimable shows about 1.9 GB (in the snapshot below, MemFree + Buffers + Cached + SReclaimable comes to roughly 2.7 GB, matching what free reported as available):
pacpgwq01:/usr # cat /proc/meminfo
MemTotal: 8055952 kB
MemFree: 158948 kB
Buffers: 145540 kB
Cached: 526864 kB

Slab: 2078612 kB
SReclaimable: 1967368 kB

Running the following command to tune the kernel parameter and flush the file and kernel caches brought Cached and SReclaimable down, raised MemFree significantly, and the Patrol monitoring returned to normal.
pacpgwq01:/usr # echo 3 > /proc/sys/vm/drop_caches
pacpgwq01:/usr # cat /proc/meminfo
MemTotal: 8055952 kB
MemFree: 2731032 kB
Buffers: 1192 kB
Cached: 25604 kB
SwapCached: 0 kB

Slab: 150672 kB
SReclaimable: 39456 kB
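
For reference, drop_caches accepts three values (documented in the kernel's sysctl/vm documentation):

sync                                 # flush dirty pages first so more can be dropped
echo 1 > /proc/sys/vm/drop_caches    # free the page cache only
echo 2 > /proc/sys/vm/drop_caches    # free reclaimable slab objects (dentries, inodes)
echo 3 > /proc/sys/vm/drop_caches    # free both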

The problem seemed solved, but after the system had been running for a while, pacpgwq01's SReclaimable climbed back to about 1.3 GB, whereas on pacpgwq02, where no action had been taken, SReclaimable stayed at around 180 MB.

So why does SReclaimable keep growing? slabtop shows that most of it is the process inode cache proc_inode_cache (about 1 GB), followed by the directory entry cache dentry (about 300 MB):
pacpgwq01:/usr # slabtop
Active / Total Objects (% used) : 3723187 / 3809628 (97.7%)
Active / Total Slabs (% used) : 373224 / 373230 (100.0%)
Active / Total Caches (% used) : 95 / 175 (54.3%)
Active / Total Size (% used) : 1431793.25K / 1443443.42K (99.2%)
Minimum / Average / Maximum Object : 0.02K / 0.38K / 4096.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
1566570 1566254 99% 0.64K 261095 6 1044380K proc_inode_cache
1607260 1606750 99% 0.19K 80363 20 321452K dentry
31915 31893 99% 0.80K 6383 5 25532K ext3_inode_cache
121758 113904 93% 0.18K 5798 21 23192K vm_area_struct
2065 2061 99% 6.30K 2065 1 16520K task_struct
6804 5956 87% 1.62K 1701 4 13608K TCP

Caches this large for file access imply that some process is doing a great deal of I/O, so we need to trace the OS to see which programs are calling open, stat, close, unlink, and so on.

The trace procedure is as follows:
1. If the problem can be reproduced while the service is idle, it is best to run this during an idle window.
2. Execute the following commands in order:
# sysctl -w vm.block_dump=1
# echo 3 > /proc/sys/vm/drop_caches
# cat /proc/meminfo
3. Once the slab usage has clearly risen again, execute the following commands in order:
# sysctl -w vm.block_dump=0
# cat /proc/meminfo
Then analyze /var/log/messages, find the process operations executed while SReclaimable was growing, and determine which process is responsible (a rough tally is sketched below).
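
With vm.block_dump=1 the kernel logs each READ/WRITE/dirtied event together with the process name, so a per-process tally can be pulled from the log. A sketch, assuming the default syslog layout where the process field is column 6:

grep -E 'READ block|WRITE block|dirtied' /var/log/messages \
    | awk '{print $6}' | sort | uniq -c | sort -rn | head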

Changing a cloud-platform VM's root password

1. Stop the VM:
virsh destroy instance-000007c4
2. Attach and mount the VM's disk file:
qemu-nbd -c /dev/nbd15 /dsx01/instances/instance-000007c4
kpartx -a /dev/nbd15
mkdir -p /tmp/tmpClone
mount /dev/mapper/nbd15p1 /tmp/tmpClone
3. Generate a password hash for the root user (here the new password is root; the VM's original password was not root):
openssl passwd -1 -salt $(< /dev/urandom tr -dc '[:alnum:]' | head -c 32)
4. Replace root's password in the /etc/shadow file of the mounted image with the generated string (replace the entire second colon-separated field), then save and exit.
5. Unmount and detach the disk file:
umount /dev/mapper/nbd15p1
kpartx -d /dev/nbd15
qemu-nbd -d /dev/nbd15
6. Start the VM and log in:
virsh start instance-000007c4
ssh root@$ip
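
For reference, the relevant /etc/shadow line looks like the following; the hash shown is a made-up example of the MD5-crypt ($1$...) string that openssl produces, and only the second field is replaced:

root:$1$abcdefgh$0123456789abcdefghijk.:16000:0:99999:7:::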

Filesystem errors and how to repair them

tune2fs -l /dev/mapper/vm_data-LF_CULGPAPS | grep 'Filesystem state' | egrep -i 'dirty|error'
If the state shows dirty, or clean with errors:
umount -f /CULGPAPS
fsck /dev/mapper/vm_data-LF_CULGPAPS    (self-check the volume)
After the check completes, run tune2fs -l /dev/mapper/vm_data-LF_CULGPAPS | grep 'Filesystem state' | egrep -i 'dirty|error' again and confirm the state is clean.

cd /etc/lvm/backup/
more vm_data    (check the PV UUIDs recorded in the backup, then run:)
pvcreate --restorefile vm_data --uuid v7QHCM-f2fM-eRcr-jqnH-e9XO-A760-jY6jCd /dev/sdc
pvcreate --restorefile vm_data --uuid C8RUEv-tChU-RSpN-c2gH-Odq9-F5Cr-td0Mgo /dev/sdd
Restore the VG:
vgcfgrestore -f vm_data vm_data
Activate the VG:
vgchange -a y
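
After activation, it is worth confirming the restore with the standard LVM listing commands (a routine check, not part of the original procedure):

pvs             # both PVs should now show up under VG vm_data
vgs vm_data
lvs vm_data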
If the following superblock error appears again:
ppculgp07:/ # mount -av
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vm_data-LF_CULGPAPS,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
proceed with the following steps in order.
Export the LV's filesystem metadata:
dumpe2fs /dev/mapper/vm_data-LF_CULGPAPS > /tmp/lipeng
Look at the superblock information in the dump: it lists one primary superblock and several backup superblocks. The mount failure is caused by damage to the primary superblock.
cat /tmp/lipeng|grep -i super|grep -i block
ppculgp03:/CULGPAPS/usr # cat /tmp/lipeng|grep -i super|grep -i block
Primary superblock at 0, Group descriptors at 1-5
Backup superblock at 32768, Group descriptors at 32769-32773
Backup superblock at 98304, Group descriptors at 98305-98309
Backup superblock at 163840, Group descriptors at 163841-163845
Backup superblock at 229376, Group descriptors at 229377-229381
Backup superblock at 294912, Group descriptors at 294913-294917
Use one of the backup superblocks to recover the filesystem; note that the block number passed to -b comes from the listing above:
e2fsck -b 229376 -f -y /dev/mapper/vm_data-LF_CULGPAPS
e2fsck repairs the filesystem automatically from the backup superblock; this can take a long time. When it finishes, self-check the volume again to confirm there are no problems:
fsck /dev/mapper/vm_data-LF_CULGPAPS
Once the self-check passes, mount the filesystem again:
mount -av
Files recovered by the automatic repair are all placed in the /CULGPAPS/lost+found directory; use the mv command to move them back to their proper locations.

A network traffic control script edited for our env

#!/bin/bash
# Borrowed from linux.org/how-to
# Edited by Zengqiang Xie XTS DC UnionPay
# tc uses the following units when passed as a parameter.
# kbps: Kilobytes per second
# mbps: Megabytes per second
# kbit: Kilobits per second
# mbit: Megabits per second
# bps: Bytes per second
# Amounts of data can be specified in:
# kb or k: Kilobytes
# mb or m: Megabytes
# mbit: Megabits
# kbit: Kilobits
# To get the byte figure, divide the number of bits by 8.
#
#
# Name of the traffic control command.
TC=/sbin/tc
# The network interface whose bandwidth we're limiting.
IF=eth0 # Interface
# Download limit (in mega bits)
DNLD=1mbit # DOWNLOAD Limit
# Upload limit (in mega bits)
UPLD=1mbit # UPLOAD Limit
# IP address of the machine we are controlling
IP=0.0.0.0/0 # Host IP
# Filter options for limiting the intended interface.
U32="$TC filter add dev $IF protocol ip parent 1:0 prio 1 u32"

start() {
# We'll use Hierarchical Token Bucket (HTB) to shape bandwidth.
# For detailed configuration options, please consult Linux man
# page.
$TC qdisc add dev $IF root handle 1: htb default 30
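# Note: 'default 30' sends unclassified traffic to class 1:30, which is
# never defined below, so unmatched traffic would pass unshaped. With
# IP=0.0.0.0/0 the filters below match everything, so it does not matter here.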
$TC class add dev $IF parent 1: classid 1:1 htb rate $DNLD ceil $DNLD
$TC class add dev $IF parent 1: classid 1:2 htb rate $UPLD ceil $UPLD
$U32 match ip dst $IP flowid 1:1
$U32 match ip src $IP flowid 1:2
# The first line creates the root qdisc, and the next two lines
# create two child classes that are used to shape download
# and upload bandwidth.
#
# The 4th and 5th lines create the filters that match the traffic.
# The 'dst' IP address is used to limit download speed, and the
# 'src' IP address is used to limit upload speed.
}

stop() {
# Stop the bandwidth shaping.
$TC qdisc del dev $IF root
}
restart() {
# Self-explanatory.
stop
sleep 1
start
}

show() {
# Display the current traffic control status.
$TC -s qdisc ls dev $IF
}

##main function added by zqxie
case "$1" in
start)
echo -n "Starting bandwidth shaping: "
start
echo "done"
;;
stop)
echo -n "Stopping bandwidth shaping: "
stop
echo "done"
;;
restart)
echo -n "Restarting bandwidth shaping: "
restart
echo "done"
;;
show)
echo "Bandwidth shaping status for $IF:"
show
echo ""
;;
*)
echo "Usage: qos.sh {start|stop|restart|show}"
;;
esac
exit 0
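
Typical invocation, assuming the script is saved as qos.sh and marked executable:

chmod +x qos.sh
./qos.sh start    # install the htb qdisc, classes and filters on eth0
./qos.sh show     # print qdisc statistics for eth0
./qos.sh stop     # remove all shaping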

Linux ip command

1. Link (interface) information
ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 1000
    link/ether 00:16:3e:00:04:b5 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 1000
    link/ether 00:16:3e:00:04:12 brd ff:ff:ff:ff:ff:ff

2. Address information
ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 00:16:3e:00:04:b5 brd ff:ff:ff:ff:ff:ff
    inet 10.169.16.154/21 brd 10.169.23.255 scope global eth0
    valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 00:16:3e:00:04:12 brd ff:ff:ff:ff:ff:ff
    inet 121.42.136.161/22 brd 121.42.139.255 scope global eth1
    valid_lft forever preferred_lft forever

3. Routing table
ip route show
default via 121.42.139.247 dev eth1
10.0.0.0/8 via 10.169.23.247 dev eth0
10.169.16.0/21 dev eth0 proto kernel scope link src 10.169.16.154
100.64.0.0/10 via 10.169.23.247 dev eth0
121.42.136.0/22 dev eth1 proto kernel scope link src 121.42.136.161
127.0.0.0/8 dev lo scope link
169.254.0.0/16 dev eth0 scope link
172.16.0.0/12 via 10.169.23.247 dev eth0
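
Routes can be added or removed with the same tool; a generic sketch (the destination network here is made up, the gateway is the one from the table above):

ip route add 192.168.100.0/24 via 10.169.23.247 dev eth0
ip route del 192.168.100.0/24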

4. Neighbor (MAC address) table
ip neigh show
121.42.139.247 dev eth1 lladdr 00:00:0c:9f:f3:20 REACHABLE
10.169.23.21 dev eth0 lladdr 00:16:3e:00:b3:ff STALE
10.169.23.249 dev eth0 lladdr 00:2a:6a:b6:77:3c STALE
10.169.23.248 dev eth0 lladdr 00:2a:6a:b1:a3:7c STALE
121.42.139.249 dev eth1 lladdr 00:2a:6a:b6:77:3c STALE
10.169.23.247 dev eth0 lladdr 00:00:0c:9f:f2:bc STALE
121.42.139.248 dev eth1 lladdr 00:2a:6a:b1:a3:7c STALE

Delete a neighbor (MAC) entry:
ip neigh delete 9.3.76.43 dev eth0

5. Routing rules (policy routing)
ip rule list
0: from all lookup local
32766: from all lookup main
32767: from all lookup default

Lookups go through the local table first, then main, then default.
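
Additional tables can be slotted in between with ip rule; a sketch (table number 100 and the priority are arbitrary choices):

ip rule add from 10.169.16.0/21 lookup 100 priority 1000
ip route add default via 10.169.23.247 dev eth0 table 100
ip rule list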

More information about tc:
http://tldp.org/HOWTO/Traffic-Control-HOWTO/