A First Small Test of Couchbase

[Note] I ran into some problems during this test, so I wrote this post in English and will post it to the Couchbase community to ask about them.

Our Couchbase cluster has five nodes. The servers run Ubuntu 12.04 and have plenty of memory and disk. The Couchbase Server version is 2.0.1 Community Edition (build-170). I used the default bucket for the test; it has been assigned 5GB of memory in total.

The Ruby test script is below:

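require 'couchbase'
require 'securerandom'

# Connect to the default bucket through the cluster's REST endpoint.
client = Couchbase.connect("http://couch.example.com:8091/pools/default/buckets/default")

100000.times do
  # 5120 random bytes rendered as hex give a 10,240-character (10 KB) value.
  value_10k = SecureRandom.hex(5120)
  key_uuid = SecureRandom.uuid

  begin
    # Store the value with a one-hour expiry.
    client.set key_uuid, value_10k, :ttl => 3600
  rescue Couchbase::Error::Base => e
    # Print the failure and carry on with the next key.
    puts e
  end
end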

As you can see, I tested writes (set) only. Each "set" stores a UUID key with a 10 KB value. I ran this script from two terminals (so that is two clients), and ran it many times. From Couchbase's web console I can monitor the performance status, i.e. the TPS (Ops/sec). A typical status is shown below:

couchbase1

The first column shows that the cluster has 5 nodes, all of them alive. The second column is Item Count: how many items have been stored in the cluster. The third column is Ops/sec: how many operations (here, writes) are executed per second. The fourth column is Disk Fetches/sec. The fifth column is RAM/Quota Usage: the cluster has 5GB of memory in total, and 4.55GB is currently in use. The last column is Data/Disk Usage: it shows that 14.4GB of data has been written to disk.

The good point is that when I run the script continuously, the memory never gets full. Thanks to the ejection mechanism, old data is moved out to disk. So disk usage keeps increasing, but memory usage stops growing once it reaches the limit.

When I tested from 3 clients, the console looked like this:

couchbase2

The TPS increased from 2K+ to 3K+. The client host's bandwidth usage:

couchbase3

Here "tx" is the transmit (outgoing) rate; it is 273 Mb/s.
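As a rough sanity check on that number, the expected outgoing payload can be estimated from the value size and the write rate. This is only a sketch: payload_mbps is a helper name I made up, 3000 is an assumed round figure for "3K+", and it assumes all clients run on the one measured host. It also ignores key, protocol and TCP overhead.

```ruby
# Hypothetical helper: payload bandwidth in megabits per second for a
# given value size (bytes) and write rate (operations per second).
def payload_mbps(value_bytes, ops_per_sec)
  value_bytes * 8 * ops_per_sec / 1_000_000.0
end

# SecureRandom.hex(5120) produces 10,240 hex characters, i.e. 10,240 bytes.
value_bytes = 5120 * 2

# Assumed round figure for the "3K+" Ops/sec seen in the console.
puts payload_mbps(value_bytes, 3000)
```

The estimate comes out somewhat below the measured 273 Mb/s, which is plausible given that the real rate was above 3K and each request carries extra overhead.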

I didn't test with more clients, because I started getting errors at this point. The clients printed:

failed to store value (key="422c420d-8b1c-49a4-b153-c11fcb0dbdf1", error=0x0b)
failed to store value (key="e9bce12c-a505-409f-a4a2-f0f79862ac67", error=0x0b)
failed to store value (key="6f31c932-0c21-4c76-b5e1-24a3b6320ef1", error=0x0b)
failed to store value (key="e976530c-d26e-4f56-81c0-b9bbf4b9f589", error=0x0b)
failed to store value (key="646c9078-480a-40be-8c08-d8eb50992c60", error=0x0b)
failed to store value (key="93c74fe3-5c2d-437c-99d1-2df34c72d814", error=0x0b)
failed to store value (key="bc5f55c7-82f0-45d6-9444-1859ae126a5b", error=0x0b)
failed to store value (key="5880fe13-8a46-4cb9-82d5-a35201914199", error=0x0b)
failed to store value (key="5dd1a8af-fb0d-411a-9c07-778d888816bf", error=0x0b)
failed to store value (key="d0c3b14d-5c08-470f-af01-ed47c82a2037", error=0x0b)
failed to store value (key="418da974-4349-4552-870b-ff6799a08b6f", error=0x0b)
failed to store value (key="660accb1-1bbf-4312-b375-9e029ba1dfef", error=0x0b)
failed to store value (key="a483fb29-4ec3-4d30-b11e-529b0ebe1a0f", error=0x0b)
failed to store value (key="f2cdfdcf-a11e-4486-a109-aeba4e996b37", error=0x0b)
failed to store value (key="977f6589-6c22-4e36-b062-f13522e6646c", error=0x0b)
failed to store value (key="beff5b41-a61f-408e-a21d-4795de6cdbf5", error=0x0b)
failed to store value (key="d2e307d5-f992-433b-8cf7-42c3ab43a722", error=0x0b)
failed to store value (key="3914e8af-adf9-4556-a16c-7d0c5f0f652b", error=0x0b)

According to this URL, the error code "0x0b" means:

0x0b : LIBCOUCHBASE_ETMPFAIL (Temporary failure. Try again later)

It's a "Temporary failure" error. Why does this happen? I suspect it is caused by ejecting data to disk. When memory is nearly full, the cluster moves data out to disk. The iostat output shows that the disk is busy writing:

couchbase4

When data arrives faster than it can be ejected, the error occurs. I am not entirely sure, but I think that for the best results the cluster should have enough memory for the expected data size. But when a temporary failure does happen, how should clients handle it? I am hoping for a good answer.

[Update] Today I got a reply from Matt, who works for Couchbase. He wrote:

Tempfail can come up during a few different situations, such as out of memory at the cluster or warmup.

The best way to handle this is with an exponential backoff and retry with a ceiling.

See http://www.couchbase.com/docs/couchbase-sdk-java-1.1/java-sdk-bulk-load-and-backoff.html for a Java example.
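In Ruby, the same idea could be sketched as follows. This is not the code from the Java example, just a minimal illustration of exponential backoff with a ceiling. The names with_backoff and TemporaryFailure, and all the default values, are my own assumptions; TemporaryFailure is a stand-in so that the sketch runs without a cluster. With the real client you would pass whichever error class it raises for 0x0b via retry_on.

```ruby
# Hypothetical stand-in for the client's temporary-failure error.
class TemporaryFailure < StandardError; end

# Run the block, retrying on the given error class(es). The wait doubles
# after every failure (base_delay, 2x, 4x, ...) but never exceeds max_delay.
# After max_tries attempts the last error is re-raised to the caller.
# The sleeper argument exists only so the waiting can be replaced in tests.
def with_backoff(retry_on: TemporaryFailure, max_tries: 8, base_delay: 0.01,
                 max_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  tries = 0
  begin
    tries += 1
    yield
  rescue *Array(retry_on)
    raise if tries >= max_tries
    delay = [base_delay * (2**(tries - 1)), max_delay].min
    sleeper.call(delay)
    retry
  end
end

# Simulated store that fails three times before succeeding.
attempts = 0
result = with_backoff do
  attempts += 1
  raise TemporaryFailure, "error=0x0b" if attempts < 4
  :stored
end
puts "#{result} after #{attempts} attempts"

# In the test script above, the set call would be wrapped like this
# (Couchbase::Error::Base is broad; a narrower class would be better):
# with_backoff(retry_on: Couchbase::Error::Base) do
#   client.set key_uuid, value_10k, :ttl => 3600
# end
```

The ceiling keeps a long outage from producing ever longer sleeps, and the try limit means a persistent failure is eventually reported instead of retried forever.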
