Data download: http://www.rendoumi.com/soft/testdata.tgz

Three data sets are prepared: collections, products, and users, all to be imported into Elasticsearch. The archive is 247 MB compressed; unpack it first and check the sizes:

-rw-r--r--   1 root root  47M May 29 10:30 collections-anon.txt
-rw-r--r--   1 root root 522M May 29 10:33 products-anon.txt
-rw-r--r--   1 root root 857M May 29 10:36 users-anon.txt

users is the user table, products is the product table, and collections is the table of users' favorites.
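For context, the `_bulk` endpoint expects newline-delimited JSON, alternating an action line with a document line. Below is a minimal sketch of that shape; the index, type, field names, and file name are made up for illustration, not taken from the actual data files:

```shell
# A _bulk body alternates action lines with document lines and must end
# with a trailing newline; all names below are illustrative only:
cat > mini-bulk.txt <<'EOF'
{"index":{"_index":"shop","_type":"users","_id":"1"}}
{"name":"alice"}
{"index":{"_index":"shop","_type":"users","_id":"2"}}
{"name":"bob"}
EOF
wc -l < mini-bulk.txt   # two action/document pairs -> 4 lines
```

The three downloaded files are assumed to already be in this format, which is why they can be POSTed to `_bulk` directly.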

Importing collections and products into Elasticsearch went without a hitch, finishing almost instantly, but importing users failed outright with an error:

# curl -s -XPOST http://localhost:9200/_bulk --data-binary @users-anon.txt
org.jboss.netty.handler.codec.frame.TooLongFrameException: HTTP content length exceeded 104857600 bytes.  

It looks like the data is too large and exceeds the HTTP POST size limit, so adjust the config:

vi config/elasticsearch.yml  
...
network.host: 172.16.11.2,127.0.0.1  
http.port: 9200  
http.max_content_length: 1024mb  
...
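A back-of-the-envelope check shows how far users-anon.txt overshoots the default cap; the 104857600-byte limit comes from the TooLongFrameException above, and the 857M size from the listing:

```shell
# Default http.max_content_length is 100mb = 104857600 bytes;
# users-anon.txt weighs in at roughly 857M:
limit=$((100 * 1024 * 1024))
size=$((857 * 1024 * 1024))
echo $(( size / limit ))        # the file is over 8x the default cap
```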

With the limit raised to 1 GB, the submission still fails, this time with a Java out-of-memory error:

[WARN ][http.netty               ] [Ogress] Caught exception while handling client http traffic, closing connection ...(omitted)
java.lang.OutOfMemoryError: Java heap space  
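Some quick arithmetic explains the OOM, assuming the stock 1.x startup scripts' default maximum heap of 1g (this default is an assumption about the setup, not something shown in the logs):

```shell
# With an assumed ~1g default heap, buffering an ~857M request body
# (before any parsing copies) already eats most of the available memory:
heap_mb=1024        # assumed default max heap, in MB
body_mb=857         # users-anon.txt, from the listing above
echo $(( body_mb * 100 / heap_mb ))%    # body alone is ~83% of the heap
```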

Ugh. Keep going: edit the startup script and raise the heap to 4 GB:

vi bin/elasticsearch  
...
# Maven will replace the project.name with elasticsearch below. If that
# hasn't been done, we assume that this is not a packaged version and the
# user has forgotten to run Maven to create a package.
ES_HEAP_SIZE=4g  
...

Start it again and, ha, this time so much data streams back that Xshell itself crashes.

But the Elasticsearch log shows no errors; it only records Java GC being triggered.

Process monitoring likewise shows the Java process using a lot of memory.

No choice but to keep tuning: enable HTTP compression, and store the index compressed as well:

vi config/elasticsearch.yml  
...
network.host: 172.16.11.2,127.0.0.1  
http.port: 9200  
http.max_content_length: 1024mb  
http.compression: true  
index.store.compress.stored: true  
index.store.compress.tv: true  
...

Compress the data before submitting, POST the gzipped file, and discard the response output:

gzip users-anon.txt  
curl --compressed -H "Content-encoding: gzip" -XPOST localhost:9200/_bulk --data-binary @users-anon.txt.gz > /dev/null  
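The `Content-encoding: gzip` request header only works because the server can reconstruct the exact original bytes. A quick offline roundtrip on toy data (not the real file) demonstrates that the compression is lossless:

```shell
# Compress a small sample with -c (which keeps the original file), then
# verify the decompressed stream is byte-identical to the source:
printf '%s\n' '{"index":{}}' '{"name":"alice"}' > sample.txt
gzip -c sample.txt > sample.txt.gz
gunzip -c sample.txt.gz | cmp - sample.txt && echo identical
```

The `--compressed` flag on the curl command above is separate: it asks the server to gzip its *response* as well, saving bandwidth in both directions.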

This time everything is completely OK. Checking with the head plugin: collections and products both show 86,772 records, and users shows 970,446.

Open the URL in a browser to simulate a match-all query for *:

http://172.16.11.2:9200/_search?q=*  

The result is one unreadable jumble:

Make it prettier:

http://172.16.11.2:9200/_search?q=*&pretty=on  

Now it is at least readable:
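When `pretty=on` is forgotten, or the server is not yours to configure, the same effect is available client-side. Here python3's stdlib json.tool pretty-prints a hardcoded sample response, so the example runs without a live server (the sample JSON is made up):

```shell
# Pipe any JSON through python3 -m json.tool to pretty-print it locally;
# the document below is a hardcoded sample, not a real ES response:
echo '{"took":3,"hits":{"total":970446}}' | python3 -m json.tool
```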
