Category: Algorithm & Life

Recently, I completely changed my development tools at home. I cannot do the same at the office, since the work there involves lots of .NET development and a project built on the M$ Office SDK.

Developers are always interested in sharing or showing off their development tools, because they want to improve their productivity.

You can see how much developers care about their development machines and tools:


What are my development tools and software?

I use Ubuntu 11.04 as my main OS. There is a ton of work to do after installing the default Ubuntu.

I installed more fonts, such as MSYH and Arial, so that web pages look much the same as they do on other systems.

I changed the Ubuntu theme back to the Ubuntu 10 style.

Software installed under Ubuntu:

1. JDK (powers lots of tools, supports Java development)

2. Chromium (Main browser)

3. GIMP (Photoshop under Linux)

4. Dropbox (Sync tool for my documents)

5. Filezilla (FTP client)

6. Skype (Chatting tool)

7. IDE:

Aptana (supports PHP, Python, and Ruby development, but I don't use it very often)

Eclipse (Java development)

Gvim (My main development tool)

8. Ubuntu tweak (configuration tool)

9. Chromium plugins:

Google reader notifier

Screen capture

Color pick

10. Vim plugins:

Vim wiki

Calendar

ctags

Nerdtree

bufferexplorer

taglist

zencoding

11. Avant Window Navigator (similar to the Dock in Mac OS)

12. MySQL, PHP, Apache2, Node.js, MongoDB, git, svn, Redis, and lots of shell scripts to complete tasks.

Classic algorithm books

1. CLRS (Introduction to Algorithms)
2. Algorithms
3. Algorithm Design
4. SICP (Structure and Interpretation of Computer Programs)
5. Concrete Mathematics
6. Introduction to The Design and Analysis of Algorithms
7. 编程之美 (The Beauty of Programming: Microsoft Technical Interview Insights)
8. Fundamentals of Algorithmics
9. How to Solve It
10. Programming Interviews Exposed
11. Programming Pearls
12. 算法艺术与信息学竞赛 (The Art of Algorithms and Informatics Competitions)
13. An Introduction to Probability Theory and Its Applications
14. Numerical Analysis by Richard L. Burden, J. Douglas Faires
Covers a range of numerical algorithms, such as interpolation, curve fitting, integration, solving differential equations, and solving linear and nonlinear systems of equations.
15. TAOCP  http://www-cs-faculty.stanford.edu/~uno/taocp.html

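N-Gram has many applications, but here I only use it for similarity analysis. The basic idea comes from a 2005 paper by Grzegorz Kondrak. http://webdocs.cs.ualberta.ca/~kondrak/papers/spire05.pdf

Recently, while working on a translation memory, I needed an algorithm for comparing the similarity of strings. Similarity algorithms are usable in machine translation and language recognition because of one assumption: similar words have similar meanings.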

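What is the N-Gram algorithm?
The N-Gram model is based on the assumption that the occurrence of the n-th word depends only on the preceding n-1 words and on no other words, so the probability of a whole sentence is the product of the probabilities of its words. In spell checking the same idea is applied to letters: the probability of a letter depends only on the preceding n-1 letters, and the probability of a word is the product of these per-letter probabilities.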
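
To make the letter-level version concrete, here is a minimal sketch of a bigram model (n = 2), where each letter depends only on the one before it. The function names, the "^" start-of-word marker, and the tiny corpus are my own assumptions for illustration. There is no smoothing and no end-of-word marker, so this is a toy and not a proper probability distribution.

```python
from collections import Counter


def train_bigram_model(words):
    """Count letter bigrams and their one-letter contexts over a word list."""
    pair_counts = Counter()
    context_counts = Counter()
    for word in words:
        padded = "^" + word  # "^" is an assumed start-of-word marker
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts


def word_probability(word, pair_counts, context_counts):
    """Multiply the estimates of P(letter | previous letter) across the word."""
    prob = 1.0
    padded = "^" + word
    for prev, cur in zip(padded, padded[1:]):
        if context_counts[prev] == 0:
            return 0.0  # context never seen in training
        prob *= pair_counts[(prev, cur)] / context_counts[prev]
    return prob


# Hypothetical training data, far too small for real use.
corpus = ["the", "then", "than", "that"]
pairs, contexts = train_bigram_model(corpus)

print(word_probability("the", pairs, contexts))
print(word_probability("teh", pairs, contexts))
```

With this corpus, "the" scores higher than the misspelling "teh", because the pair "te" never occurs in the training words. A real spell checker would need a much larger corpus and some form of smoothing so that unseen pairs do not force the whole product to zero.
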
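How do we compare the similarity of two strings?

Usually we would consider edit distance or LCS (longest common subsequence). The paper mentioned above shows that both algorithms are simplified versions of N-Gram.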
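
For reference, the two baseline measures can be sketched with the usual textbook dynamic programming. This is only my illustration of plain edit distance (Levenshtein, unit costs) and LCS length, not the N-Gram measures from the paper, and the function names are my own.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, keeping one row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(
                prev[j] + 1,         # delete ca
                cur[j - 1] + 1,      # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)  # extend the common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))
print(lcs_length("kitten", "sitting"))
```

A smaller edit distance or a longer LCS means the strings are more alike. To compare pairs of different lengths, one option is to divide by the length of the longer string, but that normalization is a choice I am assuming here.
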

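In search engines it is generally used for spell checking or suggestions. For example, when you type a word into Baidu or Google, related words are suggested.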
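
A rough sketch of how such a suggestion feature could rank candidates by shared character n-grams follows. I am assuming the Dice coefficient over sets of letter bigrams, which is a common and simple choice but not the measure defined in Kondrak's paper, and certainly not what Baidu or Google actually run. The dictionary, the "^" and "$" padding markers, and the function names are all hypothetical.

```python
def char_ngrams(word, n=2):
    """Set of character n-grams, padded so word boundaries count too."""
    padded = "^" * (n - 1) + word + "$" * (n - 1)  # assumed boundary markers
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient between the n-gram sets of a and b, from 0.0 to 1.0."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # both empty: treat as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def suggest(query, dictionary, limit=3):
    """Return the dictionary words most similar to the query, best first."""
    ranked = sorted(dictionary,
                    key=lambda w: dice_similarity(query, w),
                    reverse=True)
    return ranked[:limit]


# Hypothetical dictionary; a real one would hold many thousands of words.
dictionary = ["algorithm", "logarithm", "altruism", "rhythm"]
print(suggest("algoritm", dictionary, limit=2))
```

The misspelled query "algoritm" shares most of its bigrams with "algorithm", so that word ranks first. Scanning the whole dictionary for every query is fine for a toy, but a real system would index words by their n-grams to find candidates quickly.
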