[Repost] Dispelling Subversion FUD

Posted by yanboo on 2007-11-22 18:44:48

<Dispelling Subversion FUD>
<Dispelling Fear, Uncertainty, and Doubt about Subversion>

Dispelling Subversion FUD


Ben Collins-Sussman


sussman@red-bean.com

Last Updated: 2004-12-21 14:00:06 CST

I'm a Subversion developer who has worked on the project from the very beginning. This is a private essay, written by me. I don't pretend to be objective at all; it represents my personal opinions and feelings about Subversion. It's not official project documentation, but my hope is that people will link to this document whenever they see FUD about Subversion. My goal is to dispel some of the more common rumors and misconceptions I've heard floating around the net.

Before I begin, a word of advice to curious administrators. If you're learning about Subversion and thinking of using it in your group or company, please approach it the way you'd approach any new product: with caution. This isn't to say that Subversion is unreliable... but that doesn't mean you shouldn't use some common sense either. Don't blindly jump into the deep end without a test-drive. No user wants a new product forced upon them, and if you're going to be responsible for administering the system, you'd better have some familiarity with it before rolling it out to everyone. Find a smallish project, and set it up as a "pilot" for Subversion. Ask for enthusiastic volunteers to test-drive the experiment. In the end, if Subversion turns out to be a good fit, you'll have much happier developers (who have been part of the process from the start) and you'll be ready to support a larger installation as well.

That said, here are the most common bits of FUD I've heard.

Subversion is too difficult to build, with too many dependencies. I hear that it requires Apache... talk about a showstopper!

Let's address the Apache issue first: Subversion does not require Apache. It depends on the Apache Portable Runtime (APR) library, but that's not the same as the Apache webserver. APR allows Subversion clients and servers to compile anywhere Apache does, in the same way that the Netscape Portable Runtime (NSPR) library makes Mozilla compile everywhere.

Subversion has two different servers: you can use Apache2 with a custom WebDAV module, or you can run a small standalone 'svnserve' server which is similar to CVS's pserver. Neither server is "more official", and both have trade-offs. See the beginning of chapter 6 in the Subversion book for a comparison of features.
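From the client's side, the choice of server mostly shows up in the repository URL. The following is a minimal Python sketch of that mapping, not anything shipped with Subversion: the function name and the example hostnames are hypothetical, and the `svn+ssh` and `file` entries are access methods the essay does not mention, included here only for completeness.

```python
from urllib.parse import urlsplit

# URL scheme -> how the repository is being reached.
# Only the first three rows correspond to the two servers described above.
ACCESS_METHODS = {
    "http": "Apache2 with the WebDAV module",
    "https": "Apache2 with the WebDAV module, over SSL",
    "svn": "standalone svnserve",
    "svn+ssh": "svnserve tunneled over SSH",
    "file": "direct access to a repository on local disk (no server)",
}


def access_method(repo_url):
    """Return a description of the server implied by a repository URL."""
    scheme = urlsplit(repo_url).scheme.lower()
    return ACCESS_METHODS.get(scheme, "unknown")


# Hypothetical hosts, for illustration only.
print(access_method("http://svn.example.org/repos/project"))
print(access_method("svn://svn.example.org/project"))
```

The working copy behaves the same either way; only the URL and the server-side setup differ.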

Next, regarding the "difficult to build" problem: when was the last time you compiled CVS? Never? That's because it's preinstalled on just about every system, right? If you're using a well-supported operating system, Subversion binaries should be standard packages either built-in, supplied by your distribution (rpms, debs, fink, etc.), or easily downloadable (in the case of win32.)

Building is for developers, not users. Mozilla, Evolution, KDE, and Gnome all have an insane number of dependencies too, but most normal users don't know or care, because they're not compiling. The fact is, Subversion has a lot of dependencies because it has a lot of complex features, and doesn't reinvent the wheel. Nothing unusual about that.

Subversion doesn't break new ground -- it keeps the same old lame CVS model. Why imitate CVS at all?

From the start, the Subversion project has always had a "fundamental axiom":

CVS is an excellent, proven model for version control; it just wasn't implemented as well as it could be.

We're not polishing a turd, we're polishing a diamond in the rough. Subversion takes the CVS model and adds directory versioning, atomic commits, database backend, versioned metadata, efficient binary handling, flexible network abilities, and a solid C API. Most of us think that it's what CVS should have been in the first place.

If you disagree with the fundamental axiom of the project, there really isn't much more to talk about; Subversion is not for you.

Some of the newer competing version control systems are "distributed" or "decentralized" -- projects like Monotone, or Arch, or even non-free systems like Bitkeeper. These products offer a somewhat radical new way of working, where each developer has a private repository, and repositories are able to exchange changes in any sort of hierarchy.

A number of Subversion developers have mixed feelings about these distributed systems. On the one hand, it sounds really neat, and we're curious to try them out. On the other hand, we've heard a lot of people complain about how difficult they are to use, perhaps something that will improve over time. And at least one Subversion developer believes that the decentralized model isn't right for free software development. You'll have to decide for yourself.

At the moment, there are no concrete plans to evolve Subversion into a decentralized system. But an interesting project called svk is a new decentralized system based on the Subversion libraries, and is supposedly "compatible" with ordinary Subversion repositories and regular users not using svk. A lot of people really love it, so you might want to check it out. The Subversion project, at a minimum, plans to study svk some day just to see how it implements various "smart merging" behaviors. Who knows? Maybe some decentralized abilities will creep into Subversion too. It's all speculation at this point.

If Subversion is only "CVS improved", why the heck did it take four years to get to 1.0? Geez, how hard can it be to slap some features on top of CVS?

Please, don't insult the project by claiming that we just "slapped some features on CVS." Those features aren't "slappable" on CVS. The CVS codebase is a bloody mess, and very difficult to extend. (Though at least two projects attempted to do so: CVSNT and MetaCVS.) That's why we started from scratch with a completely new design. Subversion and CVS share zero code; the only things they have in common are a concurrent, centralized model and similar UI.

We started out by implementing a journaled library that manages working copy data and understands versioned directories. Then we implemented a repository on top of a transactional database, one which stores snapshots of entire trees. It took about 14 months of coding before Subversion was complete enough to start hosting itself. After that, it's been two and a half years of continuous stabilization, bug-fixing, and regression tests, with releases every few weeks. Versioning directories is a hard problem.

When Subversion hit "alpha" it was already being used by dozens of private developers and shops for real work. Any other project probably would have called the product "1.0" at that point, but we deliberately decided to delay that label as long as possible. Because we're talking about managing people's irreplaceable data, the project was extremely conservative about labeling something 1.0. We were aware that many people were waiting for that label before using Subversion, and had very specific expectations about the meaning of that label. So we stuck to that standard. All it takes is one high-profile case of data loss to destroy an SCM's reputation.

I'm researching different SCM solutions for my company, and I've seen tables that compare Subversion with other systems. I notice that Subversion is lacking [feature X]. Don't you think that's a problem? Are there plans to address this? My group might be willing to contribute resources to this project, but it definitely won't happen if we don't see this feature implemented.

First of all, threatening will get you nowhere. A lot of people think they can influence a project by offering resources, but then using that offer as a means of "blackmailing" the project in a certain direction. Subversion, like any other open-source project, is a meritocracy based on code contribution and lots of discussion. You're welcome to participate like everyone else, but it has to be on the same terms and rules that everyone else follows. See the HACKING document for more detail.

Second, Subversion's developers are acutely aware of the Feature Creep problem. Many projects have loose goals and no solid definition of "done", so the project scope ever drifts and expands, the community shifts, and nothing is ever released. As testament, just look at the hundreds of dead projects on Sourceforge. From day one, our developers laid down a crisp definition of exactly which CVS problems Subversion 1.0 would fix and which ones we wouldn't. If you missed that discussion, I'm sorry. It's the front page of the website, and it's been our unchanging guide for years. If you want to influence the priorities of post-1.0 features, feel free to get involved in the project discussion and be prepared to write code. Make sure to look through our issue tracker and mailing lists for previous discussions about your favorite unimplemented feature. I can almost guarantee you're not the first person to ask about it.

Finally, a little rant about the several SCM "comparison tables" that I've seen out on the net. Honestly, I give very little credibility to these tables for a couple of reasons. Many of them are written by people who are core developers for a specific SCM system, and there's just no way such a person can write an objective comparison. Consciously or unconsciously, the whole discussion is framed in terms of methodologies and features most important to the author's own system. Other times, the authors are simply information-gatherers: the table reads like a book report. You get the impression that the author went around and read each project's self-description, neatly summarized it for us, but has little or no experience using the systems for real work in a group setting. Lastly, I have a personal objection to the assumption behind these tables. Various SCM features are listed as if there's some platonic, ideal system out there somewhere: "let's see how these systems stack up when compared to the perfect system!" That's a bunch of hooey. There is no perfect system. Every system has advantages and disadvantages, and each will be a better or worse fit for different groups. No chart is going to definitively tell you if a system is good for you. You need to try it for yourself.

Why wasn't Subversion written in a good modern language like Java or C++? Why did you use crufty old C?

This is dangerous ground -- nobody wants to get into a language holy war. There are a few reasons we chose C. Paraphrasing a couple of our developers:

  • Portability. C++ compilers are not standardized to the degree that C compilers are. What works in one C++ compiler doesn't in another, and linking to C++ libraries can be a nightmare.

  • C has a large pool of skilled programmers.

  • C library APIs are accessible from almost every other language. This is not true of Java.

Portability is the main point here. Just because Subversion is written as a collection of C libraries doesn't mean you have to use C. There are Subversion library bindings for perl, python, Java, and C++ out there, all being used by third-party projects.

A database back-end is too dangerous and unfriendly. What if I need to hack on the data directly? With CVS, at least I can open the RCS files in my text editor.

Are you suggesting that people mucking directly in RCS files is safe? Let me turn the question around: why are you loading RCS files into your editor in the first place? Why are your administrators hand-moving files around in the CVS repository? In my experience, it's almost always to overcome some shortcoming or annoyance caused by CVS itself. A well-functioning system shouldn't need its repository "hacked".

When you want to share highly organized data over a network, what's the standard practice these days? Easy: put the data in a database (like MySQL) and make it available through a web interface. It's the classic LAMP solution.

Subversion is doing the same thing: putting your data in a database, and making it available over a network. Notice that nobody panics over storing critical data in MySQL, and MySQL data isn't exactly hackable in your editor. If you want to look at the low-level data, use database utilities to dump tables. If you want to migrate your data, dump it out into a portable, transportable format.

Also note that as of Subversion 1.1, you can create a repository that doesn't use BerkeleyDB at all. An "fsfs" repository stores data in the ordinary OS filesystem. (Though the files are still in a binary format, and still not meant to be human editable!)
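The two points above, dumping to a portable format and creating an "fsfs" repository, combine naturally into a migration. Here is a rough Python sketch of that idea, assuming Subversion 1.1 or later with `svnadmin` on the PATH. The helper function names and repository paths are hypothetical; the `svnadmin create --fs-type fsfs`, `svnadmin dump`, and `svnadmin load` subcommands are the standard ones.

```python
import shutil
import subprocess


def create_fsfs_command(repo_path):
    """Command that creates an empty repository with the fsfs back end."""
    return ["svnadmin", "create", "--fs-type", "fsfs", repo_path]


def dump_command(repo_path):
    """Command that writes the repository's history to stdout as a dumpfile."""
    return ["svnadmin", "dump", repo_path]


def load_command(repo_path):
    """Command that reads a dumpfile from stdin into an existing repository."""
    return ["svnadmin", "load", repo_path]


def migrate_to_fsfs(old_repo, new_repo):
    """Copy all history from old_repo into a freshly created fsfs repository."""
    if shutil.which("svnadmin") is None:
        raise RuntimeError("svnadmin was not found on PATH")
    subprocess.run(create_fsfs_command(new_repo), check=True)
    # Stream the dump straight into the loader rather than holding it in memory.
    dump = subprocess.Popen(dump_command(old_repo), stdout=subprocess.PIPE)
    load = subprocess.run(load_command(new_repo), stdin=dump.stdout)
    dump.stdout.close()
    if dump.wait() != 0 or load.returncode != 0:
        raise RuntimeError("dump/load failed; the new repository may be incomplete")


# Hypothetical paths, for illustration only:
# migrate_to_fsfs("/var/svn/project-bdb", "/var/svn/project-fsfs")
```

The dumpfile carries the versioned history, not hook scripts or server configuration, so those would need to be copied over separately.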

My friend said that Subversion is dog-slow.

Yes, that used to be true. We spent a long time working on correctness rather than speed. In late 2003, though, we spent a significant amount of time working on performance optimizations. By our own testing, Subversion 1.x should be pretty close to CVS in speed.

Look, Subversion can't be all butterflies and rainbows. What problems should I expect when using it?

I'm not going to lie to you. There are some annoying things about Subversion, but in the interest of actually releasing something useful to the world before the heat death of the universe (to quote Karl Fogel), we had to let some imperfections slide:

  • A lot of error messages could be clearer. We're working on it.

  • It's easy to get charset conversion failures. The repository stores all paths and commit messages in UTF8, but clients can't always convert incoming UTF8 data to the native system locale. We need to be more graceful about these sorts of failures, and get better at validating UTF8.

  • BerkeleyDB requires care and feeding. On the one hand, it's incredibly convenient to have a transactional database in a shared library, rather than forcing people to set up a full SQL system. But on the other hand, most folks are too reckless with the database. If the process accessing the repository (apache, svnserve, svnadmin, svn, whatever) doesn't have complete read-write permission on all the db files, or if the process is interrupted, then the database locks up and requires journaled recovery to get back into a consistent state. This is not a big deal when it happens, but it's almost always a result of someone being careless who doesn't yet know any better. "With great power comes great responsibility" -- but most people are unaware of this responsibility and get burned when they treat an SVN repository just like a CVS repository. Please read this part of the book so you can become an "educated user".

    Alternately, create an 'fsfs' repository instead of a BerkeleyDB one -- no wedging of the database, and works over NFS. See the Subversion 1.1 release notes.
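For the BerkeleyDB case, the recovery step mentioned in the list is the `svnadmin recover` subcommand. Below is a minimal Python sketch of wrapping it, assuming `svnadmin` is on the PATH. The function names and the repository path are hypothetical, and stopping the server processes beforehand is left to the caller because how to do that depends on the installation.

```python
import shutil
import subprocess


def recover_command(repo_path):
    """Command that runs BerkeleyDB recovery on a wedged repository."""
    return ["svnadmin", "recover", repo_path]


def recover_repository(repo_path):
    """Run recovery; the caller must first stop every process using the repository."""
    if shutil.which("svnadmin") is None:
        raise RuntimeError("svnadmin was not found on PATH")
    # check=True raises CalledProcessError if svnadmin exits with a nonzero status.
    subprocess.run(recover_command(repo_path), check=True)


# Hypothetical path, for illustration only:
# recover_repository("/var/svn/project")
```

Recovery needs exclusive access to the repository, and it is safest to run it as the user that owns the db files, so that the permission problem described above isn't reintroduced.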



======================================================================

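Dispelling Fear, Uncertainty, and Doubt about Subversion


This article is Ben Collins-Sussman's response to some of the criticism aimed at Subversion. "FUD" here stands for "fear, uncertainty, and doubt".

Author:
Ben Collins-Sussman
<sussman@red-bean.com>

Chinese translation:
Rock Sun
<daijun@gmail.com>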



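Reader comments
By gymdove on 2007-07-11 17:39:52
Some characters seem to be missing from the translation; please take a look.
By jxmhbj on 2007-05-15 17:17:05
Truly one of the original developers of the Subversion project. Impressive.
Nice!
By killms on 2007-04-04 15:17:33
Very objective and very level-headed. That really is a quality a competent developer should have.
Nice!
By murphykwu on 2007-01-30 23:02:11
Very good. I'll keep following Subversion.
By rocksun on 2006-12-07 00:06:36
Not bad at all; you can tell the author really knows his stuff.
Well said
By our420 on 2006-12-05 13:55:17
Truly the author; he has an excellent grasp of the overall architecture.
By rocksun on 2006-11-08 14:16:46
Even I think this one was translated very well.
good!
By yayama on 2006-11-06 11:24:45
:) good!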
