
HDFS Volume (Disk) Choosing Policies

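Yesterday I (Langjian, 浪尖) published an article on disk balancing. The approach there was to give larger disks more data directories, which raises the probability that writes land on them. That only fixes imbalance caused by a DataNode having disks of different sizes. It does not help when the disks are the same size but still fill unevenly. A typical cause is a large number of small files, each smaller than one block: every disk on the DataNode may hold the same number of blocks, but because the blocks differ in size, the space they occupy differs too. For that kind of imbalance, adding directories is of no practical use.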

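In the Hadoop 2.x releases we currently use, HDFS has two policies for choosing which volume (disk) a new block is written to: a round-robin policy (RoundRobinVolumeChoosingPolicy) and an available-space policy (AvailableSpaceVolumeChoosingPolicy).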

Round-robin policy

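Round robin is a common concept in operating-systems theory, for example in the round-robin process scheduling algorithm. The idea is to visit objects 1 through n in turn and then start again from 1. The HDFS source of the round-robin policy, shown below, is easy to follow: starting from where the previous call stopped, it walks the volume list, returns the first volume whose free space exceeds the block size, and throws DiskOutOfSpaceException once it has gone all the way around without finding one.
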
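package org.apache.hadoop.hdfs.server.datanode.fsdataset;

import java.io.IOException;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.util.DiskChecker.DiskOutOfSpaceException;

public class RoundRobinVolumeChoosingPolicy<V extends FsVolumeSpi>
    implements VolumeChoosingPolicy<V> {
  public static final Log LOG = LogFactory.getLog(RoundRobinVolumeChoosingPolicy.class);

  // Index of the volume to try first on the next call.
  private int curVolume = 0;

  @Override
  public synchronized V chooseVolume(final List<V> volumes, long blockSize)
      throws IOException {
    if (volumes.size() < 1) {
      throw new DiskOutOfSpaceException("No more available volumes");
    }

    // Volumes may have been removed (e.g. after a disk failure),
    // so make sure the index is not out of bounds.
    if (curVolume >= volumes.size()) {
      curVolume = 0;
    }

    int startVolume = curVolume;
    long maxAvailable = 0;

    while (true) {
      final V volume = volumes.get(curVolume);
      curVolume = (curVolume + 1) % volumes.size();
      long availableVolumeSize = volume.getAvailable();
      // Take the first volume that can hold the whole block.
      if (availableVolumeSize > blockSize) {
        return volume;
      }

      // Remember the largest free space seen, for the error message.
      if (availableVolumeSize > maxAvailable) {
        maxAvailable = availableVolumeSize;
      }

      // Went all the way around without finding room: give up.
      if (curVolume == startVolume) {
        throw new DiskOutOfSpaceException("Out of space: "
            + "The volume with the most available space (=" + maxAvailable
            + " B) is less than the block size (=" + blockSize + " B).");
      }
    }
  }
}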

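The round-robin policy balances the number of writes each volume receives, but not the amount of data written. For example, if one write puts a 1 MB block on volume A and the next puts a 128 MB block on volume B, A and B are already unbalanced by data volume. Over time the imbalance keeps getting worse.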
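The imbalance described above is easy to reproduce. The following is a hypothetical, self-contained simulation, not Hadoop code: two volumes are picked strictly in turn (as RoundRobinVolumeChoosingPolicy does when every volume has room), while the block sizes follow an assumed, deliberately regular workload that alternates 1 MB and 128 MB blocks. It tallies writes and data per volume.

```java
// Hypothetical simulation, not Hadoop code: two volumes are chosen strictly in
// turn (like RoundRobinVolumeChoosingPolicy when every volume has room), while
// the assumed workload alternates 1 MB and 128 MB blocks.
class RoundRobinImbalanceDemo {

    /** Returns {writes to A, writes to B, MB stored on A, MB stored on B}. */
    static long[] simulate(int blocks) {
        int cur = 0;                            // next volume to use, like curVolume
        long[] writes = new long[2];
        long[] usedMb = new long[2];
        for (int i = 0; i < blocks; i++) {
            long blockMb = (i % 2 == 0) ? 1 : 128; // small file, then a full block
            int chosen = cur;
            cur = (cur + 1) % 2;                // advance the round-robin pointer
            writes[chosen]++;
            usedMb[chosen] += blockMb;
        }
        return new long[] {writes[0], writes[1], usedMb[0], usedMb[1]};
    }

    public static void main(String[] args) {
        long[] r = simulate(1000);
        // prints "writes A=500 B=500, data A=500 MB B=64000 MB"
        System.out.printf("writes A=%d B=%d, data A=%d MB B=%d MB%n", r[0], r[1], r[2], r[3]);
    }
}
```

Both volumes receive 500 writes, yet A holds 500 MB and B holds 64,000 MB. Real workloads are rarely this regular, but any mix of small and full blocks lets the byte totals drift apart in the same way.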

Available-space policy

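This policy is smarter than round robin. Using a free-space threshold, it splits the volumes into two groups: volumes with more available space and volumes with less. It then picks the group with more available space with a relatively high probability. Whichever group is picked, a volume within that group is chosen by round robin. Both the threshold and the preference probability are configurable.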
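A minimal sketch of the classification step, under one stated assumption: a volume is treated as low-space when its free space is within the threshold of the volume with the least free space, and high-space otherwise. That rule is our reading of the AvailableSpaceVolumeList helper class, which the source below uses but does not show. The class and method names (VolumeClassifier, allWithinThreshold, lowAvailable, highAvailable) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Hadoop source. ASSUMPTION: "low" means free space
// <= (least free space among all volumes) + threshold; everything else is "high".
class VolumeClassifier {

    /** True when max(free) - min(free) < threshold, i.e. plain round robin is used. */
    static boolean allWithinThreshold(long[] free, long threshold) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long f : free) {
            min = Math.min(min, f);
            max = Math.max(max, f);
        }
        return max - min < threshold;
    }

    /** Indices of volumes whose free space is at most min(free) + threshold. */
    static List<Integer> lowAvailable(long[] free, long threshold) {
        long min = Long.MAX_VALUE;
        for (long f : free) {
            min = Math.min(min, f);
        }
        List<Integer> low = new ArrayList<>();
        for (int i = 0; i < free.length; i++) {
            if (free[i] <= min + threshold) {
                low.add(i);
            }
        }
        return low;
    }

    /** Indices of the remaining volumes, which have clearly more free space. */
    static List<Integer> highAvailable(long[] free, long threshold) {
        List<Integer> low = lowAvailable(free, threshold);
        List<Integer> high = new ArrayList<>();
        for (int i = 0; i < free.length; i++) {
            if (!low.contains(i)) {
                high.add(i);
            }
        }
        return high;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long threshold = 10 * gib;                       // the documented 10 GB default
        long[] free = {50 * gib, 55 * gib, 200 * gib, 300 * gib};
        System.out.println(allWithinThreshold(free, threshold)); // false
        System.out.println(lowAvailable(free, threshold));       // [0, 1]
        System.out.println(highAvailable(free, threshold));      // [2, 3]
    }
}
```

With 50, 55, 200 and 300 GB free, the spread far exceeds 10 GB, so the policy leaves plain round robin; the two fuller disks form the low-space group and the two emptier disks the high-space group.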


Its source code is as follows.

package org.apache.hadoop.hdfs.server.datanode.fsdataset;

import static org.apache.hadoop.hdfs.DFSConfigKeys.*;

import java.io.IOException;
import java.util.List;
import java.util.Random;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.DiskChecker.DiskOutOfSpaceException;

public class AvailableSpaceVolumeChoosingPolicy<V extends FsVolumeSpi>
    implements VolumeChoosingPolicy<V>, Configurable {

  private static final Log LOG = LogFactory.getLog(AvailableSpaceVolumeChoosingPolicy.class);

  private final Random random;

  // Defaults; both are overwritten from the configuration in setConf().
  private long balancedSpaceThreshold =
      DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_THRESHOLD_DEFAULT;
  private float balancedPreferencePercent =
      DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_DEFAULT;

  AvailableSpaceVolumeChoosingPolicy(Random random) {
    this.random = random;
  }

  public AvailableSpaceVolumeChoosingPolicy() {
    this(new Random());
  }

  @Override
  public synchronized void setConf(Configuration conf) {
    balancedSpaceThreshold = conf.getLong(
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_THRESHOLD_KEY,
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_THRESHOLD_DEFAULT);
    balancedPreferencePercent = conf.getFloat(
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_KEY,
        DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_DEFAULT);

    LOG.info("Available space volume choosing policy initialized: "
        + DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_THRESHOLD_KEY
        + " = " + balancedSpaceThreshold + ", "
        + DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_KEY
        + " = " + balancedPreferencePercent);

    // Out-of-range fractions are only logged, not rejected.
    if (balancedPreferencePercent > 1.0) {
      LOG.warn("The value of "
          + DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_KEY
          + " is greater than 1.0 but should be in the range 0.0 - 1.0");
    }
    if (balancedPreferencePercent < 0.5) {
      LOG.warn("The value of "
          + DFS_DATANODE_AVAILABLE_SPACE_VOLUME_CHOOSING_POLICY_BALANCED_SPACE_PREFERENCE_FRACTION_KEY
          + " is less than 0.5 so volumes with less available disk space will receive more block allocations");
    }
  }

  @Override
  public synchronized Configuration getConf() {
    return null;
  }

  // One independent round-robin cursor per volume group.
  private final VolumeChoosingPolicy<V> roundRobinPolicyBalanced =
      new RoundRobinVolumeChoosingPolicy<V>();
  private final VolumeChoosingPolicy<V> roundRobinPolicyHighAvailable =
      new RoundRobinVolumeChoosingPolicy<V>();
  private final VolumeChoosingPolicy<V> roundRobinPolicyLowAvailable =
      new RoundRobinVolumeChoosingPolicy<V>();

  @Override
  public synchronized V chooseVolume(List<V> volumes, long replicaSize)
      throws IOException {
    if (volumes.size() < 1) {
      throw new DiskOutOfSpaceException("No more available volumes");
    }

    // AvailableSpaceVolumeList and extractVolumesFromPairs are helper members
    // of this class that are omitted from this excerpt.
    AvailableSpaceVolumeList volumesWithSpaces = new AvailableSpaceVolumeList(volumes);

    if (volumesWithSpaces.areAllVolumesWithinFreeSpaceThreshold()) {
      // Free space is already balanced: use plain round robin over all volumes.
      V volume = roundRobinPolicyBalanced.chooseVolume(volumes, replicaSize);
      if (LOG.isDebugEnabled()) {
        LOG.debug("All volumes are within the configured free space balance "
            + "threshold. Selecting " + volume + " for write of block size " + replicaSize);
      }
      return volume;
    } else {
      V volume = null;
      // If even the emptiest low-space volume cannot hold the replica,
      // the high-space group must be used.
      long mostAvailableAmongLowVolumes = volumesWithSpaces
          .getMostAvailableSpaceAmongVolumesWithLowAvailableSpace();

      List<V> highAvailableVolumes = extractVolumesFromPairs(
          volumesWithSpaces.getVolumesWithHighAvailableSpace());
      List<V> lowAvailableVolumes = extractVolumesFromPairs(
          volumesWithSpaces.getVolumesWithLowAvailableSpace());

      // Each high-space volume is weighted by p, each low-space volume by (1 - p);
      // the high-space group is chosen with probability equal to its share.
      float preferencePercentScaler =
          (highAvailableVolumes.size() * balancedPreferencePercent)
          + (lowAvailableVolumes.size() * (1 - balancedPreferencePercent));
      float scaledPreferencePercent =
          (highAvailableVolumes.size() * balancedPreferencePercent)
          / preferencePercentScaler;

      if (mostAvailableAmongLowVolumes < replicaSize
          || random.nextFloat() < scaledPreferencePercent) {
        volume = roundRobinPolicyHighAvailable.chooseVolume(
            highAvailableVolumes, replicaSize);
        if (LOG.isDebugEnabled()) {
          LOG.debug("Volumes are imbalanced. Selecting " + volume
              + " from high available space volumes for write of block size "
              + replicaSize);
        }
      } else {
        volume = roundRobinPolicyLowAvailable.chooseVolume(
            lowAvailableVolumes, replicaSize);
        if (LOG.isDebugEnabled()) {
          LOG.debug("Volumes are imbalanced. Selecting " + volume
              + " from low available space volumes for write of block size "
              + replicaSize);
        }
      }
      return volume;
    }
  }
}
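The two lines that compute preferencePercentScaler and scaledPreferencePercent are easier to read with numbers plugged in. This sketch copies that arithmetic into a standalone method; the names PreferenceMath and scaledPreference are hypothetical. Each high-space volume carries weight p, each low-space volume carries 1 - p, and the high-space group is chosen with probability equal to its share of the total weight.

```java
// Hypothetical helper reproducing the scaledPreferencePercent arithmetic above.
class PreferenceMath {

    /** Probability of picking the high-space group, given group sizes and fraction p. */
    static float scaledPreference(int high, int low, float p) {
        float scaler = high * p + low * (1 - p);   // preferencePercentScaler
        return (high * p) / scaler;                // scaledPreferencePercent
    }

    public static void main(String[] args) {
        System.out.println(scaledPreference(2, 2, 0.75f)); // 0.75
        System.out.println(scaledPreference(1, 3, 0.75f)); // 0.5
        System.out.println(scaledPreference(3, 1, 0.75f)); // 0.9
    }
}
```

So the configured fraction equals the probability of choosing the high-space group only when both groups have the same number of volumes. What stays fixed is the per-volume ratio: with p = 0.75, any given high-space volume is on average three times as likely to receive a block as any given low-space volume (ignoring the case where no low-space volume can hold the replica).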

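This policy reduces the imbalance to some extent but cannot eliminate it. Moreover, free space is only one of many relevant factors and is not a complete picture on its own; metrics such as disk I/O load also matter. Even so, it is a large improvement over pure round robin.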

Changing the volume choosing policy

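The policy is set by the dfs.datanode.fsdataset.volume.choosing.policy property in hdfs-site.xml. Valid values are org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy (the default) and org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy.
If you choose the available-space policy, two more properties are worth attention.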

  • dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
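    Default 10737418240 bytes, i.e. 10 GB (10 × 1024³ bytes). It is the threshold on the difference between the largest and the smallest free space across all volumes: if the difference is below it, storage is considered balanced and volumes are simply chosen by round robin.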

  • dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction
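    Default 0.75. It controls how strongly new blocks are steered toward volumes with more free space: each high-space volume is weighted by this fraction and each low-space volume by one minus it, so when both groups contain the same number of volumes it is exactly the probability that a block goes to the high-space group. A value below 0.5 would send more blocks to the fuller volumes, which defeats the purpose of the policy (the DataNode logs a warning for values below 0.5 and above 1.0). The default is usually fine.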
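To see why a fraction below 0.5 is counterproductive, this hypothetical Monte Carlo sketch (not Hadoop code) repeats the random draw from chooseVolume many times with a fixed seed for two equally sized groups, and reports the share of placements that go to the high-space group. It assumes the low-space volumes always have room for the replica, so the mostAvailableAmongLowVolumes override never fires. The names PreferenceFractionDemo and highShare are illustrative only.

```java
import java.util.Random;

// Hypothetical simulation of the group draw in chooseVolume, not Hadoop code.
// ASSUMPTION: low-space volumes always have room, so only the random draw decides.
class PreferenceFractionDemo {

    /** Fraction of `trials` placements that go to the high-space group. */
    static double highShare(int high, int low, float p, int trials, long seed) {
        Random random = new Random(seed);
        float scaled = (high * p) / (high * p + low * (1 - p));
        int toHigh = 0;
        for (int i = 0; i < trials; i++) {
            if (random.nextFloat() < scaled) {   // same comparison as chooseVolume
                toHigh++;
            }
        }
        return (double) toHigh / trials;
    }

    public static void main(String[] args) {
        System.out.println(highShare(2, 2, 0.75f, 100_000, 42L)); // approximately 0.75
        System.out.println(highShare(2, 2, 0.40f, 100_000, 42L)); // approximately 0.40
    }
}
```

With 0.75, about three quarters of the blocks go to the emptier disks; with 0.40, the fuller disks receive the majority, widening the very gap the policy is meant to close.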

Follow: LittleMagic

Link: https://www.jianshu.com/p/d0c59d874dfd


