Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach

Stability AI

Figure 1. High-resolution samples from our 8B rectified flow model, showcasing its capabilities in typography, precise prompt following and spatial reasoning, attention to fine details, and high image quality across a wide variety of styles.

Abstract

Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice [...] demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. Additionally, we present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens, improving text comprehension, typography, and human preference ratings. We demonstrate that this architecture follows predictable scaling trends and correlates...
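The straight-line connection between data and noise described in the abstract can be illustrated with a minimal sketch. It assumes the common rectified flow convention z_t = (1 - t) * x0 + t * noise, with t = 0 at the data and t = 1 at the noise; this excerpt does not state that convention. The function names, the toy values, and the oracle velocity that stands in for a trained network are all hypothetical and are not the authors' implementation.

```python
import random


def interpolate(x0, noise, t):
    """Point at time t on the straight line from data x0 (t=0) to noise (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def velocity_target(x0, noise):
    """Velocity along the straight path, d z_t / dt = noise - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, noise)]


def euler_sample(velocity_fn, noise, steps=10):
    """Integrate from t=1 (noise) back to t=0 (data) with fixed Euler steps."""
    z = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z


# Toy "data" sample and a Gaussian noise sample (hypothetical values).
x0 = [0.5, -1.0, 2.0]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in x0]

# Oracle velocity for this single pair, standing in for a learned model.
def oracle(z, t):
    return velocity_target(x0, noise)

recovered = euler_sample(oracle, noise, steps=4)
```

Because the velocity is constant along a perfectly straight path, the Euler integration in this toy setting returns to `x0` up to floating-point error regardless of the number of steps. A trained model only approximates this velocity, so real samplers generally need more than one step.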